hlsafin opened 2 weeks ago
Hi, that would certainly be an interesting extension! One path would be to use GPT-4o’s vision capabilities and operate IGE on visual observations. Another route could be to summarize visual observations into text with GPT-4. Both options seem feasible to me :)
In your opinion, can this approach be applied to the Montezuma's Revenge environment?