Open nickioan opened 1 year ago
Hey there, thanks for your interest and for raising the issue. Could you share more information, such as the script you used, so we can look into it?
Thank you so much for your timely response. Below I have included the main code we use to get the reward from our recorded video. Note that we divide the reward by the logit_scale parameter to get the cosine similarity.
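To make the division by logit_scale concrete, here is a minimal sketch in plain Python. It assumes the reward is a CLIP-style logit, i.e. the cosine similarity between the video and text embeddings multiplied by a scalar. The embeddings, the scale value, and the function names are illustrative placeholders, not MineCLIP's actual API.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def scaled_logit(u, v, logit_scale):
    """CLIP-style logit: cosine similarity multiplied by a scalar (assumed form)."""
    return logit_scale * cosine_similarity(u, v)

# Placeholder embeddings and scale, for illustration only.
video_emb = [0.2, 0.5, -0.1]
text_emb = [0.1, 0.4, 0.3]
logit_scale = 100.0

logit = scaled_logit(video_emb, text_emb, logit_scale)
recovered = logit / logit_scale  # back to cosine similarity, in [-1, 1]
print(recovered)
```

In CLIP-style models the scale is often stored as a log value and exponentiated before use. If that applies here, the division has to use the exponentiated value, otherwise the result is not a cosine similarity.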
In addition, here are the main helper functions to load MineCLIP and the recorded video.
We made our own configuration file for MineCLIP to avoid using Hydra, and the weights were downloaded as described in the README file.
This is the link with some of the recorded videos
And we made a configuration file for each task with task prompts generated by GPT3
Just wanted to provide the updated task configurations (the only change is that the first prompt on every task is the sentence that you all provided to GPT to generate the curriculum)
In particular, the craft-golden-pickaxe task has subtasks like "find a place with lots of iron". In the GPT-golden-pickaxe video we do exactly that, but the reward never goes above ~0.3 despite the agent finding iron to mine.
However, we ran another experiment in which the agent interacted with the environment (instead of using human gameplay), with just the "find spider" prompt and the delta reward from MineCLIP, and the results seemed more correct! So we are checking whether the complexity of the GPT-generated sentences is the reason MineCLIP was not giving good rewards.
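For readers unfamiliar with the term, a delta reward can be sketched as follows. This assumes it means the change in similarity score between consecutive steps, rather than the raw score itself; the thread does not spell out the exact definition, so treat the function and the sample values as hypothetical.

```python
def delta_rewards(similarities):
    """Difference between each similarity score and the one before it.

    Assumed definition: r_t = s_t - s_(t-1). Returns one fewer value
    than the input, since the first step has no predecessor.
    """
    return [curr - prev for prev, curr in zip(similarities, similarities[1:])]

# Illustrative per-step similarity scores, not measured values.
scores = [0.20, 0.25, 0.25, 0.40]
print(delta_rewards(scores))
```

Under this definition a score that stays flat yields a reward of zero regardless of its absolute level, which is why the delta and direct variants can behave very differently on the same video.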
That being said, when we switched back to direct correlation as the reward we see the same results as the human videos:
I have loaded the weights into the MineCLIP model for both the attn and avg variants to observe how the generated reward varies over a video of a user playing Minecraft and following one of the provided tasks. The generated reward appears to remain stagnant throughout the video regardless of the text prompt. In addition, when using randomly generated frames or zeroed frames, the output is still very similar.
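The comparison described above can be expressed as a small check. This is only a sketch: it assumes you already have per-window reward lists for the real video and for a control input (random or zeroed frames), and the helper names, the tolerance, and the sample numbers are all made up for illustration.

```python
from statistics import mean

def mean_gap(real_scores, control_scores):
    """Absolute difference between the mean rewards of two inputs."""
    return abs(mean(real_scores) - mean(control_scores))

def is_input_insensitive(real_scores, control_scores, tol=0.01):
    """True when real and control inputs score almost the same on average.

    The tolerance is an arbitrary illustrative threshold.
    """
    return mean_gap(real_scores, control_scores) < tol

# Illustrative numbers only, not taken from the recorded videos.
real = [0.30, 0.31, 0.29]
zeroed = [0.30, 0.30, 0.31]
print(is_input_insensitive(real, zeroed))
```

If the check comes out true for random or zeroed frames, the problem is more likely in the loading or preprocessing path than in the wording of the prompts.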
I am fairly certain that my video loading and weight loading process follows the existing documentation, so I am wondering whether the currently uploaded weights for either variant are incorrect.