Hello! I am reproducing your paper's results (training PPO + self-imitation with the MineCLIP reward), but I am missing a few details:
How are the agent's 89 discrete actions (mentioned in the paper) implemented? Currently your MineAgent uses a multi-discrete output of 3*3*4*25*25*3, which is much larger. Did you remove some action choices? My current guess is sketched below.
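For context, here is a minimal sketch of what I imagine so far: keep a restricted subset of each multi-discrete dimension (e.g. fewer camera bins) and flatten the Cartesian product into a single discrete index. The bin counts below are my own assumptions, not taken from the paper, and as the final count shows, a naive product does not land on 89:

```python
import itertools
import numpy as np

# Hypothetical reduced choices per multi-discrete dimension; the original
# space is 3*3*4*25*25*3. These particular bin choices are my assumption.
REDUCED_CHOICES = [
    range(3),             # forward / backward / no-op
    range(3),             # left / right / no-op
    range(4),             # jump / sneak / sprint / no-op
    [4, 10, 12, 14, 20],  # camera pitch: keep 5 of the 25 bins
    [4, 10, 12, 14, 20],  # camera yaw:   keep 5 of the 25 bins
    range(3),             # functional actions (use / attack / no-op)
]

# Flatten the Cartesian product into one discrete action table.
ACTION_TABLE = list(itertools.product(*REDUCED_CHOICES))

def discrete_to_multidiscrete(action_id: int) -> np.ndarray:
    """Map a flat discrete action id back to the env's multi-discrete action."""
    return np.array(ACTION_TABLE[action_id])

print(len(ACTION_TABLE))  # 3*3*4*5*5*3 = 2700, still far from 89 --
                          # which is exactly why I am asking how you pruned it.
```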
For computing the direct reward with the MineCLIP model, how did you sample the negative texts, and how many negatives did you use? My current guess at the formulation is sketched below.
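Here is what I am currently doing: softmax over the video embedding's cosine similarities to the positive prompt plus N sampled negative prompts, then subtract the 1/(N+1) chance baseline and clip at 0. This formulation and the temperature are my assumptions, and the dummy embeddings stand in for MineCLIP's video/text encoders (whose exact API I have not verified):

```python
import torch
import torch.nn.functional as F

def mineclip_direct_reward(video_emb, pos_text_emb, neg_text_embs, temperature=1.0):
    """Relative reward: probability that the video clip matches the task
    prompt among (1 + N) candidate prompts, minus the chance baseline.
    neg_text_embs: (N, D) tensor of embeddings for sampled negative prompts."""
    texts = torch.cat([pos_text_emb.unsqueeze(0), neg_text_embs], dim=0)  # (1+N, D)
    sims = F.cosine_similarity(video_emb.unsqueeze(0), texts, dim=-1)     # (1+N,)
    probs = F.softmax(sims / temperature, dim=0)
    n = 1 + neg_text_embs.shape[0]
    # Zero reward when the positive prompt is no likelier than chance.
    return torch.clamp(probs[0] - 1.0 / n, min=0.0)

# Usage with dummy normalized embeddings in place of MineCLIP outputs:
video = F.normalize(torch.randn(512), dim=0)
pos = F.normalize(torch.randn(512), dim=0)
negs = F.normalize(torch.randn(32, 512), dim=-1)
print(mineclip_direct_reward(video, pos, negs))
```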
I find that the timescale of one step in the MineDojo simulation is much smaller than one second in the YouTube videos. Did you use the last 16 consecutive RGB observations to compute the reward? My current frame-buffering approach is sketched below.
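Concretely, I keep a rolling buffer of the last 16 raw RGB observations and stack them into a clip for the video encoder at every step, padding with the first frame at episode start. The padding strategy and buffer size handling are my assumptions:

```python
from collections import deque

import numpy as np

class FrameBuffer:
    """Keep the last 16 RGB observations so a 16-frame clip is always
    available for MineCLIP; pad with the first frame at episode start."""

    def __init__(self, size: int = 16):
        self.size = size
        self.frames = deque(maxlen=size)

    def reset(self, first_frame: np.ndarray) -> None:
        # Fill the whole buffer with the initial observation.
        self.frames.clear()
        for _ in range(self.size):
            self.frames.append(first_frame)

    def push(self, frame: np.ndarray) -> None:
        self.frames.append(frame)  # oldest frame drops out automatically

    def clip(self) -> np.ndarray:
        # (16, H, W, 3) stack of consecutive frames, oldest first.
        return np.stack(self.frames, axis=0)
```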
Thank you!