MineDojo / MineCLIP

Foundation Model for MineDojo

Some details in mineagent RL implementation #4

Open · YHQpkueecs opened 1 year ago

YHQpkueecs commented 1 year ago

Hello! I am reproducing your paper results (training PPO + self-imitation with the MineCLIP reward), but I am stuck on a few missing details:

  1. How are the agent's 89 discrete actions described in the paper implemented? The released MineAgent uses a multi-discrete output of `3*3*4*25*25*3`, which is a much larger action space. Did you remove some action choices? (My current guess is sketched after this list.)
  2. When computing the DIRECT reward with the MineCLIP model, how are the negative texts sampled, and how many did you use? (My guess at the scoring is also below.)
  3. One step of the MineDojo simulation spans much less time than one second of a YouTube video. Did you use the last 16 consecutive RGB observations to compute the reward? (See the frame-window sketch below.)
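
For context on question 1, here is how I currently prune the multi-discrete space into a small discrete action set: keep a no-op template and let exactly one dimension deviate from it per action. The dimension sizes and no-op entries are my assumptions, and this yields 58 actions rather than 89, so I am clearly missing some combinations:

```python
import numpy as np

DIM_SIZES = [3, 3, 4, 25, 25, 3]   # sizes from the released MineAgent head
NOOP = [0, 0, 0, 12, 12, 0]        # assumed "do nothing" value per dimension

def build_discrete_actions():
    # Keep the no-op, then let exactly one dimension deviate from it at a
    # time, so every discrete action changes a single sub-action.
    actions = [list(NOOP)]
    for dim, size in enumerate(DIM_SIZES):
        for value in range(size):
            if value == NOOP[dim]:
                continue  # already covered by the no-op template
            a = list(NOOP)
            a[dim] = value
            actions.append(a)
    return np.array(actions)

ACTIONS = build_discrete_actions()
print(len(ACTIONS))  # 58 = 1 + sum(size - 1); 89 must keep more combinations
```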
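
For question 2, this is my current guess at the relative reward: score the last 16 frames against the goal prompt plus sampled negatives and take a softmax over the cosine similarities. `encode_video` / `encode_text` stand in for the real MineCLIP calls, the negative prompts are placeholders, and the `1/n` baseline subtraction is my reading of the paper, not confirmed:

```python
import torch
import torch.nn.functional as F

def direct_reward(model, frames, goal_prompt, negative_prompts, temp=0.07):
    """frames: (1, 16, 3, H, W) tensor holding the last 16 RGB observations."""
    with torch.no_grad():
        v = F.normalize(model.encode_video(frames), dim=-1)   # (1, D)
        prompts = [goal_prompt] + list(negative_prompts)
        t = F.normalize(model.encode_text(prompts), dim=-1)   # (n, D)
        probs = F.softmax(v @ t.T / temp, dim=-1)             # (1, n)
        p_goal = probs[0, 0].item()
    n = len(prompts)
    # Only reward the agent when the goal prompt beats a uniform 1/n guess.
    return max(p_goal - 1.0 / n, 0.0)
```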
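
And for question 3, the sliding window I currently feed to the reward model; whether the 16 frames should be consecutive sim steps or subsampled to match the videos' real-time pacing is exactly what I am unsure about:

```python
from collections import deque
import numpy as np

class FrameWindow:
    """Keep the most recent `size` RGB observations for the reward model."""

    def __init__(self, size=16):
        self.frames = deque(maxlen=size)

    def push(self, rgb):
        self.frames.append(rgb)

    def ready(self):
        return len(self.frames) == self.frames.maxlen

    def as_array(self):
        # (16, H, W, 3), oldest frame first
        return np.stack(self.frames, axis=0)
```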

Thank you!

YHQpkueecs commented 1 year ago

By the way, do you plan to release the training code or the learned agent parameters?

rsha256 commented 1 year ago

@YHQpkueecs Were you able to get the learned agent parameters from @LinxiFan, @wangguanzhi, or @yunfanjiang?