HanqingWangAI / SSM-VLN

Code and Data for our CVPR 2021 paper "Structured Scene Memory for Vision-Language Navigation"

`train_rl` block in code seems inactive #4

Closed adaruna3 closed 3 years ago

adaruna3 commented 3 years ago

Hi all,

Thank you again for releasing your codebase.

Is the version of the codebase in master the one that corresponds to the results in the paper?

I noticed the paper mentions including RL as part of the learning objective, but the code seems to have RL disabled throughout training (train_rl is always set to False in the worker threads here). Also, there are objects referenced within the train_rl condition block that do not exist in the SSM class, namely self.decoder, which I am guessing should instead be self.package.decoder. I was wondering whether the version in master might be an earlier version than what was intended.
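For concreteness, here is a paraphrased sketch of the two things I mean (not the exact lines from the repo; everything other than train_rl, SSM, self.decoder, self.package.decoder, and last_h_ is my own illustration):

```python
# Paraphrased sketch, not the exact repo code.

# 1) The worker threads appear to always disable the RL branch:
agent.train(n_iters, feedback='teacher', train_rl=False)  # train_rl is never True

# 2) Inside the train_rl block, an attribute is referenced that SSM does not have:
if train_rl:
    last_h_, _ = self.decoder(...)  # AttributeError: 'SSM' object has no attribute 'decoder'
    # my guess is this was meant to be:
    # last_h_, _ = self.package.decoder(...)
```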

Also, were any specific training curricula or other tricks used that are not mentioned in the paper? I am trying to train the model from scratch.

Thanks! Angel

HanqingWangAI commented 3 years ago

Hi @adaruna3, thanks for your attention! It seems to be a legacy block. I am looking into this issue and will fix it ASAP. As far as I remember, we first train the agent with imitation learning by teacher-forcing, and then add the augmented data and adopt both teacher/student-forcing and RL. If you run into any training problems, feel free to post them here so I can fix them.
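Schematically, the schedule looks like this (illustrative pseudocode; the flag names and agent interface here are shorthand, not the exact repo commands):

```python
# Illustrative two-stage schedule, not the exact repo interface.

# Stage 1: imitation learning on the original data, teacher-forcing only.
for _ in range(stage1_iters):
    agent.train(feedback='teacher', train_rl=False)

# Stage 2: add the augmented data, then mix teacher- and student-forcing
# and enable the RL objective for the student-forcing passes.
for _ in range(stage2_iters):
    agent.train(feedback='teacher', train_rl=False)  # teacher-forcing (IL)
    agent.train(feedback='sample', train_rl=True)    # student-forcing + RL
```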

Thanks!

adaruna3 commented 3 years ago

Great, thanks for the info. I will try what you mentioned.

adaruna3 commented 3 years ago

Also, the patch for the train_rl block seems pretty minor, probably just switching which decoder is called to generate last_h_. I am making a temporary patch on my end so I can continue training. Could you verify one assumption I have below?

I noticed there are some unused logs for A2C, e.g. policy_log_probs_front. However, those front-related logs (i.e. policy_log_probs_front and entropys_front) would be discarded when integrating the train_rl block, right? At first I thought they were needed, but after considering it more I suspect they are redundant, because you would get similar rewards for both the frontier node and the direction chosen. Does that make sense? (I am trying A2C without those front-related terms for now; a sketch of the loss I am using is below.)
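For reference, this is roughly the per-rollout A2C loss in my temporary patch, with no front-related terms (standard advantage actor-critic; the function and variable names are mine, not the repo's):

```python
import torch

def a2c_loss(policy_log_probs, values, rewards, masks, gamma=0.9):
    """Discounted A2C loss over a rollout of length T.

    Each argument is a list of T tensors of shape (batch,).
    policy_log_probs_front and entropys_front are intentionally unused.
    """
    returns = torch.zeros_like(rewards[-1])
    rl_loss = 0.0
    for t in reversed(range(len(rewards))):
        returns = rewards[t] + gamma * returns * masks[t]        # discounted return
        advantage = (returns - values[t]).detach()               # actor advantage (no grad)
        rl_loss = rl_loss - (policy_log_probs[t] * advantage).sum()            # policy term
        rl_loss = rl_loss + 0.5 * ((values[t] - returns.detach()) ** 2).sum()  # critic term
    return rl_loss
```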

Thanks again for your feedback!

HanqingWangAI commented 3 years ago

Exactly, a minor patch makes it work.

policy_log_probs_front stores the logits of the frontier selection, which are needed for the RL loss of the frontier selector. entropys_front was used to make the selection of a single frontier more confident, but it was discarded in the end because of its negative effect in our experiments.

adaruna3 commented 3 years ago

Sorry, I do not completely follow. I am less familiar with RL so maybe that is causing the gap in my interpretation.

If I understand correctly, neither policy_log_probs_front nor entropys_front is included in what would be the updated version of the train_rl block, right? Or would line 1043 need to be copied, with policy_log_probs switched to policy_log_probs_front, to integrate that log into the RL_loss? Something like the sketch below is what I have in mind.
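(Illustrative only; this is my guess at the change, with the surrounding variable names guessed from context rather than taken from the actual code.)

```python
# My guess at the change, not the actual RL_patch code.
rl_loss = rl_loss - (policy_log_probs[t] * advantage).sum()        # existing term (what I believe line 1043 accumulates)
rl_loss = rl_loss - (policy_log_probs_front[t] * advantage).sum()  # copied line, switched to the frontier log-probs
```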

HanqingWangAI commented 3 years ago

Hi @adaruna3, I committed a patch for RL in the RL_patch branch. It has not been tested yet since my environment and data were cleared. It would be appreciated if you could try it to continue your RL training.

Thanks.

adaruna3 commented 3 years ago

Thanks! I will try it and let you know soon.

adaruna3 commented 3 years ago

Hi @HanqingWangAI, the patch is what I expected. Thanks! Closing this issue.