Closed adaruna3 closed 3 years ago
Hi @adaruna3, thanks for your attention! It seems to be a legacy block. I am looking into this issue and will fix it ASAP. As far as I remember, we first train the agent with imitation learning via teacher-forcing, and then add the augmented data and adopt both teacher/student-forcing and RL. If you run into any training problems, feel free to post them here so I can fix them.
Thanks!
Great, thanks for the info. I will try what you mentioned.
Also, the code to patch the `train_rl` block seems pretty minor; probably just switching the decoder being called to generate `last_h_`. I am making a temporary patch on my end to continue training. Could you verify one assumption I have below?
I noticed there were some unused logs for A2C, e.g. `policy_log_probs_front`. However, those `front`-related logs (i.e. `policy_log_probs_front` and `entropys_front`) would be discarded when integrating the `train_rl` block, right? At first I thought they were needed, but after considering it more I suspect they might be redundant, because you would get similar rewards for both the frontier node and the direction chosen. Does that make sense? (I am trying A2C without those `front`-related terms for now.)
Thanks again for your feedback!
Exactly, a minor patch makes it work.
`policy_log_probs_front` stores the logits of the frontier selection, which are needed for the RL loss of the frontier selector. `entropys_front` was used to make the selection of a single frontier more confident, but it was ultimately discarded due to its negative effect in our experiments.
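For concreteness, the frontier log-probs enter the RL loss in the same way as the action log-probs, with both policy heads sharing the same advantage signal. Here is a rough numpy sketch (the function name, shapes, and mask convention are my assumptions, not the actual repo code):

```python
import numpy as np

def a2c_loss(policy_log_probs, policy_log_probs_front,
             rewards, values, masks, gamma=0.9):
    """Sketch of an A2C-style loss where the frontier-selector
    log-probs contribute alongside the action log-probs.
    All inputs are arrays of shape (T, batch)."""
    T = rewards.shape[0]
    rl_loss = 0.0
    discounted = np.zeros_like(rewards[0])
    for t in reversed(range(T)):
        # discounted return, zeroed out once the episode has ended
        discounted = (rewards[t] + gamma * discounted) * masks[t]
        advantage = discounted - values[t]  # critic baseline (detached in real code)
        # policy-gradient terms: both heads share the same advantage
        rl_loss += np.sum(-policy_log_probs[t] * advantage * masks[t])
        rl_loss += np.sum(-policy_log_probs_front[t] * advantage * masks[t])
    return rl_loss
```

In this view, dropping `entropys_front` only removes the frontier entropy bonus; the `policy_log_probs_front` term above is still what trains the frontier selector.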
Sorry, I do not completely follow. I am less familiar with RL so maybe that is causing the gap in my interpretation.
If I understand correctly, then neither `policy_log_probs_front` nor `entropys_front` is included in what would be the updated version of the `train_rl` block, right? Or would line 1043 need to be copied, with `policy_log_probs` switched to `policy_log_probs_front`, to integrate that log into the `RL_loss`?
Hi @adaruna3, I committed a patch for RL in the `RL_patch` branch. It is not tested yet since my environment and data were cleared. It would be appreciated if you could try it to continue your RL training.
Thanks.
Thanks! I will try it and let you know soon.
Hi @HanqingWangAI, the patch is what I expected. Thanks! Closing this issue.
Hi all,
Thank you again for releasing your codebase.
Is the version of the codebase in master the version that corresponds with the results in the paper?

I noticed the paper mentions including RL as part of the learning objective, but the code seems to have RL disabled throughout training (`train_rl` is always set to `False` in the worker threads here). Also, there are objects referenced within the `train_rl` condition block that do not exist in the `SSM` class, namely `self.decoder`, which I am guessing might instead be `self.package.decoder`. I was wondering if maybe the version in master is an earlier version than what was intended.

Also, are there any specific curriculums that were followed, or other tricks to train the model that are not mentioned in the paper? I am trying to train the model from scratch.
Thanks! Angel