PKU-RL / CORRO

CORRO code

Sudden drop in return #1

Open MoreanP opened 2 years ago

MoreanP commented 2 years ago

[image] Hello. When reproducing the paper on the point-robot-v1 environment, I find that the return converges and then drops suddenly, even though I didn't modify the code. What could be the reason?

Folly135 commented 1 year ago

Hi! Have you solved this problem? Also, does your image correspond to the return curve from the training stage? Following the README provided by the author, I can't seem to reproduce the plot corresponding to Figure 2; the subsequent OOD test should correspond to the contents of Table 1.

MoreanP commented 1 year ago

Yes, this graph corresponds to Figure 2 in the paper: it shows the test return on IID data during training, plotted with TensorBoard. The author did not provide the code used to draw the original figure in the paper. I never solved the sudden performance drop shown in this image.
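
For anyone inspecting a similar curve, here is a minimal sketch for locating a sudden drop in a logged return series. It is not part of the CORRO repository: the function names `ema_smooth` and `largest_drop` are hypothetical, and it assumes the test returns have already been exported from TensorBoard into a plain Python list.

```python
from typing import List, Optional, Tuple


def ema_smooth(values: List[float], weight: float = 0.6) -> List[float]:
    """Exponential moving average, similar in spirit to a smoothing slider."""
    smoothed = []
    last = values[0] if values else 0.0
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


def largest_drop(values: List[float]) -> Optional[Tuple[int, float]]:
    """Return (index, drop) for the largest fall below the running maximum."""
    if not values:
        return None
    best_idx, best_drop = 0, 0.0
    running_max = values[0]
    for i, v in enumerate(values):
        running_max = max(running_max, v)
        drop = running_max - v
        if drop > best_drop:
            best_idx, best_drop = i, drop
    return best_idx, best_drop


# Made-up returns for illustration only (not results from the paper).
returns = [-20.0, -10.0, -5.0, -5.0, -30.0, -28.0]
print(largest_drop(returns))
```

Comparing the index of the largest drop on the raw and smoothed series can help tell a single noisy evaluation apart from a sustained collapse.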

Folly135 commented 1 year ago

Thank you very much for your reply. This part corresponds to the Offline Meta-RL section in the README, right?

MoreanP commented 1 year ago

Yes.

nanzhu2003 commented 7 months ago

Hello, I'm trying to run the code now but have run into some problems. May I ask which model you loaded when running test_ood_context.py? When I run this script, it always reports that the input size is incompatible with the parameters I trained with train_offpolicy_with_trained_encoder. Thanks a lot!

MoreanP commented 7 months ago

I did this a long time ago, so I don't remember it very well. Which model reported the error?

nanzhu2003 commented 7 months ago

It wasn't exactly a problem with the code. I finally found the cause: I trained the model on datasets I collected myself, so I didn't have the behavior policy model. I have now trained it. Anyway, thanks a lot for your reply!
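
In case it helps others who hit a similar size-mismatch error, here is a rough sketch for comparing the parameter shapes a model expects against those stored in a checkpoint. It is only an illustration, not code from the CORRO repository: `find_shape_mismatches` and the parameter names in the example are hypothetical. In PyTorch, the shape dictionaries could be built with something like `{k: tuple(v.shape) for k, v in model.state_dict().items()}`, assuming the checkpoint file holds a plain state dict.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(expected: Dict[str, Shape],
                          loaded: Dict[str, Shape]) -> List[str]:
    """List parameters that are missing, unexpected, or differently shaped."""
    problems = []
    for name in sorted(set(expected) | set(loaded)):
        if name not in loaded:
            problems.append(f"missing in checkpoint: {name}")
        elif name not in expected:
            problems.append(f"unexpected in checkpoint: {name}")
        elif expected[name] != loaded[name]:
            problems.append(
                f"shape mismatch for {name}: "
                f"model {expected[name]} vs checkpoint {loaded[name]}"
            )
    return problems


# Hypothetical layer names and sizes, for illustration only.
model_shapes = {"fc1.weight": (64, 25), "fc1.bias": (64,)}
ckpt_shapes = {"fc1.weight": (64, 5), "fc1.bias": (64,)}
for line in find_shape_mismatches(model_shapes, ckpt_shapes):
    print(line)
```

A mismatch in the first layer's input dimension usually means the network was built for a different input (for example, with or without a context vector) than the one the checkpoint was trained on.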