Open jarlva opened 1 year ago
You're encountering a general machine learning problem called "overfitting". It is generally challenging to make sure a model generalizes beyond its training distribution, and this is not specific to RL or Sample Factory.
Some things to look at:
Thanks again for your reply, @alex-petrenko!
I tried the following, but none of it worked. I'd like to try dropout and noticed it's possible to apply it in PyTorch, but I'm not sure how to do it in the SF2 code (maybe by adding an optional parameter)?
Update: also tried editing sample-factory/tests/test_precheck.py at lines 15 and 18.
- adding noise to observations, up to +/-5% (see the sketch after this list)
- PBT
- simplifying the model to 256,256
- changing the LR to 0.00001 and 0.001, from the default 0.0001
- increasing the data from 30k to 100k rows
- it's not possible to augment the data
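For reference, ±5% observation noise can be added with a small gym wrapper along these lines. This is a sketch only; the wrapper name and the multiplicative-noise choice are assumptions, not the exact code used in the experiment above:

```python
import gym
import numpy as np


class UniformObsNoise(gym.ObservationWrapper):
    """Scale each observation element by a random factor in [1 - eps, 1 + eps]."""

    def __init__(self, env, eps=0.05):
        super().__init__(env)
        self.eps = eps

    def observation(self, obs):
        noise = np.random.uniform(1.0 - self.eps, 1.0 + self.eps, size=np.shape(obs))
        return (np.asarray(obs, dtype=np.float32) * noise).astype(np.float32)


# usage: env = UniformObsNoise(gym.make("CartPole-v1"), eps=0.05)
```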
Hi @alex-petrenko , would it be possible to reply to the latest request from 2 days ago, above?
I think your best option is to implement a custom model (encoder only should be sufficient, but you can override the entire actor-critic module). See the documentation here: https://www.samplefactory.dev/03-customization/custom-models/
Just add dropout as a layer and fingers crossed it should work. You should be careful about `eval()` and `train()` modes for your PyTorch module, but I think you should already be covered here. See here for an example: https://discuss.pytorch.org/t/if-my-model-has-dropout-do-i-have-to-alternate-between-model-eval-and-model-train-during-training/83007/2
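A minimal PyTorch-only sketch of that point (the layer sizes are placeholders, not Sample Factory's actual encoder):

```python
import torch
from torch import nn

# Dropout added as an ordinary layer inside a Sequential MLP.
net = nn.Sequential(nn.Linear(4, 256), nn.ELU(), nn.Dropout(p=0.1))

net.train()                       # dropout active: units are randomly zeroed
y_train = net(torch.randn(2, 4))

net.eval()                        # dropout becomes an identity op
with torch.no_grad():
    y_eval = net(torch.randn(2, 4))
```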
Hmmm, I guess your confusion might come from the fact that Dropout can't just be added as a model layer; you have to actually call it explicitly in forward().
If I were you, I would simply modify the forward() method of the actor_critic class to call dropout when needed.
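A hedged sketch of that approach, using functional dropout gated on the module's training flag. The class and attribute names below are illustrative, not the real actor_critic code:

```python
from torch import nn
import torch.nn.functional as F


class HeadWithDropout(nn.Module):
    """Illustrative module that applies dropout explicitly inside forward()."""

    def __init__(self, encoder: nn.Module, p: float = 0.1):
        super().__init__()
        self.encoder = encoder
        self.p = p

    def forward(self, obs):
        x = self.encoder(obs)
        # F.dropout is a no-op when self.training is False (i.e. after model.eval()).
        return F.dropout(x, p=self.p, training=self.training)
```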
Sorry, I don't think I can properly help you with the problem without knowing context and details of your problem. Overfitting is one of the hardest problems in all of ML and there's no single magical recipe for fixing it.
Hi @alex-petrenko, sorry, I'm not an expert at this. I'm using a customized cartpole-like gym env. Do you mean editing sample_factory/model/actor_critic.py as follows, at lines 154 and 184?
1/30 Update: Also updated sample_factory/model/encoder.py lines 216, 221
Also, would it make sense to add dropout as a switchable option?
First thing I would try would be to add dropout after each layer in the encoder. If you're using a cartpole-like environment, then you would need to modify MLP Encoder which is defined here: https://github.com/alex-petrenko/sample-factory/blob/86332022b489f9253cbaf8f71f8d49b47d765036/sample_factory/model/encoder.py#L72
The convolutional encoder probably has nothing to do with your task if your observations are just vectors of numbers; it is meant for image observations.
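As a sketch of what "dropout after each layer" could look like for an MLP encoder of this kind (assuming Linear -> ELU hidden blocks, as in the module printout further down; this is not the actual code of encoder.py):

```python
from torch import nn


def mlp_with_dropout(input_dim, hidden_sizes=(256, 256), p=0.1):
    """Build Linear -> ELU -> Dropout blocks for each hidden layer."""
    layers = []
    prev = input_dim
    for size in hidden_sizes:
        layers += [nn.Linear(prev, size), nn.ELU(), nn.Dropout(p=p)]
        prev = size
    return nn.Sequential(*layers)
```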
I added it in the model_utils.py file, line 52. So the layers are:
RecursiveScriptModule(
  original_name=Sequential
  (0): RecursiveScriptModule(original_name=Linear)
  (1): RecursiveScriptModule(original_name=ELU)
  (2): RecursiveScriptModule(original_name=Dropout)
  (3): RecursiveScriptModule(original_name=Linear)
  (4): RecursiveScriptModule(original_name=ELU)
  (5): RecursiveScriptModule(original_name=Dropout)
)
But, alas, that's still not solving overfitting...
Dropout is one way to combat overfitting but it is not a panacea.
I'm sorry I can't help figure out your exact issue. As I said previously, overfitting is a general machine learning phenomenon, and most likely your problem has nothing to do with Sample Factory but rather with the overall problem formulation and approach.
Hi @alex-petrenko, I understand. I appreciate the guidance and advice! Please let me know if you'd be open to advising for pay?
@jarlva not sure if this is realistic right now. I'm starting a full-time job very soon which will keep me busy for the foreseeable future.
You said you're able to fit your training data, right? That is, the trained policy does well on the training data when you evaluate it, but completely fails on out-of-distribution data?
If I could get some idea of what your environment is and what exactly the difference between your training and test data is, I could be more helpful. Maybe we can set up a call in ~2 weeks. Feel free to reach out via Discord DM or by email to discuss further.
Hey, after training (~200M) that shows good reward, Enjoy shows bad reward numbers on unseen data. When the training data is included in Enjoy, the reward matches training. So it seems the model "remembers" the data, as opposed to learning.
What's the best way to deal with that (other than adding more data and introducing random noise)? Are there settings to try?
Training a gym-like env with the following: