[Closed] t-woodw closed this issue 2 years ago
I still haven't been able to figure out a solution to this error. I would be very appreciative of anyone who could point me in the right direction!
@dimikout3 Sorry to bother you, but are you familiar with this issue? If not, could you tell me the best way to visually test a trained model with this repo?
Hi @t-woodw, to be honest I am not familiar with the issue above. I do not think that the comment is related to it. I will try to reproduce your error and come back with some help.
Thanks! If it helps at all, the model I'm trying to load was trained using the runner.py file with these args:
$ python trainners/runner.py -c trainners/trainnerV2.json -r Level-1
I've been able to trace through the issue, and I see that the 'initial' is probably coming from the param.pkl file when it is unwrapped with cloudpickle in the env_create portion of the config. Does this maybe have something to do with not registering the env in Ray (it looks like that step is skipped in rollout.py), or with a mismatch of envs between runner.py and rollout.py?
Also, can you say whether or not rollout.py is the way to run a trained model with visual output?
Thanks for your help so far!
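For context on why a skipped registration step matters: RLlib looks custom env names up in a registry of creator functions, so an env that works standalone can still fail at rollout time if nothing registered it in that process. A minimal sketch of the pattern in plain Python (this is an illustration of the idea, not Ray's actual implementation; the env name and config keys are taken from this thread):

```python
# Toy env registry illustrating why rollout.py fails when the
# registration done in runner.py is skipped. Plain Python, not Ray.

ENV_REGISTRY = {}

def register_env(name, creator):
    """Map an env name to a creator function taking a config dict."""
    ENV_REGISTRY[name] = creator

def make_env(name, config=None):
    """Look the name up; an unregistered env raises, much like RLlib."""
    if name not in ENV_REGISTRY:
        raise KeyError(f"Env {name!r} was never registered")
    return ENV_REGISTRY[name](config or {})

# runner.py registers the env before training...
register_env("mars_explorer:exploConf-v01", lambda conf: {"conf": conf})

# ...so creation succeeds; a process that skipped the step would raise.
env = make_env("mars_explorer:exploConf-v01", {"level": "Level-1"})
```

In real code the registration is a single `register_env(name, creator)` call from `ray.tune.registry`, and it has to happen in every process that will instantiate the env, including the rollout script.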
> I've been able to trace through the issue, and I see that the 'initial' is probably coming from the param.pkl file when it is unwrapped with cloudpickle in the env_create portion of the config. Does this maybe have something to do with not registering the env in Ray (it looks like that step is skipped in rollout.py), or a mismatch of envs between runner.py and rollout.py?

Yes, I believe the problem is somewhere in there. Unfortunately I can't invest time in fixing this issue, since I'm working on a different project, but I can probably support you.

> Also, can you say whether or not rollout.py is the way to run a trained model with visual output?

Yes, this is how I did it.
Do you, by any chance, still have a model that was trained (with runner.py) and works (with rollout.py) that you could upload, so I could compare output? Also, if you do, could you provide the args you pass at the command line when you run rollout.py?
Thanks for your help so far!
Well, after trying a considerable number of things over the last week, I happened upon a file in the utils folder called fix_conf.py that seems to be targeted at changing the conf of the param.pkl file of trained models. I ran that, and it updated the env config such that I stopped getting the same error when trying to use rollout.py. (Notably, adding the missing env-registration code from runner.py also fixed the issue, but I opted for the fix_conf.py solution.)
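For anyone hitting the same thing, the idea behind such a fix can be sketched as rewriting the env entry inside the pickled trainer config. This is a hypothetical illustration, not the repo's actual fix_conf.py: the real params file is written with cloudpickle (stdlib pickle stands in here), and the `"env"` key and the placeholder `"initial"` value are guesses based on this thread.

```python
import os
import pickle  # the repo uses cloudpickle; stdlib pickle shows the same idea
import tempfile

def patch_params(path, env_name):
    """Load a pickled config dict, rewrite its env entry, and save it back."""
    with open(path, "rb") as f:
        params = pickle.load(f)
    params["env"] = env_name  # key name is a guess, for illustration only
    with open(path, "wb") as f:
        pickle.dump(params, f)
    return params

# Round-trip demo on a toy param.pkl in a temp directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "param.pkl")
with open(path, "wb") as f:
    pickle.dump({"env": "initial", "gamma": 0.99}, f)

patched = patch_params(path, "mars_explorer:exploConf-v01")
```

The point is simply that the fix happens on disk: once the pickled config names a registered env instead of the stale `'initial'` value, rollout can rebuild the env without the training script's registration path.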
It would have been really great if something about this had been noted in the readme or in comments in the runner.py or rollout.py files.
Now to see if there's a fix for the numpy-array-to-tensor issue in PyTorch with CUDA 11.7 (so I can use my 3080 instead of my 1080).
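The thread doesn't show that error, so this is only a guess at its shape: one common culprit is `torch.from_numpy` rejecting negative-stride numpy views (e.g. from `arr[::-1]`), which a contiguous copy sidesteps. A hedged sketch (assumes a PyTorch build matching your CUDA version is installed; the helper name is mine):

```python
import numpy as np
import torch

def to_tensor(arr, device=None):
    """Convert a numpy array to a torch tensor.

    np.ascontiguousarray makes a contiguous float32 copy first, because
    torch.from_numpy refuses negative-stride views; the tensor is then
    moved to CUDA only when a GPU is actually available.
    """
    arr = np.ascontiguousarray(arr, dtype=np.float32)
    t = torch.from_numpy(arr)
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    return t.to(device)

# A reversed view has negative strides and would trip torch.from_numpy
# without the contiguous copy above.
obs = np.ones((84, 84, 3), dtype=np.float64)[::-1]
t = to_tensor(obs)
```

If the error is instead about the GPU itself (a 3080 needs a CUDA >= 11.1 build), the usual fix is installing a wheel built for your CUDA version rather than changing the conversion code.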
Hello, I've trained a PPO using the built-in methodology. I have the results with the checkpoints (for 3000 steps) in my ray_results folder. I'm trying to figure out how to actually test the trained model (with visual output). After looking through all the code, I think the rollout.py file might be how to do that (please correct me if I'm wrong). Using GeneralExplorationPolicy as the base directory, I run this at the command line:
$ python tests/rollout/rollout.py /home/theuser/ray_results/PPO_custom-explorer_2022-10-24_13-32-173jrhexyp/checkpoint_2991 --run PPO --env mars_explorer:exploConf-v01 --episodes 40 --video-dir /home/theuser/mars_ppo_vids
But I'm met with this error:
I noticed this comment (# check why conf is not compatible will RLlib (it works on standalone gym)) in explorer.py. Does this have something to do with my issue, or am I way off on something? Is there a workaround for this, so that I might be able to see the trained model in action?