hokhay closed 2 years ago
I think this is the key part of the error message:
ValueError: unsupported pickle protocol: 5
What's happened is that model.zip was trained in a Python environment with, I believe, a different cloudpickle version. Which version of Python are you using for training?
This might help: https://stackoverflow.com/questions/63329657/python-3-7-error-unsupported-pickle-protocol-5
Sorry, I am not sure what cloudpickle is, but I am wondering if the problem is caused by my using a PyCharm virtual environment for training instead of the system Python.
Cloudpickle is the library that stable_baselines3 uses to serialize the resulting model into model.zip. Changing your PyCharm virtual environment to one using Python 3.7.* will likely fix it. Alternatively, the Stack Overflow link above may show ways to convert your model.zip to a compatible format.
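As a rough illustration of the protocol mismatch (a sketch only, not part of stable_baselines3 or cloudpickle): pickle protocol 5 was introduced in Python 3.8, while Python 3.7 reads at most protocol 4. For protocol 2 and later, a pickle stream starts with a 0x80 byte followed by the protocol number, so it can be inspected without unpickling. The helper name `pickle_protocol` and the sample payload are hypothetical.

```python
import pickle


def pickle_protocol(data: bytes) -> int:
    """Return the protocol number declared at the start of a pickle stream.

    Protocols 2 and later begin with the PROTO opcode (0x80) followed by a
    one-byte protocol number. Protocols 0 and 1 have no such header, so this
    hypothetical helper reports them as 0.
    """
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return 0


# Highest protocol the running interpreter can read and write
# (4 on Python 3.7, 5 on Python 3.8 and later).
print("highest supported protocol:", pickle.HIGHEST_PROTOCOL)

# Pinning the protocol explicitly keeps the output readable on Python 3.7.
payload = pickle.dumps({"weights": [0.1, 0.2]}, protocol=4)
print("payload protocol:", pickle_protocol(payload))
```

Whether the saved model.zip can be re-serialized this way depends on how stable_baselines3 stores its contents, so treat this as a diagnostic aid rather than a fix.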
I had this exact issue, and it came from using a version of Python higher than the 3.7 that the Kaggle environment uses. The Python version determines which version of pickle is used, and there are some incompatibilities.
PyCharm isn't the issue. You should be able to change the interpreter from your current Python version to Python 3.7 (you may have to install Python 3.7 on your machine). If you have multiple versions of Python installed, PyCharm can create a virtual environment for you based on a specific version.
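A small guard at the top of a training script can catch an interpreter mismatch before any model is saved. This is a minimal sketch, assuming the target runtime is Python 3.7 as reported in this thread; `TARGET_VERSION` and `matches_target` are hypothetical names.

```python
import sys

# Assumption: the Kaggle runtime uses Python 3.7, as reported in this thread.
TARGET_VERSION = (3, 7)


def matches_target(version_info=sys.version_info, target=TARGET_VERSION) -> bool:
    """Return True when the major.minor version equals the target."""
    return tuple(version_info[:2]) == tuple(target)


if not matches_target():
    # Warn rather than exit, so the check never blocks local experiments.
    print(
        "Warning: training on Python %d.%d, but the target is %d.%d"
        % (sys.version_info[0], sys.version_info[1], *TARGET_VERSION)
    )
```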
Thank you guys for helping me out. I will try re-running the training with Python 3.7 then.
Hey guys,
After changing to Python 3.7, I think the Python version issue is gone, but I got another two error messages. This looks more like a program issue.
[[{"duration": 8.578081, "stdout": "", "stderr": "Traceback (most recent call last):\n File \"./main_lux-ai-2021.py\", line 23, in <module>\n
model = PPO.load(f\"model.zip\")\n
File \"/kaggle_simulations/agent/stable_baselines3/common/base_class.py\", line 688, in load\n
model._setup_model()\n File \"/kaggle_simulations/agent/stable_baselines3/ppo/ppo.py\", line 155, in _setup_model\n super(PPO, self)._setup_model()\n
File \"/kaggle_simulations/agent/stable_baselines3/common/on_policy_algorithm.py\", line 118, in _setup_model\n n_envs=self.n_envs,\n
File \"/kaggle_simulations/agent/stable_baselines3/common/buffers.py\", line 328, in __init__\n
super(RolloutBuffer, self).__init__(buffer_size, observation_space, action_space, device, n_envs=n_envs)\n File \"/kaggle_simulations/agent/stable_baselines3/common/buffers.py\", line 49, in __init__\n
self.obs_shape = get_obs_shape(observation_space)\n
File \"/kaggle_simulations/agent/stable_baselines3/common/preprocessing.py\", line 144, in get_obs_shape\n
return observation_space.shape\nAttributeError: 'Box' object has"}],
[{"duration": 0.004079, "stdout": "", "stderr": "Traceback (most recent call last):\n
File \"/opt/conda/lib/python3.7/site-packages/kaggle_environments/agent.py\", line 157, in act\n action = self.agent(*args)\n File \"/opt/conda/lib/python3.7/site-packages/kaggle_environments/agent.py\", line 129, in callable_agent\n
if callable(agent) \\\n
File \"/kaggle_simulations/agent/main.py\", line 76, in python_policy_agent\n
agent_process.stdin.flush()\nBrokenPipeError: [Errno 32] Broken pipe\n"}]]
Jason
Hrm, I think I may have encountered this error before, and I think I remember what it is. Are you using a dictionary observation space? If I recall, training worked fine with extra dictionary keys in the observation, but inference failed with an error like this. The solution was to remove any dictionary keys in the observation that aren't used.
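The idea of dropping unused dictionary keys could be sketched in plain Python as follows. This is an illustration under assumptions, not code from the repo: `filter_observation` and the key names `map`, `units`, and `debug_info` are hypothetical, and a real agent would apply the same filtering wherever it builds its observation.

```python
from typing import Any, Dict, Iterable


def filter_observation(obs: Dict[str, Any], used_keys: Iterable[str]) -> Dict[str, Any]:
    """Keep only the observation entries the model is expected to consume."""
    used = set(used_keys)
    missing = used - set(obs)
    if missing:
        # Fail early and clearly instead of deep inside model inference.
        raise KeyError(f"observation is missing keys: {sorted(missing)}")
    return {key: value for key, value in obs.items() if key in used}


# Hypothetical observation with one key the model never uses.
raw_obs = {"map": [[0, 1], [1, 0]], "units": [3, 2], "debug_info": "unused"}
clean_obs = filter_observation(raw_obs, ["map", "units"])
print(sorted(clean_obs))
```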
Also, try to get it working locally first, e.g., does this work? https://github.com/glmcdona/LuxPythonEnvGym#creating-and-viewing-a-replay
lux-ai-2021 ./kaggle_submissions/main_lux-ai-2021.py ./kaggle_submissions/main_lux-ai-2021.py --maxtime 100000
I was using the code from this GitHub repo as of two weeks ago, and there was no error when I ran it locally. Now that I have downloaded the latest version of the code, I can reproduce the same error as on Kaggle. Is there any change to the code that could produce this error?
I am now running into this error even with the original code from here. Could you give me a clue as to what I need to modify in the observation?
Thank you a lot Jason
Two things come to mind:
python --version
python setup.py install
I have been using the repo without any issues for submissions. My guess is that something in your environment is not set up as Kaggle expects.
One thing I found is that when I run this locally:
lux-ai-2021 kaggle_submissions/main_lux-ai-2021.py kaggle_submissions/main_lux-ai-2021.py --maxtime 100000
I get the same AttributeError: 'Box' object has no attribute 'shape' as on Kaggle. I suppose this is because my default Python version is 3.8.
However, when I run:
lux-ai-2021 --python python3.7 kaggle_submissions/main_lux-ai-2021.py kaggle_submissions/main_lux-ai-2021.py --maxtime 100000
the simulation runs successfully. This shows that my model was trained in a Python 3.7 environment, so it can only run in a 3.7 environment.
So I am confused about why I get the same error on Kaggle as the one I get locally when using Python 3.8.
Thanks a lot Jason
I have figured it out. It turns out that my Gym version was 0.20, while the code requires a version below 0.20. The issue was solved when I reinstalled Gym 0.19.
This issue might be avoided if the Gym version were specified in setup.py.
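A runtime check along these lines could flag the mismatch early. It is a sketch that assumes the "Gym 0.2" mentioned above refers to release 0.20 and that the code needs a release below it; `parse_version` and `gym_version_ok` are hypothetical helpers, not part of Gym or the repo.

```python
def parse_version(version: str) -> tuple:
    """Turn a release string such as "0.19.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def gym_version_ok(version: str, upper_bound: tuple = (0, 20)) -> bool:
    """Return True when the version is strictly below the upper bound."""
    return parse_version(version) < upper_bound


# Comparing component-wise avoids reading "0.20" as the decimal 0.2,
# which would wrongly look smaller than 0.19.
print(gym_version_ok("0.19.0"))
print(gym_version_ok("0.20.0"))
```

In a real script the string would come from `gym.__version__`. On the packaging side, a constraint such as `"gym<0.20"` in the `install_requires` list of setup.py is the usual setuptools way to express this, though whether that exact bound suits the repo is an assumption here.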
Thanks a lot for the help from you guys
Hi,
I have encountered an error after a Kaggle submission. The following is the error log from the game play on Kaggle. The game only plays for 1 turn and then stops. I used Python 3.7 to train the model.
Has anyone encountered this error as well?
Thanks Jason