aws-deepracer-community / deepracer-for-cloud

Creates an AWS DeepRacing training environment which can be deployed in the cloud, or locally on Ubuntu Linux, Windows or Mac.
MIT No Attribution

Locally trained model not valid for import on DeepRacer console after the July 2020 update #45

Closed everdark closed 4 years ago

everdark commented 4 years ago

Hi,

I set up the environment and successfully trained several models. But when I tried to import them into the AWS DeepRacer console, I got an "Invalid model" error status, with the description "We can't validate your model because it's been edited."

I realized that in mid-July 2020 there was a major update to the DeepRacer console. Model artifacts are no longer created on S3; they are now stored somewhere we don't have access to, except for the logs. I tried to follow the official documentation about the update:

https://docs.aws.amazon.com/deepracer/latest/developerguide/deepracer-troubleshooting-service-migration-errors.html#what-is-update

to create the necessary files from my local training environment and upload them to S3 manually. But I just cannot get the model imported into the DeepRacer console. To be specific, I uploaded the following artifacts:

└── super
    ├── ip
    │   ├── done
    │   ├── hyperparameters.json
    │   └── ip.json
    ├── model
    │   ├── .coach_checkpoint
    │   ├── 16_Step-32917.ckpt.data-00000-of-00001
    │   ├── 16_Step-32917.ckpt.index
    │   ├── 16_Step-32917.ckpt.meta
    │   ├── 17_Step-37295.ckpt.data-00000-of-00001
    │   ├── 17_Step-37295.ckpt.index
    │   ├── 17_Step-37295.ckpt.meta
    │   ├── model_16.pb
    │   ├── model_17.pb
    │   └── model_metadata.json
    ├── model_metadata.json
    └── reward_function.py

But it's not working. I have also tried several combinations, for example deleting the model_metadata.json under the root, keeping only hyperparameters.json under ip, removing the *.pb files...

Nothing works. :(

Has anyone else had the same issue?
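
For anyone comparing their own upload against the listing above, here is a minimal sketch of a local sanity check. It only verifies that each checkpoint prefix in a model folder has its three TensorFlow checkpoint files plus a matching model_N.pb, and that .coach_checkpoint and model_metadata.json are present, following the naming seen in the tree above. The helper name check_model_dir and the filename pattern are assumptions taken from that listing; this does not reproduce whatever validation the DeepRacer console actually performs.

```python
import re
from pathlib import Path

# Suffixes of a TensorFlow checkpoint as they appear in the listing above.
CKPT_SUFFIXES = (".ckpt.data-00000-of-00001", ".ckpt.index", ".ckpt.meta")
# Matches names such as "16_Step-32917.ckpt.index" (pattern assumed from the listing).
CKPT_RE = re.compile(r"^(\d+)_Step-(\d+)\.ckpt\.")


def check_model_dir(model_dir):
    """Return a list of problems found in a local model folder (hypothetical helper)."""
    model_dir = Path(model_dir)
    names = {p.name for p in model_dir.iterdir() if p.is_file()}
    problems = []

    # Collect checkpoint prefixes such as "16_Step-32917", keyed by checkpoint number.
    prefixes = {}
    for name in names:
        m = CKPT_RE.match(name)
        if m:
            prefixes[int(m.group(1))] = f"{m.group(1)}_Step-{m.group(2)}"

    if not prefixes:
        problems.append("no checkpoint files found")

    # Each checkpoint should have all three files and a frozen graph model_N.pb.
    for number, prefix in sorted(prefixes.items()):
        for suffix in CKPT_SUFFIXES:
            if prefix + suffix not in names:
                problems.append(f"missing {prefix}{suffix}")
        if f"model_{number}.pb" not in names:
            problems.append(f"missing model_{number}.pb")

    # Files that sit next to the checkpoints in the listing above.
    for required in (".coach_checkpoint", "model_metadata.json"):
        if required not in names:
            problems.append(f"missing {required}")
    return problems
```

Calling check_model_dir("super/model") on a folder laid out like the one above should return an empty list; any missing file shows up as a short message. A clean result only means the folder is internally consistent, not that the console will accept it.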

everdark commented 4 years ago

I think I know the reason. The underlying model here is the 2019 version, but the cloud now uses the 2020 version. The 2019 version uses an older version of rl_coach (0.11) and also an older version of RoboMaker. I will use https://github.com/mattcamp/deepracer-local instead for 2020. Thanks for the great work anyway! I've learned a lot.