glmcdona / LuxPythonEnvGym

Matching python environment code for Lux AI 2021 Kaggle competition, and a gym interface for RL models.
MIT License

Update example training agent and script #86

Closed: glmcdona closed this 2 years ago

glmcdona commented 2 years ago

Updates:

  1. Split unit and city actions.
  2. Update the reward function to a better example: it is now a delta of the reward and is scaled reasonably. Fixes issue #83.
  3. Set the default example agent to run inference in non-deterministic mode. Agents often get stuck when set to deterministic during inference.
  4. Fix a bug in the unit maps where the nearest unit wasn't tracked correctly.
  5. Add a command-line arg for multi-environment training.
  6. Add multi-environment evaluation metrics logging.
  7. Add single-environment tensorboard game internal metrics logging.
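
For readers unfamiliar with the idea in item 2, here is a minimal sketch of a delta-style reward. It is not the code from this pull request: `GameStats`, `raw_score`, `DeltaReward`, the weights, and the `scale` value are all hypothetical names and numbers chosen for illustration. The point is only that the agent is rewarded for the change in a score since the previous step, multiplied by a small constant, rather than for the absolute score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameStats:
    """Hypothetical per-step snapshot; field names are illustrative only."""
    city_tiles: int
    units: int


def raw_score(stats: GameStats) -> float:
    # Assumed weighting, not taken from the repository.
    return 1.0 * stats.city_tiles + 0.5 * stats.units


class DeltaReward:
    """Returns the scaled change in raw_score since the previous step."""

    def __init__(self, scale: float = 0.1) -> None:
        self.scale = scale
        self.last_score: Optional[float] = None

    def reset(self) -> None:
        # Call at the start of each episode so deltas don't span games.
        self.last_score = None

    def step(self, stats: GameStats) -> float:
        score = raw_score(stats)
        # The first step of an episode has no baseline, so the delta is zero.
        delta = 0.0 if self.last_score is None else score - self.last_score
        self.last_score = score
        return delta * self.scale


reward_fn = DeltaReward(scale=0.1)
first = reward_fn.step(GameStats(city_tiles=1, units=1))   # no baseline yet
gained = reward_fn.step(GameStats(city_tiles=2, units=1))  # built a city tile
lost = reward_fn.step(GameStats(city_tiles=1, units=1))    # lost it again
```

With this shape, building a city tile yields a small positive reward on the step it happens, losing one yields a matching negative reward, and steps where nothing changes yield zero.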

royerk commented 2 years ago

Please let me know if you would like me (us?) to also run this update. Looks great, looking forward to getting feedback on multi-env training :heart:

nosound2 commented 2 years ago

Hi @glmcdona, just a small remark from me. Many of the changes in this pull request seem to me to go too deep into implementation details, the kind of things everyone should decide for themselves. I would have kept the repo cleaner than that. However, I am not familiar with the repository philosophy, so this is just a first thought.

[UPD] the new tensorboard logging stats are cool

glmcdona commented 2 years ago

> Hi @glmcdona, just a small remark from me. Many of the changes in this pull request seem to me to go too deep into implementation details, the kind of things everyone should decide for themselves. I would have kept the repo cleaner than that. However, I am not familiar with the repository philosophy, so this is just a first thought.
>
> [UPD] the new tensorboard logging stats are cool

Totally agree. I think keeping the example clean and simple makes sense. More advanced implementation like this is better left to something like a separate public notebook that uses the framework. TODO: Refactor the example agent and training script to be more minimal.