alex-petrenko / sample-factory

High throughput synchronous and asynchronous reinforcement learning
https://samplefactory.dev
MIT License

Fixed bug when using mixed action spaces #250

Closed: edbeeching closed this 1 year ago

edbeeching commented 1 year ago

When using tuple action distributions in non-batched mode, a sneaky bug appears. It happens because np.split and torch.split interpret their second argument differently: torch.split takes the sizes of the chunks, while np.split takes the indices at which to split. See: https://github.com/pytorch/pytorch/issues/50012

Fixed by using the cumsum of the action splits in non-batched mode.
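To illustrate the mismatch, here is a minimal NumPy-only sketch. It is not the code from this PR: the `actions` array and the `split_sizes` values are made up for the example, and the variable names are hypothetical. It only shows why a list of sizes that is valid for torch.split gives the wrong result with np.split, and how a cumulative sum turns sizes into split indices.

```python
import numpy as np

# Hypothetical flattened action vector for a tuple action space whose
# components occupy 2, 3 and 1 entries (sizes chosen for illustration only).
actions = np.arange(6)
split_sizes = [2, 3, 1]

# torch.split(tensor, [2, 3, 1]) would treat the list as chunk sizes.
# np.split treats the same list as split *indices*, so the result is wrong.
wrong = np.split(actions, split_sizes)

# Converting sizes to boundary indices with cumsum restores the intended split.
# The last cumulative sum equals the total length, so it is dropped.
indices = np.cumsum(split_sizes)[:-1]
right = np.split(actions, indices)

print([chunk.tolist() for chunk in wrong])
print([chunk.tolist() for chunk in right])
```

With these sizes, `right` holds three chunks of lengths 2, 3 and 1, which is what torch.split would return for the same list.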

edbeeching commented 1 year ago

I bundled a fix for #251 into this PR as well.

codecov-commenter commented 1 year ago

Codecov Report

Base: 79.69% // Head: 79.70% // Increases project coverage by +0.00% :tada:

Coverage data is based on head (b50ec0d) compared to base (8d4916d). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #250   +/-   ##
=======================================
  Coverage   79.69%   79.70%
=======================================
  Files          95       95
  Lines        7458     7461    +3
=======================================
+ Hits         5944     5947    +3
  Misses       1514     1514
```

| [Impacted Files](https://codecov.io/gh/alex-petrenko/sample-factory/pull/250?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko) | Coverage Δ | |
|---|---|---|
| [sample\_factory/algo/learning/learner.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2FtcGxlX2ZhY3RvcnkvYWxnby9sZWFybmluZy9sZWFybmVyLnB5) | `88.13% <100.00%> (+0.15%)` | :arrow_up: |
| [...mple\_factory/algo/sampling/non\_batched\_sampling.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2FtcGxlX2ZhY3RvcnkvYWxnby9zYW1wbGluZy9ub25fYmF0Y2hlZF9zYW1wbGluZy5weQ==) | `98.47% <100.00%> (ø)` | |
| [sample\_factory/pbt/population\_based\_training.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2FtcGxlX2ZhY3RvcnkvcGJ0L3BvcHVsYXRpb25fYmFzZWRfdHJhaW5pbmcucHk=) | `84.58% <0.00%> (-0.42%)` | :arrow_down: |
| [...f\_examples/envpool/mujoco/envpool\_mujoco\_params.py](https://codecov.io/gh/alex-petrenko/sample-factory/pull/250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Aleksei+Petrenko#diff-c2ZfZXhhbXBsZXMvZW52cG9vbC9tdWpvY28vZW52cG9vbF9tdWpvY29fcGFyYW1zLnB5) | `100.00% <0.00%> (ø)` | |
