HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

tests/algorithms/test_sqil.py::test_sqil_performance_continuous[DDPG] failure #791

Closed by ZiyueWang25 1 year ago

ZiyueWang25 commented 1 year ago

Bug description

>       assert reward_improvement.is_significant_reward_improvement(
            rewards_before,  # type:ignore[arg-type]
            rewards_after,  # type:ignore[arg-type]
        )
E       assert False
E        +  where False = <function is_significant_reward_improvement at 0x135cafdc0>([-1379.139569, -1592.006733, -1771.549952, -1345.075819, -1701.436911, -1223.385152, ...], [-1331.957287, -1518.792578, -1274.336042, -1294.565801, -1089.05092, -1652.311479, ...])
E        +    where <function is_significant_reward_improvement at 0x135cafdc0> = reward_improvement.is_significant_reward_improvement
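To make the failure above easier to reason about, here is a minimal sketch of a one-sided permutation test on the difference of mean rewards. It is an assumption that the check behaves roughly like this; it is not the actual implementation of `reward_improvement.is_significant_reward_improvement`. The names `permutation_p_value` and `looks_significant`, the `alpha=0.05` threshold, and the resample count are hypothetical. The sample data is only the six values per list visible in the traceback (the remaining values are elided there), so the result will not match the real test run.

```python
import numpy as np


def permutation_p_value(before, after, n_resamples=10_000, seed=0):
    """Estimate a one-sided p-value for mean(after) > mean(before)."""
    rng = np.random.default_rng(seed)
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    observed = after.mean() - before.mean()

    pooled = np.concatenate([before, after])
    n_before = len(before)
    count = 0
    for _ in range(n_resamples):
        # Randomly reassign episodes to the "before" and "after" groups.
        shuffled = rng.permutation(pooled)
        diff = shuffled[n_before:].mean() - shuffled[:n_before].mean()
        if diff >= observed:
            count += 1
    # The add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_resamples + 1)


def looks_significant(before, after, alpha=0.05):
    """Hypothetical stand-in for the check used in the failing assertion."""
    return permutation_p_value(before, after) < alpha


# Only the values visible in the traceback; the full lists are truncated there.
rewards_before = [-1379.139569, -1592.006733, -1771.549952,
                  -1345.075819, -1701.436911, -1223.385152]
rewards_after = [-1331.957287, -1518.792578, -1274.336042,
                 -1294.565801, -1089.05092, -1652.311479]

print("estimated p-value:", permutation_p_value(rewards_before, rewards_after))
print("significant:", looks_significant(rewards_before, rewards_after))
```

With episode returns this noisy, a modest improvement in the mean can fail a significance threshold on some runs, which would be consistent with an intermittent test failure.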

Relevant CircleCI link: https://app.circleci.com/pipelines/github/HumanCompatibleAI/imitation/3903/workflows/84d03683-66fc-416e-865a-27bb86d78349/jobs/15818

Steps to reproduce

Run the test directly at the head of master:

pytest "tests/algorithms/test_sqil.py::test_sqil_performance_continuous[DDPG]"

ZiyueWang25 commented 1 year ago

This failure was discovered in https://github.com/HumanCompatibleAI/imitation/pull/779