facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License
898 stars 125 forks source link

Some tests timeout non-deterministically #561

Open sogartar opened 2 years ago

sogartar commented 2 years ago

🐛 Bug

On my machine often tests would hang and timeout.

To Reproduce

Steps to reproduce the behavior:

  1. bazel test --cache_test_results=no //...

Expected behavior

All tests pass without timeout.

Environment

Please fill in this checklist:

Additional context

The timeout is non-deterministic. If I run then a few times, reusing cached results, they will eventually pass. It seems that the CI machines don't not have the same problem. Attached is a list of tests that timed out when I was debugging and testing stuff. timeout.txt

ChrisCummins commented 2 years ago

Hi @sogartar, I can reproduce the issue on my end, though not as severely as for you (judging by the attached timeout list). I think it's a combination of two things: (1) flaky timeout on an overloaded system, such as when running a bunch of tests in parallel, and (2) resource leaks preventing test scripts from shutting down. You can see this by checking the test logs to see if 100% of the tests completed. If they did, then its likely related to #459.

Thanks for the good report. I'll try and find some time to look into this soon.

BTW, I haven't had any timeout issues when using make install-test, which uses pytest's runner to coordinate test execution.

Cheers, Chris