facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License

llvm-stress-v0/173 causes backend crash #745

Closed dcy11011 closed 2 years ago

dcy11011 commented 2 years ago

🐛 Bug

The CompilerGym backend crashes when the llvm-v0 environment tries to load the benchmark generator://llvm-stress-v0/173

To Reproduce

import gym, compiler_gym
env = gym.make("llvm-v0")
env.reset(benchmark="generator://llvm-stress-v0/173")

The traceback is:

Traceback (most recent call last):
  File "/home/cy/.local/lib/python3.9/site-packages/compiler_gym/envs/llvm/llvm_env.py", line 341, in reset
    observation = super().reset(*args, **kwargs)
  File "/home/cy/.local/lib/python3.9/site-packages/compiler_gym/service/client_service_compiler_env.py", line 656, in reset
    return self._reset(
  File "/home/cy/.local/lib/python3.9/site-packages/compiler_gym/service/client_service_compiler_env.py", line 803, in _reset
    error, reply = _call_with_error(
  File "/home/cy/.local/lib/python3.9/site-packages/compiler_gym/service/client_service_compiler_env.py", line 724, in _call_with_error
    return None, self.service(stub_method, *args, **kwargs)
  File "/home/cy/.local/lib/python3.9/site-packages/compiler_gym/service/connection.py", line 818, in __call__
    return self.connection(
  File "/home/cy/.local/lib/python3.9/site-packages/compiler_gym/service/connection.py", line 216, in __call__
    raise ValueError(e.details()) from None
ValueError: Failed to compute .text size cost. Command returned exit code 70: /home/cy/.local/share/compiler_gym/llvm-v0/bin/clang -w -xir - -o /dev/shm/compiler_gym_cy/s/0809T165604-085543-91a8/obj-0103.o -c. Error: fatal error: error in backend: Cannot emit physreg copy instruction

(This was followed by a long stretch of clang error messages.)


ChrisCummins commented 2 years ago

Hi @dcy11011, thanks for the report. Since llvm-stress is a tool that generates random IR for testing LLVM, it is "normal" behavior for it to occasionally break the compiler 🙂. The workaround is to catch compiler_gym.errors.BenchmarkInitError, skip the offending benchmark, and try a different program:

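from compiler_gym.errors import BenchmarkInitError

try:
    env.reset(benchmark=my_benchmark)
except BenchmarkInitError:
    # The compiler could not build this program; move on to another one.
    print("unusable program, skip this ...")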

You can expect this error to be triggered infrequently by either of the program generators (llvm-stress-v0 and csmith-v0). I have also seen it with one or two of the AnghaBench programs.
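When iterating over many generated programs, the same try/except can be wrapped in a small helper. The following is only a sketch: usable_benchmarks is a hypothetical name, not part of CompilerGym, and it assumes nothing more than an object with a reset(benchmark=...) method. The exception class is passed in as a parameter so the helper does not depend on any particular library.

```python
def usable_benchmarks(env, candidates, skip_errors):
    """Yield each candidate benchmark that the environment can reset to.

    Hypothetical helper, not part of CompilerGym.

    env:         any object with a reset(benchmark=...) method (assumed interface).
    candidates:  iterable of benchmark URIs or benchmark objects.
    skip_errors: exception class (or tuple of classes) that marks a
                 benchmark as unusable.
    """
    for benchmark in candidates:
        try:
            env.reset(benchmark=benchmark)
        except skip_errors as error:
            # Compiler errors can span many lines; log only the first one
            # so the exception type is not buried in the output.
            lines = str(error).splitlines()
            first_line = lines[0] if lines else ""
            print(f"skipping {benchmark}: {type(error).__name__}: {first_line}")
            continue
        # The environment is freshly reset on this benchmark when it is yielded.
        yield benchmark
```

With CompilerGym this could be called as usable_benchmarks(env, uris, BenchmarkInitError), where uris is your own list of benchmark URIs and BenchmarkInitError is imported from compiler_gym.errors as in the snippet above. Because the helper is a generator, each benchmark is only reset when the loop reaches it.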

The important thing is that the error raised by CompilerGym is a BenchmarkInitError:

https://github.com/facebookresearch/CompilerGym/blob/development/compiler_gym/envs/llvm/llvm_env.py#L368-L379

In the traceback above I only see a ValueError. Is that right? If so, please re-open this issue.

Cheers, Chris

dcy11011 commented 2 years ago

Many thanks! It does raise a BenchmarkInitError. I missed it because it was surrounded by a bunch of clang errors. I noticed there are benchmarks in AnghaBench that cause errors, too. I'll just skip all of these benchmarks as you suggested :)