facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License
880 stars, 123 forks

Where to find source code of all the datasets? #815

dcy11011 closed this issue 10 months ago

dcy11011 commented 10 months ago

❓ Questions and Help

I'm running an experiment that uses features extracted from the original C/C++ source code of a benchmark to improve the optimization ability of some algorithms, and I want to verify the effectiveness of this method on the datasets in CompilerGym. However, I found that most of the datasets in CompilerGym have no source files (the compiler_gym.datasets.Benchmark.sources attribute of those benchmarks is an empty list). I tried to obtain the source code of each dataset from the homepages listed in CompilerGym's documentation, but many of them are no longer accessible or are only the homepages of the projects the datasets were extracted from (TensorFlow, OpenCV). Do you have any suggestions on where to find the source code of the benchmarks in these datasets?
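To check which benchmarks are affected, a small helper can walk a dataset and report the benchmarks whose sources attribute is empty. This is a minimal sketch, assuming only that each benchmark object exposes a uri and an iterable sources, as compiler_gym.datasets.Benchmark does; the helper name benchmarks_without_sources is hypothetical and not part of CompilerGym.

```python
from typing import Iterable, List


def benchmarks_without_sources(benchmarks: Iterable) -> List[str]:
    """Return the URIs (as strings) of benchmarks whose `sources` is empty.

    Hypothetical helper: each item is assumed to expose `.uri` and an
    iterable `.sources`, like compiler_gym.datasets.Benchmark.
    """
    missing = []
    for benchmark in benchmarks:
        # `sources` may be a list or another iterable; materialize it so
        # emptiness is checked the same way in both cases.
        if not list(benchmark.sources):
            missing.append(str(benchmark.uri))
    return missing
```

With CompilerGym installed, a call along the lines of benchmarks_without_sources(env.datasets["benchmark://cbench-v1"].benchmarks()), with env = compiler_gym.make("llvm-v0"), should list the cBench entries lacking sources. Treat that call as an assumption to verify against your installed version, and note that iterating a whole dataset may trigger downloads.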

Below is a list of the datasets in CompilerGym for which I can't find the original source code:

ChrisCummins commented 10 months ago

Hey @dcy11011, unfortunately we no longer have the C/C++ sources for most of them :( After lowering to LLVM bitcode I got rid of the sources (in retrospect this was a mistake!).

The only one from that list that I know we still have sources for is cBench. Here are the rules to lower the C sources to LLVM:

https://github.com/facebookresearch/CompilerGym/blob/development/compiler_gym/third_party/cbench/BUILD#L15-L43
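For readers who don't use Bazel, the general idea of lowering C sources to a single LLVM bitcode module can be sketched with plain clang and llvm-link. This is not a transcription of the linked BUILD rules: the -O0 default, writing each .bc next to its source, and the tools being on PATH are all assumptions, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def compile_to_bitcode_cmds(
    sources: Sequence[str], output: str, cflags: Sequence[str] = ("-O0",)
) -> List[List[str]]:
    """Build commands that lower C files to one linked LLVM bitcode file.

    Hypothetical helper; flags are assumptions, not the cBench BUILD rules.
    """
    cmds = []
    bitcode_files = []
    for src in sources:
        bc = str(Path(src).with_suffix(".bc"))
        # Compile each translation unit to LLVM bitcode instead of an object file.
        cmds.append(["clang", "-c", "-emit-llvm", *cflags, src, "-o", bc])
        bitcode_files.append(bc)
    # Link the per-file bitcode modules into a single module.
    cmds.append(["llvm-link", *bitcode_files, "-o", output])
    return cmds


def run_cmds(cmds: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to inspect or log the exact invocations before running them with run_cmds.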

Cheers, Chris

dcy11011 commented 10 months ago

Thanks a lot! It's sad to hear that most of the source code was lost, but the source code of cBench can help me a lot already. :)

ChrisCummins commented 10 months ago

Sorry I can't help more. The only thing I could suggest is that, since all of the above datasets are based on publicly available C/C++ repos, you could construct your own versions by running their build systems and hacking them to dump out bitcode files as a side effect.
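One common way to get bitcode "as a side effect" is a compiler wrapper: the build calls the wrapper as its C compiler, the wrapper runs the real compile unchanged, and then it reruns each -c step with clang -emit-llvm, writing a .bc next to the .o. The sketch below assumes clang for the bitcode step and "cc" as the real compiler, and handles only the "-o file" form (not "-ofile"). The names bitcode_side_command and wrapper_main are hypothetical. Existing tools such as wllvm (whole-program-llvm) and gllvm implement this idea far more robustly.

```python
import subprocess
from typing import List, Optional


def bitcode_side_command(args: List[str], clang: str = "clang") -> Optional[List[str]]:
    """Derive a bitcode-emitting command from a compiler invocation's args.

    Returns None for invocations that don't compile with -c (e.g. link
    steps), which the wrapper should simply pass through.
    """
    if "-c" not in args:
        return None
    new_args = list(args)
    if "-o" in new_args:
        i = new_args.index("-o")
        if i + 1 >= len(new_args):
            return None  # malformed "-o" with no file name
        out = new_args[i + 1]
        base = out[: -len(".o")] if out.endswith(".o") else out
        new_args[i + 1] = base + ".bc"
    # Without -o, `clang -c -emit-llvm foo.c` writes foo.bc by default.
    return [clang, *new_args, "-emit-llvm"]


def wrapper_main(argv: List[str], real_cc: str = "cc") -> int:
    """Run the real compile, then best-effort emit bitcode alongside it."""
    args = argv[1:]
    # The real build step runs first, so the build's own outputs are unchanged.
    rc = subprocess.call([real_cc, *args])
    side = bitcode_side_command(args)
    if rc == 0 and side is not None:
        subprocess.call(side)  # failures here don't break the build
    return rc
```

To use it, you would install a tiny script that calls sys.exit(wrapper_main(sys.argv)) and point the project's build at it (for example via CC), making sure real_cc does not resolve back to the wrapper itself.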

Cheers, Chris