facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License
880 stars 123 forks source link

Fix datasets contains false #784

Closed mostafaelhoushi closed 1 year ago

mostafaelhoushi commented 1 year ago

I faced an issue where calling:

if "non_existent_data" in env.datasets:
   ...

would throw:

File "/private/home/melhoushi/miniconda3/envs/wmc/lib/python3.10/site-packages/compiler_gym/datasets/datasets.py", line 209, in contains self.dataset(dataset) File "/private/home/melhoushi/miniconda3/envs/wmc/lib/python3.10/site-packages/compiler_gym/datasets/datasets.py", line 137, in dataset return self.dataset_from_parsed_uri(BenchmarkUri.from_string(dataset)) File "/private/home/melhoushi/miniconda3/envs/wmc/lib/python3.10/site-packages/compiler_gym/datasets/datasets.py", line 152, in dataset_from_parsed_uri key = self._dataset_key_from_uri(uri) File "/private/home/melhoushi/miniconda3/envs/wmc/lib/python3.10/site-packages/compiler_gym/datasets/datasets.py", line 162, in _dataset_key_from_uri raise ValueError(f"Invalid benchmark URI: '{uri}'") ValueError: Invalid benchmark URI: 'benchmark:/checkpoint/melhoushi/wmc/corpus/src/'

this is because __contains__ function in Datasets is handling LookupError: https://github.com/facebookresearch/CompilerGym/blob/2b2061380bc42627396b05d5420823b3c31180e6/compiler_gym/datasets/datasets.py#LL206-L212C25 while _dataset_key_from_uri was throwing ValueError: https://github.com/facebookresearch/CompilerGym/blob/2b2061380bc42627396b05d5420823b3c31180e6/compiler_gym/datasets/datasets.py#LL159-L163C17

This PR fixes the issue by replacing ValueError in _dataset_key_from_uri and adding some unit test cases for __contains__. Not sure if this is the best way, or if you think there is a more proper solution. Perhaps _dataset_key_from_uri should throw both ValueError and LookupError ?

ChrisCummins commented 1 year ago

Nice catch and thanks for the regression test. Merging!

Cheers, Chris