Holmes-Benchmark / holmes-evaluation


Missing folds.csv #1

Closed xlxwalex closed 2 months ago

xlxwalex commented 2 months ago

Hi,

Thank you very much for providing this interesting benchmark.

I've encountered two issues while using it:

  1. After downloading the data, running the command from the README results in an error stating that a file does not exist. This seems to be because encode.py attempts to read a folds.csv from each dataset folder, but I couldn't find this file in any of the folders for either version on Dropbox. How can I resolve this issue?

  2. I noticed that there is a parameter config_file_path that points to a default path which is not included in the repository. Can I safely ignore this parameter?

Thank you for your assistance!

holmesbenchmark commented 2 months ago

Hi,

Thanks for your interest, and happy to assist!

It would be helpful if you could give us some more details, which will also help us improve the instructions.

  1. Could you provide the commands you run step-by-step?
  2. You refer to the parameter of encode.py, right?

Thanks for providing the information!

bbunzeck commented 2 months ago

Hello, I have encountered the same issue. I cloned the repo and downloaded the data.

When running python3 investigate.py --model_name [model name here] --version flash-holmes --parallel_probing, I get the following error:

Run encoding: python3 encode.py --dump_folder /Users/user_name/Downloads/holmes-evaluation-master/dumps/flash-holmes --config_file_path ../data/flash-holmes/zorro-quantifiers-superlative/config-none.yaml --model_name /Users/user_name/Documents/llamas/baby_llamas/models/final_20 --model_precision full --encoding_batch_size 10
[Errno 2] No such file or directory: '../data/flash-holmes/zorro-quantifiers-superlative/folds.csv'
Traceback (most recent call last):
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/encode.py", line 53, in <module>
    main()
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/encode.py", line 31, in main
    probe_frame = load_probe_file(base_path, control_task_type)
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/utils/data_loading.py", line 227, in load_probe_file
    loaded_frame = pandas.read_csv(probe_file).sort_values("id")
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '../data/flash-holmes/zorro-quantifiers-superlative/folds.csv'

Run probing: python3 probe_parallel.py --dump_folder /Users/user_name/Downloads/holmes-evaluation-master/dumps/flash-holmes --result_folder /Users/user_name/Downloads/holmes-evaluation-master/results/flash-holmes --config_file_path ../data/flash-holmes/zorro-quantifiers-superlative/config-none.yaml --model_name /Users/user_name/Documents/llamas/baby_llamas/models/final_20 --run_probe True --run_mdl_probe False --num_hidden_layers 0 --seeds 0,1,2,3,4
2024-05-16 11:25:20,721 INFO worker.py:1749 -- Started a local Ray instance.
[Errno 2] No such file or directory: '../data/flash-holmes/zorro-quantifiers-superlative/folds.csv'
Traceback (most recent call last):
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/probe_parallel.py", line 129, in <module>
    main()
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/probe_parallel.py", line 79, in main
    probe_frame = load_probe_file(base_path, control_task_type)
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/utils/data_loading.py", line 227, in load_probe_file
    loaded_frame = pandas.read_csv(probe_file).sort_values("id")
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '../data/flash-holmes/zorro-quantifiers-superlative/folds.csv'
clean /tmp/ray/session_2024-05-16_11-25-18_450018_6654
clean /tmp/ray/session_2024-05-16_11-25-18_450018_6654

Run encoding: python3 encode.py --dump_folder /Users/user_name/Downloads/holmes-evaluation-master/dumps/flash-holmes --config_file_path ../data/flash-holmes/zorro-quantifiers-existential_there/config-none.yaml --model_name /Users/user_name/Documents/llamas/baby_llamas/models/final_20 --model_precision full --encoding_batch_size 10
[Errno 2] No such file or directory: '../data/flash-holmes/zorro-quantifiers-existential_there/folds.csv'
Traceback (most recent call last):
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/encode.py", line 53, in <module>
    main()
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/encode.py", line 31, in main
    probe_frame = load_probe_file(base_path, control_task_type)
  File "/Users/user_name/Downloads/holmes-evaluation-master/src/utils/data_loading.py", line 227, in load_probe_file
    loaded_frame = pandas.read_csv(probe_file).sort_values("id")
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/Users/user_name/anaconda3/lib/python3.10/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '../data/flash-holmes/zorro-quantifiers-existential_there/folds.csv'

I have also attached the errors as a text file, because GitHub does not seem to render the line breaks in the code properly: holmes-errors.txt

Thank you very much in advance!

xlxwalex commented 2 months ago

Thank you for your response. I have noticed that in the encode.py file, line 24:

base_path = "/".join(config_file_path.split("/")[:-1]) + "/folds.csv"

attempts to locate the folds.csv file within the data folder. However, this file does not appear to exist and also does not seem to be generated as an intermediate file.
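To make the failure concrete, here is a minimal sketch of how that expression turns a config path into the folds.csv path. The config path is copied from the error log earlier in this thread; the helper name derive_probe_path is hypothetical and only wraps the expression quoted from encode.py.

```python
import os


def derive_probe_path(config_file_path: str) -> str:
    # Same expression as the line quoted from encode.py: drop the last path
    # component (the config file name) and append "/folds.csv".
    return "/".join(config_file_path.split("/")[:-1]) + "/folds.csv"


config = "../data/flash-holmes/zorro-quantifiers-superlative/config-none.yaml"
probe_path = derive_probe_path(config)

print(probe_path)
# The path is relative, so it is resolved against the current working
# directory; the check below only reports whether the file exists there.
print(os.path.exists(probe_path))
```

The derived path matches the one in the FileNotFoundError, so the script is looking in the right dataset folder and the file itself is what is missing.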

And I have been using the command provided in the README as follows:

python3 investigate.py --model_name model_name_or_path --version flash-holmes --parallel_probing

xlxwalex commented 2 months ago

  2. You refer to the parameter of encode.py, right?

Yes, I am referring to config_file_path at Line 15 of encode.py.

holmesbenchmark commented 2 months ago

@xlxwalex @bbunzeck Thanks for the information.

We updated the download instructions and now provide a download script to make sure that the structure under the data folder matches. Further, we realized that in an older version the samples.csv files were called folds.csv. We fixed that as well.
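For anyone who wants to check an existing download against this naming mismatch, the following is a rough sketch rather than part of the repository. It assumes that each dataset sits in its own subfolder of a version folder such as ../data/flash-holmes (as the paths in the error log suggest); the function name find_missing_folds is hypothetical.

```python
from pathlib import Path


def find_missing_folds(data_root: str) -> list[tuple[str, bool]]:
    """Return (folder name, has samples.csv) for dataset folders without folds.csv."""
    missing = []
    for folder in sorted(p for p in Path(data_root).iterdir() if p.is_dir()):
        if not (folder / "folds.csv").exists():
            missing.append((folder.name, (folder / "samples.csv").exists()))
    return missing


if __name__ == "__main__":
    root = "../data/flash-holmes"  # assumed location, taken from the error log
    if Path(root).is_dir():
        for name, has_samples in find_missing_folds(root):
            hint = "samples.csv present" if has_samples else "no samples.csv either"
            print(f"{name}: folds.csv missing ({hint})")
```

Folders reported with "samples.csv present" point to the file naming mismatch described above; folders with neither file suggest an incomplete download or a different folder layout.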

We just tested it in a fresh environment on our system and it works for us. It would be great if you could verify it as well and give us some brief feedback.

If you have any other questions, feel free to get back to us here or via email.

bbunzeck commented 2 months ago

@holmesbenchmark Thank you so much for the timely response!

It seems to be working for me. 😊

xlxwalex commented 2 months ago

@holmesbenchmark Thank you for the prompt response. With the new code, it is running correctly now.