Open DaGeRe opened 1 year ago
Solved. Basically, the data for this month only consists of 80 measurements, and no diffs were found.
Unfortunately, this did not fix the error; I'm still receiving what seems to be the same error:
Extracting 12 292
Traceback (most recent call last):
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 309, in <module>
    main()
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 305, in main
    extract(args)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 260, in extract
    save_diff (args, output_folder / f"{year_month}_metadiff.csv", meta_data)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 62, in save_diff
    return pd.read_csv(output_file)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
    return mapping[engine](f, self.options)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 555, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
I'm also a little surprised how https://github.com/MiladAbdullah/phoenix/commit/7e06989a31b56c660743c8b3bc1aef2a0246b3e8 can fix the bug, since fetch.py is not inside the stack trace.
The error is thrown because fetch.py saves an empty file if no record is found, and extract.py then tries to read it. The new version saves at least the column names, which lets pandas read a file that has no records.
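To illustrate the difference, here is a minimal sketch (not the actual phoenix code) of how pandas treats a zero-byte file versus a file that contains only a header row. The column names and the `read_diff` helper are hypothetical; the real diff files may use different columns.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical column names, chosen only for this illustration.
COLUMNS = ["machine_type", "configuration", "benchmark"]


def read_diff(path: Path) -> pd.DataFrame:
    """Read a CSV file, falling back to an empty frame if it has no content."""
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError:
        # A zero-byte file has no header row, so pandas cannot infer columns.
        return pd.DataFrame(columns=COLUMNS)


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: zero-byte file, as written by the old version when no record is found.
    empty_file = Path(tmp) / "empty.csv"
    empty_file.touch()

    # Case 2: header-only file, as written when the column names are saved.
    header_only = Path(tmp) / "header_only.csv"
    pd.DataFrame(columns=COLUMNS).to_csv(header_only, index=False)

    from_empty = read_diff(empty_file)       # handled via the fallback
    from_header = pd.read_csv(header_only)   # parses without an exception
```

Both results are empty data frames with the same columns, but only the header-only file can be read by a plain `pd.read_csv` call. Files written before the fix stay zero-byte until they are regenerated.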
Can you delete the diff files (of December 2020) and re-run the code?
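If it helps, stale files could be located with something like the following sketch. It assumes the empty diff files are zero-byte `.csv` files directly inside the output folder; the helper names are made up for this example and are not part of phoenix.

```python
import tempfile
from pathlib import Path


def find_empty_csvs(folder: Path) -> list[Path]:
    """Return all zero-byte CSV files directly inside `folder`, sorted by name."""
    return sorted(p for p in folder.glob("*.csv") if p.stat().st_size == 0)


def delete_empty_csvs(folder: Path) -> list[Path]:
    """Delete zero-byte CSV files and return the paths that were removed."""
    removed = find_empty_csvs(folder)
    for path in removed:
        path.unlink()
    return removed


# Small self-contained demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a_metadiff.csv").touch()                # zero bytes, stale
    (folder / "b_metadiff.csv").write_text("x,y\n")    # header only, kept
    removed_names = [p.name for p in delete_empty_csvs(folder)]
    remaining_names = sorted(p.name for p in folder.iterdir())
```

After deleting, re-running the fetch step should recreate the files with the header row.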
Thanks for the hint, I've restarted the execution and am waiting for the results.
To work on the same data and make the extraction repeatable and fast, I've created a script for downloading (https://github.com/MiladAbdullah/phoenix/blob/main/download.sh) and a script for extraction (https://github.com/MiladAbdullah/phoenix/blob/main/extractAll.sh). The latter extracts the tar files and then deletes them, so a regular hard disk should be sufficient for the execution.
When extracting from 2020, I get the following error:
This only happens for December; all the other months work.