Error When Extracting From 2020

DaGeRe commented 1 year ago

When extracting from 2020, I get the following error:

Extracting 12 109
Traceback (most recent call last):
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 309, in <module>
    main()
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 305, in main
    extract(args)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 260, in extract
    save_diff (args, output_folder / f"{year_month}_metadiff.csv", meta_data)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 62, in save_diff
    return pd.read_csv(output_file)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
    return mapping[engine](f, **self.options)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 555, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

Extracting 12 291
Traceback (most recent call last):
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 309, in <module>
    main()
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 305, in main
    extract(args)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 260, in extract
    save_diff (args, output_folder / f"{year_month}_metadiff.csv", meta_data)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 62, in save_diff
    return pd.read_csv(output_file)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
    return mapping[engine](f, **self.options)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 555, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Extracting 12 292
Traceback (most recent call last):
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 309, in <module>
    main()
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 305, in main
    extract(args)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 260, in extract
    save_diff (args, output_folder / f"{year_month}_metadiff.csv", meta_data)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 62, in save_diff
    return pd.read_csv(output_file)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
    return mapping[engine](f, **self.options)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 555, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

This only happens for December; all the other months work.

MiladAbdullah commented 1 year ago

Solved. Basically, the data for this month only consists of 80 measurements, and no diffs were found.

DaGeRe commented 1 year ago

Unfortunately, this did not fix the problem. I'm still receiving what seems to be the same error:

Extracting 12 292
Traceback (most recent call last):
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 309, in <module>
    main()
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 305, in main
    extract(args)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 260, in extract
    save_diff (args, output_folder / f"{year_month}_metadiff.csv", meta_data)
  File "/home/reichelt/workspaces/dissworkspace/graalvm/phoenix/extract.py", line 62, in save_diff
    return pd.read_csv(output_file)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
    return mapping[engine](f, **self.options)
  File "/home/reichelt/.local/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 555, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

I'm also a little surprised that https://github.com/MiladAbdullah/phoenix/commit/7e06989a31b56c660743c8b3bc1aef2a0246b3e8 can fix the bug, since fetch.py does not appear in the stack trace.

MiladAbdullah commented 1 year ago

The error is thrown because "fetch.py" saves an empty file if no record is found, and "extract.py" then tries to read it. The new version at least writes the column names, which lets pandas read a file that contains no records.

Can you delete the diff files (of December 2020) and re-run the code?
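
For readers who want to reproduce the behaviour described above, here is a minimal sketch. It is not the code from "fetch.py" or "extract.py"; the column names and the read_diff helper are hypothetical and only serve as an illustration. It shows that pandas raises EmptyDataError for a zero-byte file but reads a header-only file as an empty frame. It also shows why a stale empty file left over from the old version would keep failing until it is deleted or the read is guarded.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical column names, used only for this illustration.
COLUMNS = ["machine_type", "configuration", "benchmark"]


def read_diff(path):
    """Read a diff CSV; fall back to an empty frame if the file has no content."""
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError:
        # A zero-byte file has no header row, so pandas cannot infer columns.
        return pd.DataFrame(columns=COLUMNS)


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: a zero-byte file, as written when no record is found.
    empty_file = Path(tmp) / "empty_metadiff.csv"
    empty_file.write_text("")
    try:
        pd.read_csv(empty_file)
        raised = False
    except pd.errors.EmptyDataError:
        raised = True

    # Case 2: a file that contains only the header row.
    header_only = Path(tmp) / "header_metadiff.csv"
    pd.DataFrame(columns=COLUMNS).to_csv(header_only, index=False)
    header_frame = pd.read_csv(header_only)

    # Case 3: the guarded read tolerates the stale zero-byte file.
    fallback_frame = read_diff(empty_file)
```

Writing the header on the producing side keeps the reader simple, while a guarded read like the hypothetical read_diff would additionally tolerate files created before the fix.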

DaGeRe commented 1 year ago

Thanks for the hint, I've restarted the execution and am waiting for the results.

To make sure we use the same data, and to make the extraction repeatable and fast, I've created a script for downloading (https://github.com/MiladAbdullah/phoenix/blob/main/download.sh) and a script for extraction (https://github.com/MiladAbdullah/phoenix/blob/main/extractAll.sh). The latter extracts each tar file and then deletes it, so regular hard disks should also be sufficient for the execution.
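
The extract-then-delete idea can be sketched as follows. This is not the content of extractAll.sh; the function name, the directory arguments, and the "*.tar*" file pattern are assumptions made for illustration. Each archive is removed as soon as it has been unpacked, so the disk holds at most one archive alongside its extracted contents at any time.

```python
import tarfile
from pathlib import Path


def extract_and_delete(archive_dir, output_dir):
    """Unpack every tar archive in archive_dir into output_dir, deleting each one afterwards."""
    archive_dir = Path(archive_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    processed = []
    # Assumed naming scheme: anything matching *.tar* (e.g. .tar, .tar.gz).
    for archive in sorted(archive_dir.glob("*.tar*")):
        with tarfile.open(archive) as tar:
            tar.extractall(output_dir)
        # Delete only after a successful extraction to free disk space.
        archive.unlink()
        processed.append(archive.name)
    return processed
```

If extraction of one archive fails, the exception propagates before unlink is reached, so the archive that could not be unpacked stays on disk for inspection.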

MiladAbdullah / phoenix

Error When Extracting From 2020 #13