bigbio / quantms.io

The proteomics quantification format, extending mzTab for large scale datasets.

Integrity check needed. #37

Open ypriverol opened 4 months ago

ypriverol commented 4 months ago

When converting very large datasets, I have often observed that the resulting Parquet files are corrupted, which causes errors when we later try to read them. I suggest adding an integrity check on the output file to every conversion command. Example error:

Traceback (most recent call last):
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/bin/quantmsio_cli", line 33, in <module>
    sys.exit(load_entry_point('quantmsio==0.0.3', 'console_scripts', 'quantmsio_cli')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/quantmsio-0.0.3-py3.11.egg/quantms_io/quantmsio_cli.py", line 70, in quantms_io_main
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/quantmsio-0.0.3-py3.11.egg/quantms_io/commands/get_unanimous_command.py", line 30, in get_unanimous_for_parquet
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/quantmsio-0.0.3-py3.11.egg/quantms_io/core/tools.py", line 232, in map_protein_for_parquet
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/quantmsio-0.0.3-py3.11.egg/quantms_io/core/tools.py", line 278, in change_and_save_parquet
  File "/hps/software/users/juan/pride/anaconda3/envs/quantmsio/lib/python3.11/site-packages/quantmsio-0.0.3-py3.11.egg/quantms_io/core/tools.py", line 311, in read_large_parquet
  File "pyarrow/_parquet.pyx", line 1323, in iter_batches
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Tried reading 1025877 bytes starting at position 31565171 from file but only got 0
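
A rough sketch of what such a check could look like, not taken from the quantms.io code base. The function names `check_parquet_structure` and `check_parquet_readable` are hypothetical. The first one uses only the standard library and relies on the Parquet file layout (4-byte `PAR1` magic at both ends, with a little-endian footer length just before the trailing magic), so it only catches truncated or incomplete files. The second one assumes `pyarrow` is installed and decodes every row group, which is the kind of read that failed in the traceback above.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def check_parquet_structure(path):
    """Cheap check: magic bytes at both ends and a footer length that fits.

    Returns (ok, message). It does not decode any data pages.
    """
    size = os.path.getsize(path)
    # Smallest possible layout: header magic + footer length + trailing magic.
    if size < 12:
        return False, f"file too small ({size} bytes)"
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(-8, os.SEEK_END)
        tail = fh.read(8)
    if head != PARQUET_MAGIC:
        return False, "missing PAR1 magic at start"
    if tail[4:] != PARQUET_MAGIC:
        return False, "missing PAR1 magic at end (file may be truncated)"
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len == 0 or footer_len > size - 12:
        return False, f"footer length {footer_len} does not fit in a {size}-byte file"
    return True, "ok"


def check_parquet_readable(path):
    """Full check: read every row group and compare the row count (needs pyarrow)."""
    import pyarrow.parquet as pq  # imported lazily so the cheap check has no dependency

    try:
        pf = pq.ParquetFile(path)
        rows = 0
        for i in range(pf.metadata.num_row_groups):
            rows += pf.read_row_group(i).num_rows
    except Exception as exc:  # e.g. the OSError shown in the traceback
        return False, f"{type(exc).__name__}: {exc}"
    if rows != pf.metadata.num_rows:
        return False, f"read {rows} rows, metadata declares {pf.metadata.num_rows}"
    return True, "ok"
```

Each conversion command could call the cheap check right after writing its output and exit with a non-zero status on failure. The full read is slower on large files, so it might be better placed behind an optional flag.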