Proteobench / ProteoBench

ProteoBench is an open and collaborative platform for community-curated benchmarks for proteomics data analysis pipelines. Our goal is to allow a continuous, easy, and controlled comparison of proteomics data analysis workflows. https://proteobench.cubimed.rub.de/
https://proteobench.readthedocs.io
Apache License 2.0

Rerun of i2MassChroQ on Ion Level module fails #459

Open JuliaS92 opened 8 hours ago

JuliaS92 commented 8 hours ago

Describe the bug

Downloading input_df.csv from the public runs and reloading it as new data raises an error.

To Reproduce

Steps to reproduce the behavior:

  1. Download input_df.csv for i2MassChroQ__20240904_071654
  2. Submit the same file as i2MassChroQ software result
  3. Hit parse and bench

Expected behavior

This should reproduce the results generated from the original input used to create the public run.

Screenshots

File "/mnt/data/git/ProteoBench/webinterface/pages/base_pages/quant.py", line 469, in execute_proteobench
    result_performance, all_datapoints, input_df = self.run_benchmarking_process()
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/data/git/ProteoBench/webinterface/pages/base_pages/quant.py", line 495, in run_benchmarking_process
    return self.ionmodule.benchmarking(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/modules/dda_quant_ion/dda_quant_ion_module.py", line 90, in benchmarking
    raise ParseSettingsError(f"Error parsing the input file: {e}")
ParseSettingsError: Error parsing the input file: 'ProForma'
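
The bare `'ProForma'` at the end of the message is what a wrapped `KeyError` typically looks like, which would be consistent with the parser looking up a `ProForma` field that the submitted file does not provide. The sketch below only illustrates that mechanism with the standard library. `parse_rows`, `describe_failure`, the column names, and the local `ParseSettingsError` class are hypothetical stand-ins, not ProteoBench's actual code.

```python
import csv
import io


class ParseSettingsError(Exception):
    """Local stand-in for the exception named in the traceback."""


def parse_rows(csv_text, required=("ProForma",)):
    # Hypothetical parser: read all rows, then pull out the required columns.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # A missing column raises KeyError here.
    return [{name: row[name] for name in required} for row in rows]


def benchmarking(csv_text):
    try:
        return parse_rows(csv_text)
    except Exception as e:
        # str(KeyError("ProForma")) keeps the quotes, so only the key name
        # ends up in the message and the exception type is lost.
        raise ParseSettingsError(f"Error parsing the input file: {e}")


def describe_failure(csv_text):
    # Return the error text, or "parsed" if the file was accepted.
    try:
        benchmarking(csv_text)
    except ParseSettingsError as err:
        return str(err)
    return "parsed"


# Assumed example header without a ProForma column.
print(describe_failure("sequence,intensity\nPEPTIDE,1.0\n"))
```

If this is indeed the cause, including the exception type in the wrapped message (for example via `{e!r}`) would make such reports easier to read.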


RobbinBouwmeester commented 8 hours ago

Hi Julia,

I am not entirely sure, but is input_df.csv a direct output of i2MassChroQ? You should be able to retrieve multiple files via the download, but not all of them are direct outputs of the tool. Some are formatted intermediates produced by ProteoBench.

JuliaS92 commented 8 hours ago

It's either that or the params.csv or result_performance.csv. The direct input is not available, at least through the interface. If it is an intermediate format, we should make sure it loads the same way as the original input, especially for testing and rerunning of benchmarks.

RobbinBouwmeester commented 8 hours ago

Unfortunately, it does not load intermediate files, and I do not think we should support that via the web interface. We should, however, support downloading the raw input files. @julianu, is this currently not possible?

RobbinBouwmeester commented 8 hours ago

This relates to #458?

julianu commented 8 hours ago

All data that is stored on the server can be downloaded via: https://proteobench.cubimed.rub.de/datasets/ (maybe someone should put this into the docs?)

For the DDA modules, the download function also works, as far as I can see. I am not entirely sure whether "input_df.csv" is the "raw" input; I just link everything that is stored right now.

Edit: DIA has a bug right now... I will fix this.

JuliaS92 commented 7 hours ago

Regarding putting it in the documentation, also see the other issue (https://github.com/Proteobench/ProteoBench/issues/457). For rerunning all datasets, we need to be able to rerun from the input_df.csv files if those are the only ones generated automatically.

RobbinBouwmeester commented 1 hour ago

> Regarding putting it in the documentation also see the other issue: #457 For rerunning all datasets we need to be able to rerun from the input_df.csv files, if those are the only ones automatically generated.

In my opinion, it would be better to rerun from the raw input. As mentioned before, there is no need to run it from input_df.csv. The main reason is that if we change anything in the parsing, we will not be able to reuse the results.
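
To make the argument concrete, here is a minimal sketch of why a stored intermediate can go stale when the parser changes. Everything in it is hypothetical (`parse_v1`, `parse_v2`, and the column names are invented for illustration and do not reflect ProteoBench's real parsing code).

```python
def parse_v1(raw_rows):
    # Hypothetical original parser: keeps only sequence and intensity.
    return [
        {"ion": r["sequence"], "intensity": float(r["intensity"])}
        for r in raw_rows
    ]


def parse_v2(raw_rows):
    # Hypothetical updated parser: the ion label now includes the charge.
    return [
        {"ion": f'{r["sequence"]}/{r["charge"]}', "intensity": float(r["intensity"])}
        for r in raw_rows
    ]


# Assumed raw tool output, reduced to a single row.
raw = [{"sequence": "PEPTIDE", "charge": "2", "intensity": "10.5"}]

stored_intermediate = parse_v1(raw)  # what was saved at submission time
rerun_from_raw = parse_v2(raw)       # what a rerun with the new parser yields

# The intermediate dropped the charge, so the new parser cannot be applied
# to it; only the raw input supports a faithful rerun.
print(stored_intermediate)
print(rerun_from_raw)
```

Under these assumptions, rerunning from the intermediate would silently preserve the old parsing behavior, whereas rerunning from the raw input picks up the change.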