Proteobench / ProteoBench

ProteoBench is an open and collaborative platform for community-curated benchmarks for proteomics data analysis pipelines. Our goal is to allow a continuous, easy, and controlled comparison of proteomics data analysis workflows. https://proteobench.cubimed.rub.de/
https://proteobench.readthedocs.io
Apache License 2.0

Error submitting proteobench zip file produced by i2MassChroQ #298

Closed OlivierLangella closed 4 weeks ago

OlivierLangella commented 4 months ago

Hello @mlocardpaulet and @enryH !

Thank you very much for the i2MassChroQ support. I just tried to submit a newly produced zip archive from i2MassChroQ and ran into trouble.

Here is the error message I get once I press the "Parse and bench" button on https://proteobench.cubimed.rub.de/DDA%20Quant%20Ion%20Level%20-BETA-

I chose "i2MassChroq" as the software tool and uploaded my zip file by selecting it with the "Browse files" button.

Do you have any idea what is going wrong?

Best wishes Olivier

❌ Proteobench ran into a problem
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaf in position 12: invalid start byte
Traceback:

File "/mnt/data/git/ProteoBench/webinterface/pages/DDA_Quant_ion.py", line 238, in _run_proteobench
    result_performance, all_datapoints, input_df = self.ionmodule.benchmarking(
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/modules/dda_quant_ion/module.py", line 35, in benchmarking
    input_df = load_input_file(input_file, input_format)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/io/parsing/parse_ion.py", line 39, in load_input_file
    input_data_frame = pd.read_csv(input_csv, low_memory=False, sep="\t")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1898, in _make_engine
    return mapping[engine](f, **self.options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "parsers.pyx", line 574, in pandas._libs.parsers.TextReader.__cinit__
File "parsers.pyx", line 663, in pandas._libs.parsers.TextReader._get_header
File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2053, in pandas._libs.parsers.raise_parser_error
File "<frozen codecs>", line 322, in decode

mlocardpaulet commented 4 months ago

Hi! Here are the instructions that should be in the documentation but are not there yet (my mistake, sorry):

A ProteoBench-compatible format is available in i2MassChroQ through the button “ProteoBench export”. It generates a tab-delimited file containing one row per quantified ion with all the information required for this module (column headers are: “rawfile”, “sequence”, “ProForma”, “charge”, “proteins” and “area”). Like with the other tools, the protein identifiers should be in the format “sp|P49327|FAS_HUMAN”. The ProteoBench export of i2MassChroQ also generates a single parameter file (.tsv) that is compatible with ProteoBench public upload. Link to the i2MassChroQ documentation here.
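
The format described above can be checked locally before uploading. The following is a minimal sketch, not ProteoBench's own parser: it only assumes the six column headers listed above and the "sp|P49327|FAS_HUMAN" identifier style. The function name `check_export`, the `;` separator between several protein identifiers, and the sample rows are hypothetical choices made for illustration.

```python
import csv
import io
import re

# Column headers listed in the comment above.
REQUIRED_COLUMNS = ["rawfile", "sequence", "ProForma", "charge", "proteins", "area"]

# Assumed pattern for identifiers such as "sp|P49327|FAS_HUMAN":
# database tag, accession and entry name, separated by "|".
PROTEIN_ID = re.compile(r"^[A-Za-z]+\|[^|\s]+\|[^|\s]+$")


def check_export(handle, id_separator=";"):
    """Return a list of human-readable problems found in a tab-delimited export."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    # Data rows start on line 2 of the file (line 1 is the header).
    for line_no, row in enumerate(reader, start=2):
        # id_separator is an assumption; adjust it to the real export.
        for protein in (row["proteins"] or "").split(id_separator):
            if not PROTEIN_ID.match(protein.strip()):
                problems.append(f"line {line_no}: unexpected protein id {protein!r}")
    return problems


# Made-up sample: the second data row uses a bare accession on purpose.
sample = io.StringIO(
    "rawfile\tsequence\tProForma\tcharge\tproteins\tarea\n"
    "run_01\tPEPTIDEK\tPEPTIDEK\t2\tsp|P49327|FAS_HUMAN\t12345.6\n"
    "run_01\tSAMPLER\tSAMPLER\t3\tP49327\t789.0\n"
)
print(check_export(sample))
```

An empty list means that the header and the protein identifiers look as expected. The check says nothing about whether the quantities themselves are sensible.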

So what you should do is unzip the files and first upload the quantified ions (this way you can visualise the data locally without sending all the information publicly). Then, if you want to make the data public, you can upload the parameter file as metadata and make your point public.

Does this make sense? Let me know if it is not clear and/or if it does not work :)
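
The unzip step described above can also be scripted. This is a small sketch using only the Python standard library. The helper name `read_tsv_members` and the member names in the demo archive are hypothetical, and it assumes the TSV files inside the archive are UTF-8 encoded.

```python
import io
import zipfile


def read_tsv_members(archive):
    """Map each .tsv member name in a zip archive to its decoded text."""
    texts = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".tsv"):
                # Assumes UTF-8 encoded members.
                texts[name] = zf.read(name).decode("utf-8")
    return texts


# Build a small in-memory archive to stand in for the real export.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ions.tsv", "rawfile\tarea\nrun_01\t1.0\n")  # hypothetical name
    zf.writestr("parameters.tsv", "parameter\tvalue\n")  # hypothetical name
buffer.seek(0)

# Print each member with the column names from its first line.
for name, text in read_tsv_members(buffer).items():
    print(name, "->", text.splitlines()[0].split("\t"))
```

With a real export, `archive` would be the path of the zip file on disk, and each extracted TSV file can then be uploaded separately.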

OlivierLangella commented 4 months ago

Hello! OK, thanks. As often, I didn't read the documentation ;) So, trying to submit first the CSV file containing the required columns, I get the error message below.

Perhaps the required code is not deployed.

Thanks for your help !

❌ Proteobench ran into a problem
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/io/parsing/io_parse_settings/parse_settings_i2masschroq.toml'
Traceback:

File "/mnt/data/git/ProteoBench/webinterface/pages/DDA_Quant_ion.py", line 238, in _run_proteobench
    result_performance, all_datapoints, input_df = self.ionmodule.benchmarking(
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/modules/dda_quant_ion/module.py", line 36, in benchmarking
    parse_settings = ParseSettingsBuilder().build_parser(input_format)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/io/parsing/parse_settings_ion.py", line 34, in build_parser
    parse_settings = toml.load(toml_file)
                     ^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/toml/decoder.py", line 133, in load
    with io.open(_getpath(f), encoding='utf-8') as ffile:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

enryH commented 4 months ago

Yes, currently the build is failing and we are working on it...
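
One way to tell whether an installed package actually ships a given data file, such as the settings file named in the FileNotFoundError above, is to ask `importlib.resources`. This is a generic sketch and not part of ProteoBench. The helper name `package_has_file` is hypothetical, and the demo runs against the standard-library `json` package so that it works without ProteoBench installed.

```python
from importlib import resources


def package_has_file(package, relative_path):
    """Return True if the installed package ships a file at relative_path."""
    try:
        node = resources.files(package)
    except ModuleNotFoundError:
        # The package itself is not installed in this environment.
        return False
    # Walk down one path segment at a time.
    for part in relative_path.split("/"):
        node = node / part
    return node.is_file()


# Demonstrated on the standard library so the snippet runs anywhere.
print(package_has_file("json", "decoder.py"))
print(package_has_file("json", "io_parse_settings/missing.toml"))
```

In the environment from the traceback, the equivalent call would be `package_has_file("proteobench", "io/parsing/io_parse_settings/parse_settings_i2masschroq.toml")`, with the path taken from the error message.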

RobbinBouwmeester commented 2 months ago

  1. Add a file for testing i2MassChroQ (ask @mlocardpaulet)
  2. Add code for testing i2MassChroQ parsing
  3. See if the above is still an issue

OlivierLangella commented 2 months ago

Thank you for your response. Sorry, but I get more or less the same error message when I try once more.

FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/io/parsing/io_parse_settings/parse_settings_i2masschroq.toml'
Traceback:

File "/mnt/data/git/ProteoBench/webinterface/pages/DDA_Quant_ion.py", line 229, in _run_proteobench
    result_performance, all_datapoints, input_df = self.ionmodule.benchmarking(
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/modules/dda_quant_ion/module.py", line 31, in benchmarking
    parse_settings = ParseSettingsBuilder().build_parser(input_format)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/proteobench/io/parsing/parse_settings_ion.py", line 35, in build_parser
    parse_settings = toml.load(toml_file)
                     ^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/micromamba/envs/proteobench/lib/python3.12/site-packages/toml/decoder.py", line 133, in load
    with io.open(_getpath(f), encoding='utf-8') as ffile:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The sample file is available here : https://cloud.cmb.ugent.be/index.php/s/zdGB3zZ7Fwed9gq?path=%2FModule_2_DDA_quantification%2Fsearch_results%2Fi2MassChroQ

as a zip file : proteobench_2pep_fdr01psm_fdr01prot_1_0_11.zip

it contains 2 tsv files called :

Perhaps I can get into the code, but I am not a first-class Python developer. I guess all the needed parsers must be there, because you have worked on them, so this should work.

Many thanks for your help. I would be interested in giving ProteoBench a boost at the next SMAP congress in Lille. It would be helpful if i2MassChroQ DDA data could be shown.

Best wishes Olivier

mlocardpaulet commented 4 weeks ago

Hi @OlivierLangella,

I think that it is working now, right? I get this plot (green dot = your test files). (image: newplot)

Could you please try? And if the figure feels right to you, could you submit it publicly? Then you can submit as many runs as you'd like. Please let me know if you encounter any issue, I'd be happy to help out :)

mlocardpaulet commented 4 weeks ago

@OlivierLangella we noticed that in old files, the column with the quantities was named "areanorm" and not "area". Does it mean that depending on the i2MassChroQ parameters it can be one or the other? Shall we be ready to parse both versions? Could it be that we have both "area" and "areanorm" in an output file?

OlivierLangella commented 4 weeks ago

Thank you very much @mlocardpaulet, it seems OK on the plot, so it should work, but I have this issue:

❌ Proteobench ran into a problem 🚨
Error parsing the inpu file: 'utf-8' codec can't decode byte 0xaf in position 12: invalid start byte

when I try to upload my zip file proteobench_2pep_fdr01psm_fdr01prot_1_0_11.zip

No need to worry about the column name "area" vs "areanorm": there will be only "area" in the ProteoBench export file of i2MassChroQ.
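
Should older files with an "areanorm" column still need to be read, the header could be normalised before parsing. This is only a sketch of one possible approach, not what ProteoBench does. The function name `normalise_quantity_column` is hypothetical, and letting "area" take precedence when both columns are present is an assumption.

```python
def normalise_quantity_column(header):
    """Return a copy of the header with a legacy "areanorm" renamed to "area".

    If "area" is already present the header is returned unchanged, so a file
    carrying both columns keeps its "area" values as the quantities.
    """
    if "area" in header:
        return list(header)
    return ["area" if name == "areanorm" else name for name in header]


old_header = ["rawfile", "sequence", "ProForma", "charge", "proteins", "areanorm"]
print(normalise_quantity_column(old_header))
```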

mlocardpaulet commented 4 weeks ago

Hey! Great for the plot. For the public submission, I think that the issue is that we don't parse the zip file, but only the file Project parameters.tsv. Is it OK with you? It works on my side. We will try to make it clearer in the documentation.
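
The UnicodeDecodeError above is what one would expect when a zip archive is read as tab-delimited text: a zip file is binary, and after its leading "PK" signature come header bytes that are generally not valid UTF-8. Below is a minimal sketch of how an upload could be classified before parsing, so that a zip archive produces a clearer message. The function name `describe_upload` and its three labels are hypothetical and not part of ProteoBench.

```python
import io
import zipfile


def describe_upload(data):
    """Classify uploaded bytes as "zip", "text" or "binary"."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    return "text"


# Build a tiny archive in memory to stand in for an uploaded zip file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ions.tsv", "rawfile\tarea\n")  # hypothetical member name

print(describe_upload(buffer.getvalue()))
print(describe_upload(b"rawfile\tarea\nrun_01\t1.0\n"))
```

A caller could then ask the user to unzip the archive and upload the TSV file, instead of passing the raw bytes on to the text parser.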

OlivierLangella commented 4 weeks ago

So, when I upload the tsv file (proteobench_export.tsv) and fill in the parameter form by hand, I get this:

❌ Proteobench ran into a problem 🚨
'st.session_state has no key "12b0d448-b84e-4fce-b351-959d0df2e655". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization'

mlocardpaulet commented 4 weeks ago

This issue was fixed.