issue when parsing evidence file from MQ 1_3_0_5

Proteobench / ProteoBench

ProteoBench is an open and collaborative platform for community-curated benchmarks for proteomics data analysis pipelines. Our goal is to allow a continuous, easy, and controlled comparison of proteomics data analysis workflows.

https://proteobench.readthedocs.io

Apache License 2.0


issue when parsing evidence file from MQ 1_3_0_5 #213

Closed mlocardpaulet closed 5 months ago

mlocardpaulet commented 5 months ago

I have the file and will take a look. I can share it if needed.

mlocardpaulet commented 5 months ago

I think this comes from the fact that the most recent versions of MQ (the ones compatible with ProteoBench) report the Proteins as sp|O75822|EIF3J_HUMAN, whereas the version that causes trouble (1.3.0.5) reports only the UniProt accession, e.g. P63313. I wonder whether this is a more general issue: does the reporting depend on the parameters that define the FASTA header parsing?
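To make the difference between the two formats concrete, here is a minimal Python sketch that tells them apart. The function names and the regular expression are illustrative assumptions, not ProteoBench code; the only things taken from this thread are the two example identifiers.

```python
import re

# Assumed pattern for the newer style: database prefix, accession, entry name,
# e.g. sp|O75822|EIF3J_HUMAN. The "sp"/"tr" prefixes are the usual UniProt ones.
FULL_HEADER_ID = re.compile(r"^(sp|tr)\|([^|\s]+)\|([^|\s]+)$")


def describe_protein_id(protein_id):
    """Return 'full' for sp|ACC|NAME style IDs, 'accession_only' otherwise."""
    return "full" if FULL_HEADER_ID.match(protein_id) else "accession_only"


def split_protein_group(proteins):
    """Split a semicolon-separated Proteins entry into individual IDs."""
    return [p for p in proteins.split(";") if p]


if __name__ == "__main__":
    for entry in ("sp|O75822|EIF3J_HUMAN", "P63313"):
        print(entry, "->", describe_protein_id(entry))
```

A check like this could be run on the Proteins column of an evidence file before parsing, to report early that the identifiers are not in the expected format.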

mlocardpaulet commented 5 months ago

So, after discussing this with Holda, we realised that in v1.5 and earlier there is no way to change the FASTA header parsing. For the moment we will just indicate in the documentation that a specific FASTA header parsing is necessary for outputs to be compatible with ProteoBench. For more recent versions, the default parameters follow these rules:

Identifier rule = >([^\s]*)
Description rule = >(.*)

People should use these settings for their output to be compatible with ProteoBench. I will indicate this in the docs.
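For illustration, here is a small Python sketch of what the two rules above capture when applied to a FASTA header line. The description text in the example header is made up, and the accession-only rule is an assumption added for contrast (it is not taken from this thread); only the identifier and description rules come from the settings listed above.

```python
import re

# Rules quoted in this thread.
IDENTIFIER_RULE = re.compile(r">([^\s]*)")
DESCRIPTION_RULE = re.compile(r">(.*)")

# Assumption: a rule that keeps only the accession between the two pipes,
# shown here to illustrate how a bare accession such as P63313 could arise.
ACCESSION_ONLY_RULE = re.compile(r">.*\|(.*)\|")

# Illustrative header; the text after the identifier is a placeholder.
HEADER = ">sp|O75822|EIF3J_HUMAN Example description text"


def apply_rule(rule, line):
    """Return the first capture group of `rule` on `line`, or None if no match."""
    match = rule.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(apply_rule(IDENTIFIER_RULE, HEADER))      # sp|O75822|EIF3J_HUMAN
    print(apply_rule(DESCRIPTION_RULE, HEADER))     # sp|O75822|EIF3J_HUMAN Example description text
    print(apply_rule(ACCESSION_ONLY_RULE, HEADER))  # O75822
```

With the identifier rule from this thread, the whole token up to the first whitespace is kept, which gives the sp|ACCESSION|NAME form that ProteoBench expects.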