Proteobench / ProteoBench

ProteoBench is an open and collaborative platform for community-curated benchmarks for proteomics data analysis pipelines. Our goal is to allow a continuous, easy, and controlled comparison of proteomics data analysis workflows.
https://proteobench.readthedocs.io
Apache License 2.0

Update parse_settings_maxquant.toml #299

Closed RobbinBouwmeester closed 1 month ago

mlocardpaulet commented 1 month ago

Hey! I am sorry, I have an error on my side (when I run the PR locally):

image

It does not look like it comes from your changes, does it?

mlocardpaulet commented 1 month ago

However, I don't get the error when I run main locally.

mlocardpaulet commented 1 month ago

Also, can your script handle other modifications than oxidation and acetylation? Because users can search with many others without us even knowing, right?

RobbinBouwmeester commented 1 month ago

There is no naming convention, so no: essentially, we will not normalize modifications that are not in the provided dictionary. We keep them as is. I cannot think of a solution for this.

mlocardpaulet commented 1 month ago

I think that this is completely fine. I just asked because, in the toml, the field "modification_dict" only contains oxidation and acetylation.

RobbinBouwmeester commented 1 month ago

@mlocardpaulet latest push should fix your issues :)

RobbinBouwmeester commented 1 month ago

Ok, nope, did not fix it, will try to fix it ASAP :)

RobbinBouwmeester commented 1 month ago

@mlocardpaulet now it should work! 👍

mlocardpaulet commented 1 month ago

It does not, sorry. Here is what I have:

image

So I suspect that "[Oxidation (M)]" -> "[(M)]", and I don't see the acetylations.

mlocardpaulet commented 1 month ago

Remind me: why do we even change the value in the "Modified sequence" field if we keep the modifications as is? I know that this was necessary for other outputs, especially when sequences are stripped (no modification) and modifications are listed in another field. But here, why do we do that? And I am still not sure I understand the "modification_dict". What happens if people search with phosphorylation? I am sorry, I am a bit slow to understand...

RobbinBouwmeester commented 1 month ago

OK, we decided to do this to make downstream analysis easier when we want to compare input from different search engines.

If the modification is not in the dictionary, it keeps the name that is reported, but the name is put between square brackets. So, for example:

AAPAPEEMS(Phospho (S))EPK|Z=3

Since this modification is not in the dictionary, it will be changed to:

AAPAPEEMS[Phospho (S)]EPK|Z=3
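
The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual ProteoBench parser: the function name `normalize_modifications` and the contents of `MODIFICATION_DICT` are hypothetical, and the sketch assumes modifications are written in parentheses that may themselves contain a parenthesised site, as in `(Phospho (S))`.

```python
# Hypothetical mapping, for illustration only; the real "modification_dict"
# in parse_settings_maxquant.toml may use different keys and values.
MODIFICATION_DICT = {
    "Oxidation (M)": "Oxidation",
    "Acetyl (Protein N-term)": "Acetyl",
}


def normalize_modifications(sequence: str, modification_dict: dict[str, str]) -> str:
    """Rewrite '(name)' modification tokens as '[name]'.

    Names found in modification_dict are replaced by their mapped value;
    unknown names are kept as reported, only the delimiters change.
    """
    out = []
    i = 0
    while i < len(sequence):
        if sequence[i] != "(":
            out.append(sequence[i])
            i += 1
            continue
        # Scan for the parenthesis that closes this one, tracking nesting depth
        # so that the inner "(S)" of "(Phospho (S))" does not end the token early.
        depth = 0
        j = i
        while j < len(sequence):
            if sequence[j] == "(":
                depth += 1
            elif sequence[j] == ")":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        if depth != 0:
            # Unbalanced parentheses: copy the remainder unchanged.
            out.append(sequence[i:])
            break
        name = sequence[i + 1 : j]
        out.append("[" + modification_dict.get(name, name) + "]")
        i = j + 1
    return "".join(out)


if __name__ == "__main__":
    print(normalize_modifications("AAPAPEEMS(Phospho (S))EPK|Z=3", MODIFICATION_DICT))
    print(normalize_modifications("AAPAPEEM(Oxidation (M))SEPK|Z=3", MODIFICATION_DICT))
```

Tracking the nesting depth is the important part here: a simple search for the first closing parenthesis would cut `(Phospho (S))` after `(S`, which shifts or mangles the modification in the sequence.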

mlocardpaulet commented 1 month ago

OK, I think that the most important thing is to have the modifications in the correct position in the sequence. The way the modification is encoded won't be homogenised by us, indeed. We also want all the modifications in the intermediate file (the ones we have in "modification_dict" and the others) to be reported in a homogeneous fashion, right? If not, it will make things a lot more difficult for people who want to parse these intermediate files (and I actually plan to do so for the paper). What do you think?

mlocardpaulet commented 1 month ago

Hello! @RobbinBouwmeester you know that it still does not work well, right? Here is the output from main for a file from MQ 2.5.1.0: image

The [(M)] is not the correct modification.
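
A small check along these lines might help catch this kind of output in a test. It is only a sketch: the helper name `find_nameless_modifications` and the example sequences are made up for illustration, and it assumes the intermediate format puts modifications in square brackets.

```python
import re

# Matches a bracketed token with no modification name: either "[]" or a
# bracket holding only a parenthesised site, such as "[(M)]".
NAMELESS_MOD = re.compile(r"\[\s*(?:\([^()\[\]]*\))?\s*\]")


def find_nameless_modifications(sequence: str) -> list[str]:
    """Return every bracketed token in the sequence that lacks a name."""
    return NAMELESS_MOD.findall(sequence)


if __name__ == "__main__":
    # Hypothetical sequences, not taken from the real output.
    print(find_nameless_modifications("AAPAPEEM[(M)]SEPK|Z=3"))
    print(find_nameless_modifications("AAPAPEEM[Oxidation (M)]SEPK|Z=3"))
```

Running such a check over the "Modified sequence" column of the intermediate file would flag rows where the name was lost, without needing to know which modifications the user searched with.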