Nesvilab / FragPipe

A cross-platform Graphical User Interface (GUI) for running MSFragger and Philosopher-powered pipeline for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org

PEFF Support in Outputs #1695

Closed: carlosg-czbiohub closed this issue 1 month ago

carlosg-czbiohub commented 1 month ago

Hi Fengchao, this is not really a request, more a question out of curiosity. Are there any plans to support using the information in PEFF-format databases in FragPipe's outputs? I am experimenting with the format and it seems like it could be useful, but for now, if I want any of that information, I have to extract it with my own scripts downstream. Like I said, it's not an issue; I was just curious!
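As a rough idea of what that downstream extraction might involve, here is a minimal Python sketch that collects the key/value annotations from each PEFF entry header. It assumes PEFF 1.0 style headers of the form `>prefix:accession \Key=value \Key=value ...`; the function name `parse_peff_headers` is hypothetical and not part of FragPipe or MSFragger.

```python
import re

# Split an entry header at each " \Key=" boundary. Values such as \PName may
# contain spaces, so a plain whitespace split would break them apart.
_KEY_SPLIT = re.compile(r"\s+\\(?=[A-Za-z0-9]+=)")


def parse_peff_headers(lines):
    r"""Return {accession: {key: value}} for every '>' entry in a PEFF file.

    Assumes the PEFF 1.0 layout: file-header lines start with '#', and each
    entry header looks like '>prefix:accession \Key1=value \Key2=value ...'.
    Sequence lines are ignored.
    """
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            continue  # skip '#' file-header lines and sequence lines
        parts = _KEY_SPLIT.split(line[1:])
        # First token is 'prefix:accession'; keep only the accession part.
        accession = parts[0].split(":", 1)[-1]
        fields = {}
        for part in parts[1:]:
            key, _, value = part.partition("=")  # split on the first '=' only
            fields[key] = value
        entries[accession] = fields
    return entries
```

For example, `parse_peff_headers(open("human.peff"))` (file name hypothetical) would return a dict keyed by accession. Splitting on the ` \Key=` boundary rather than on whitespace keeps multi-word values such as protein names intact.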

Cheers Carlos

fcyu commented 1 month ago

Do you mean searching for the modifications listed in the PEFF files? If so, we would need to change MSFragger.

Thanks,

Fengchao

carlosg-czbiohub commented 1 month ago

I was thinking of something far less drastic: just reporting the PEFF information for each protein/peptide in the combined_protein or peptide output. For example, if a PEFF entry notes a previously observed modification of protein X at residue Y, that information would be written as text to a column in the output for that protein. This could prompt additional searches for specific modifications or splice variants. Hope that makes sense!
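A per-protein text column like the one described could be built from the parsed header fields. This is a hypothetical sketch, assuming `\ModResPsi` groups look like `(position|MOD accession|name)` and `\VariantSimple` groups look like `(position|new residue)`; extra fields in a group are ignored, and real files may carry other annotation keys as well.

```python
import re

# Each annotation is a parenthesised, '|'-separated group, e.g.
# "(12|MOD:00046|O-phospho-L-serine)(45|MOD:00047|O-phospho-L-threonine)".
_GROUP = re.compile(r"\(([^()]*)\)")


def _groups(value):
    """Split a PEFF annotation value into lists of its '|' fields."""
    return [g.split("|") for g in _GROUP.findall(value or "")]


def describe_peff_annotations(fields, sequence=None):
    r"""Build a short text summary of known mods and simple variants.

    `fields` is a {key: value} dict for one PEFF entry. Assumes \ModResPsi
    groups are (position|MOD accession|name) and \VariantSimple groups are
    (position|new residue). If `sequence` is given, 1-based positions are
    prefixed with the residue found at that position.
    """
    def site(pos):
        if sequence and pos.isdigit() and 0 < int(pos) <= len(sequence):
            return sequence[int(pos) - 1] + pos
        return pos

    notes = []
    for g in _groups(fields.get("ModResPsi")):
        if len(g) >= 3:
            notes.append(f"{site(g[0])}:{g[2]} ({g[1]})")
    for g in _groups(fields.get("VariantSimple")):
        if len(g) >= 2:
            notes.append(f"{site(g[0])}>{g[1]}")
    return "; ".join(notes)
```

Joining everything into one `; `-separated string keeps the output to a single text column, which seems closer to the "passed to a column as text" idea than adding one column per annotation type.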

fcyu commented 1 month ago

Yes, it seems like a good idea. If I remember correctly, we did include the whole protein header in the MSFragger report, but because the header is very long, it caused issues: https://github.com/Nesvilab/FragPipe/issues/1363#issuecomment-1854901639

Best,

Fengchao

anesvi commented 1 month ago

Thanks for the suggestion. Where do you get those database files, and how frequently are they updated? In any case, I guess adding a column to the PSM and peptide files would then be a separate script, perhaps one not even integrated directly into FragPipe?
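A separate script of that kind could stay fairly small. The sketch below copies a FragPipe TSV and appends one annotation column. The `Protein ID` default is an assumption about the psm.tsv/peptide.tsv layout, `PEFF Annotation` is a made-up column name, and `annotations` would be a dict of accession to summary text produced by whatever PEFF parsing is used.

```python
import csv


def add_peff_column(in_path, out_path, annotations,
                    id_column="Protein ID", new_column="PEFF Annotation"):
    """Copy a tab-separated file, appending one text column from `annotations`.

    `annotations` maps a protein accession to a summary string. `id_column`
    is an assumed column name; `new_column` is hypothetical. Rows whose ID
    has no entry get an empty string.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = list(reader.fieldnames) + [new_column]
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t",
                                lineterminator="\n")
        writer.writeheader()
        for row in reader:
            # Look up the single protein ID in this row; missing IDs map to "".
            row[new_column] = annotations.get(row.get(id_column, ""), "")
            writer.writerow(row)
```

Only the one ID in `id_column` is looked up, so peptides mapping to several proteins would need extra handling, and the accessions in a PEFF file may not match those in the FASTA used for the search without an additional mapping step.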

carlosg-czbiohub commented 1 month ago

@fcyu I did see that issue, but I was unsure whether any development had been done to include the PEFF content in the outputs, so I thought I'd ask. @anesvi I got the current human PEFF file from here. Based on their archive, it seems to be updated about 2 to 3 times per year. The site is referenced by HUPO's standards development site here. It also seems that UniProt can generate PEFF entries through its API: "The UniProt PEFF format currently describes variation and sequence data and is available through the Proteins API service (https://www.ebi.ac.uk/proteins/api/doc/) for 31 species. An extension to include PTM data is planned." I have not tried to do this programmatically for a whole proteome, but I have retrieved individual proteins to test the API. It might need an additional function to strip the redundant PEFF header information if multiple proteins are downloaded into a single PEFF file.
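Stripping the redundant header could look roughly like the following sketch. It takes PEFF texts that have already been downloaded (the API query itself is not shown) and keeps only the first file header. It assumes header lines start with `#`, that each text describes a single database, and that a `# NumberOfEntries=` line, if present, should be rewritten to the merged count; `merge_peff_files` is a hypothetical name.

```python
def merge_peff_files(texts):
    """Concatenate several PEFF texts, keeping only the first file header.

    Assumes file-header lines start with '#'. Header lines from the second
    and later texts are dropped as redundant, and blank lines are skipped.
    If the kept header has a '# NumberOfEntries=' line (assumed key), it is
    rewritten to the number of '>' entries in the merged output.
    """
    header, body = [], []
    for i, text in enumerate(texts):
        for line in text.splitlines():
            if line.startswith("#"):
                if i == 0:
                    header.append(line)  # keep only the first file's header
            elif line.strip():
                body.append(line)
    count = sum(1 for line in body if line.startswith(">"))
    header = [f"# NumberOfEntries={count}" if h.startswith("# NumberOfEntries=") else h
              for h in header]
    return "\n".join(header + body) + "\n"
```

Keeping the first header verbatim is the simplest choice; if the downloaded files differ in other header fields (versions, database names), those differences would be silently lost and may need merging by hand.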

anesvi commented 1 month ago

May I suggest that you contact us by email so we can discuss this further, as it seems to be a future-work or collaboration-opportunity type of request.