Vitek-Lab / MSstatsPTM

Post Translational Modification (PTM) Significance Analysis in shotgun mass spectrometry-based proteomic experiments
https://vitek-lab.github.io/MSstatsPTM/
Artistic License 2.0

Protein Prospector Converter #39

Open jcmaynard opened 1 year ago

jcmaynard commented 1 year ago

Hi,

Would it be possible to add a converter for Protein Prospector output?

Cheers!

devonjkohler commented 11 months ago

Hi @jcmaynard

I haven't seen the output of PaSER. Could you shoot me over some data for this? I can look into the converter if I have the data.

Devon

jcmaynard commented 10 months ago

Hi Devon,

I can send you some Prospector output: one report with phospho and one with global proteins. Where is the best place to send it?

Cheers,

Jason

devonjkohler commented 10 months ago

Hey @jcmaynard,

Would you be able to share them via email? My address is kohler.d@northeastern.edu

Devon

tonywu1999 commented 2 months ago

Hi @jcmaynard

I got started on writing the code for the Protein Prospector converter. Devon shared your dataset with me, but I'm a little confused about the data format.

jcmaynard commented 2 months ago

Hi @tonywu1999

Here is the manual for Protein Prospector (specifically the section on data output): https://prospector.ucsf.edu/prospector/html/instruct/batchtagman.htm#search_compare

The first two rows have some cells that represent the Project Name: "Z20180606_YvA_TotalRPLC", and the search name "SW201948rc2mc2mm".

The data report lists the Peaklists used for the search under the header "Fraction". The charge state is under column header "z".
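A minimal Python sketch of reading a report laid out this way, for illustration only (it is not the MSstatsConvert implementation). It assumes a tab-delimited export whose header row follows directly after the two metadata rows; the metadata cell labels, the "Peptide" column, and the peaklist names in the example are made up, and `read_prospector_report` is a hypothetical helper.

```python
import csv
import io

def read_prospector_report(text, n_meta_rows=2, delimiter="\t"):
    """Split a report into metadata cells, the header row, and data rows.

    Assumption: the column header row is the first row after the
    `n_meta_rows` metadata rows (project name, search name).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Collect every non-empty cell from the metadata rows.
    meta_cells = [cell for row in rows[:n_meta_rows] for cell in row if cell.strip()]
    header = rows[n_meta_rows]
    data = [row for row in rows[n_meta_rows + 1:] if row]
    return meta_cells, header, data

# Made-up example report; only the project and search names come from the thread.
example = (
    "Project Name\tZ20180606_YvA_TotalRPLC\n"
    "Search Name\tSW201948rc2mc2mm\n"
    "Fraction\tz\tPeptide\n"
    "peaklist_01\t2\tPEPTIDEK\n"
    "peaklist_02\t3\tANOTHERPEPTIDER\n"
)

meta, header, data = read_prospector_report(example)
fraction_idx = header.index("Fraction")  # peaklist used for the search
z_idx = header.index("z")                # precursor charge state
runs_and_charges = [(row[fraction_idx], int(row[z_idx])) for row in data]
```

Here "Fraction" plays the role of a run identifier for each row, and "z" is converted to an integer charge.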

In the case of TMT10 or TMTpro, the intensity headers will have the same name for example "Int 127" for both 127N and 127C. The N isotope will always be first. I'm in the process of trying to get the Prospector Admin to change this.
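One way to disambiguate the duplicated headers, sketched in Python under the rule stated above (the N isotope column always comes first). `label_tmt_isotopes` is a hypothetical helper, and the header list is a made-up example rather than a real Prospector export.

```python
from collections import Counter

def label_tmt_isotopes(headers, prefix="Int "):
    """Append N/C to intensity headers that occur exactly twice.

    Relies on the ordering rule from the thread: of two identical
    headers such as "Int 127", the first is N and the second is C.
    Headers that occur once are returned unchanged.
    """
    counts = Counter(h for h in headers if h.startswith(prefix))
    seen = Counter()
    renamed = []
    for h in headers:
        if h.startswith(prefix) and counts[h] == 2:
            suffix = "N" if seen[h] == 0 else "C"
            seen[h] += 1
            renamed.append(h + suffix)
        else:
            renamed.append(h)
    return renamed

headers = ["Fraction", "z", "Int 126", "Int 127", "Int 127", "Int 128", "Int 128"]
renamed = label_tmt_isotopes(headers)
# ['Fraction', 'z', 'Int 126', 'Int 127N', 'Int 127C', 'Int 128N', 'Int 128C']
```

Renaming by position keeps the original column order intact, so the intensity values do not need to be moved.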

There are a number of different options for reporting peptide mods in Prospector, the data I shared was just one of them. Mods can be split out into separate columns if that would be easier to parse.

Jason

tonywu1999 commented 2 months ago

> The data report lists the Peaklists used for the search under the header "Fraction". The charge state is under column header "z".

Understood - z represents the precursor charge.

> The first two rows have some cells that represent the Project Name: "Z20180606_YvA_TotalRPLC", and the search name "SW201948rc2mc2mm".

I'm still unsure how to determine which run(s) produced this report. For example, in the attached example from another tool (MaxQuant), there is a column "Raw.file" that indicates which RAW file each row is associated with. I can't seem to find an equivalent in the Prospector search file.

> In the case of TMT10 or TMTpro, the intensity headers will have the same name for example "Int 127" for both 127N and 127C. The N isotope will always be first. I'm in the process of trying to get the Prospector Admin to change this.

Could you clarify what you mean by this? How would one know when a measurement is associated with the N isotope vs the C isotope based on inspecting the input dataset?

> There are a number of different options for reporting peptide mods in Prospector, the data I shared was just one of them. Mods can be split out into separate columns if that would be easier to parse.

I think your initial dataset works well with how peptide mods are reported. Is there a certain setting that a user needs to select to display the mods in the current format (i.e. is this the default format)?

jcmaynard commented 2 months ago

Hi @tonywu1999,

Here is a breakdown of the column headers from the reports I sent @devonjkohler: Column Headers

The reporter ion intensity columns for TMT greater than 6-plex will now have an isotope label, for example "Int 127N" and "Int 127C".
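With labelled headers, the channel can be read straight from the column name. A small Python sketch, assuming the headers follow the pattern "Int " plus a three-digit mass with an optional N or C (the pattern is inferred from the two examples given here, and `reporter_channels` is a hypothetical helper):

```python
import re

# "Int 126", "Int 127N", "Int 127C", ... (pattern inferred from the examples)
CHANNEL_RE = re.compile(r"^Int (\d{3}[NC]?)$")

def reporter_channels(headers):
    """Map each reporter ion intensity header to its channel label."""
    channels = {}
    for h in headers:
        match = CHANNEL_RE.match(h)
        if match:
            channels[h] = match.group(1)
    return channels

channels = reporter_channels(["z", "Int 126", "Int 127N", "Int 127C", "Intensity"])
# {'Int 126': '126', 'Int 127N': '127N', 'Int 127C': '127C'}
```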

Modifications can be reported in 5 different ways:

For the above settings modifications are reported at the Peptide level. Oxidation@11 refers to the 11th amino acid of the peptide. Protein modification is a separate column discussed above. The default is "Variable Mods only", but "Mods in Peptide" or one of the all mods are used more often.
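A minimal Python sketch of parsing peptide-level modifications written as `Name@position`, such as "Oxidation@11". The semicolon separator between multiple modifications, the helper names, and the example peptide are assumptions for illustration; other notations Prospector may emit are not handled and raise an error so they surface early.

```python
import re

# One modification token: a name, "@", then a 1-based peptide position.
MOD_RE = re.compile(r"^(?P<name>.+)@(?P<pos>\d+)$")

def parse_peptide_mods(mod_string, sep=";"):
    """Return a list of (modification name, 1-based peptide position).

    Assumption: multiple modifications are separated by `sep`.
    """
    mods = []
    for token in mod_string.split(sep):
        token = token.strip()
        if not token:
            continue
        match = MOD_RE.match(token)
        if match is None:
            raise ValueError(f"Unrecognized modification token: {token!r}")
        mods.append((match.group("name"), int(match.group("pos"))))
    return mods

def modified_residues(peptide, mods):
    """Attach the modified amino acid to each (name, position) pair."""
    return [(name, pos, peptide[pos - 1]) for name, pos in mods]

peptide = "SAMPLEPEPTMK"  # made-up sequence; residue 11 is M
mods = parse_peptide_mods("Phospho@1;Oxidation@11")
sites = modified_residues(peptide, mods)
# [('Phospho', 1, 'S'), ('Oxidation', 11, 'M')]
```

Because the positions are relative to the peptide, mapping them onto the protein would additionally need the peptide's start position in the protein sequence.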

The TMT modification names in Prospector are: TMT6plex, TMT10plex, and TMT16plex.

I'm happy to set up a zoom or call to discuss if that would be helpful.

Cheers,

Jason

tonywu1999 commented 1 month ago

@jcmaynard

Hi,

I'd be happy to discuss on a call. I think you answered all my questions, but I'm curious about how the modifications are reported and would like more clarity on that.

Could you email me at wu.anthon@northeastern.edu and we can coordinate a time to discuss?

Thanks, Tony

tonywu1999 commented 2 weeks ago

@jcmaynard

In terms of the timeline for the MSstatsPTM converter, I anticipate it will be complete by the end of October. I had initially thought it would be complete earlier, but I noticed that the MSstatsPTM code needs some refactoring before the Protein Prospector converter can be implemented.

So far, I have created the converter from Protein Prospector to MSstatsTMT format, which is accessible at MSstatsConvert.

jcmaynard commented 1 week ago

Thanks for the update Tony

On Tue, Sep 3, 2024 at 11:22 AM tonywu1999 @.***> wrote:

> @jcmaynard
>
> In terms of timeline for the MSstatsPTM converter, I'm anticipating for it to be complete by end of October. I had initially thought it would be complete earlier, but I noticed the code for MSstatsPTM needs some refactoring before implementing the code for the protein prospector converter.
>
> So far, I created the converter from Protein Prospector to MSstatsTMT format, which is accessible at MSstatsConvert.


tonywu1999 commented 1 day ago

Adding notes here from the July meeting between Jason and me:

General Notes:

Slip Score Notes: