Nesvilab / FragPipe

A cross-platform Graphical User Interface (GUI) for running MSFragger and Philosopher - powered pipeline for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org

Questions for glycan workflows #526

Closed MNTsnowman closed 1 year ago

MNTsnowman commented 2 years ago

Hi Fragpipe

I’m unsure of where to ask this question, so I’ll drop it here… If you prefer to take it by mail or something else, let me know and we can do that. I have been wanting to explore the glycoproteome of my samples, but I have no clue how to do it effectively or how to interpret the output. In FragPipe v17 I noted that the PTM tab has changed quite a lot, with stuff that I think is relevant for this topic. I read your guide (https://fragpipe.nesvilab.org/docs/tutorial_glyco.html) and I still have a couple of questions. Below, when I refer to a section, I refer to the numbers in the picture of the “PTMs” tab on the link above.

• How can I generate a .glyc file? Could we get an example? Is it a tab-separated txt format with a header saying “glycanname Hex GalNAc NeuAc” followed by lines describing these, like “glycan1 1 1 1”?
• If I use such a .glyc file, what happens to/with the mass offsets entered at the MSFragger tab? Are they considered?
• What workflow would be compatible with the use of a .glyc file?
• Would it be possible to make a general glyco workflow, where you can distinguish between O- and N-linked glycans by enabling or disabling the “N-glycan Mode” in section 3 (of the link)? If enabled, it is N-linked; if disabled, it is O-linked. One would also need to change the AAs for localization in section 1 (again the link) and switch to a relevant .glyc file.
• Would it be possible to quantify these using IonQuant? Alternatively, could they be collapsed into either all glycans of the same type throughout the dataset, and/or all glycans at the same position throughout the dataset?
• Regarding section 4 (the link) how would this act on/with CID data? The “gap-masses” should be there, some of the breakdown products might be present as well, maybe even attached.
• Last, and probably most importantly, how am I to interpret the output? So far all I have is a global.glycoprofile.txt file. This is not really what I’m looking for. I would love for the identified glycans to be included in the general search so they could be treated as any other PTM and written in the output as such (preferably in the peptide.tsv and related output like MSstats etc.).

So, first of all, thank you so much for your great work. I would love to be able to fully utilize it, but I really need some more information on how to do so and how to interpret the results. In summary: an example of a .glyc file would be great, as would knowing whether it can be used for both N- and O-linked glycans. What workflows should be used for this, and how should I handle the MSFragger tab? Ideally, I would like the glycans to be reported like all other included PTMs, which would also make the interpretation a lot easier. Lastly, the oxonium ions: how are these used with CID? The description above covers HCD and says that scores/penalties are involved, so I'm wondering how this would behave with CID data.

Kind Regards Martin

anesvi commented 2 years ago

I think Dan wrote a tutorial on that part. If not, Dan, we should put something together.

MNTsnowman commented 2 years ago

Hi Alexey

Thank you for the quick reply. I have not been able to find anything that clarifies these questions. There might be something I have missed, though. If so, please let me know and I'll have a look.

Kind Regards Martin

dpolasky commented 2 years ago

Hi Martin, Lots of great questions - our glyco tutorials are still a work in progress, so thanks for pointing out some areas where we need more information. I'll be adding all the answers/info provided here to the wiki/tutorial when I have a moment.

• How can I generate a *.glyc file? Could we get an example? Is it a tab-separated txt format with a header saying “glycanname Hex GalNAc NeuAc” followed by lines describing these, like “glycan1 1 1 1”?

The *.glyc file is a text file with one glycan composition per line (for example 'HexNAc-2_Hex-5_NeuAc-1_Fuc-1'), like the attached example. The attachment is saved as '.glyc.txt' so GitHub will let me post it here; normally the extension would just be '.glyc'. I will definitely put up an example in the tutorials and more details on how to make it. If there are additional monosaccharides or other modifiers you would like to include in your search that aren't supported, definitely let us know! Currently we support HexNAc, Hex (hexose), Fuc (deoxy hexose), NeuAc, NeuGc, Phospho (phosphate), and Sulfo (sulfate).
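To make the format concrete, here is a minimal Python sketch that parses and checks composition lines of the kind shown above. It is not part of FragPipe or PTM-Shepherd; the function names are hypothetical, and the grammar (underscore-separated `Name-count` tokens) is inferred from the single example string and the list of supported residues in this comment, so the real parser may accept or reject other variants.

```python
# Residue names listed as supported in the comment above.
SUPPORTED = {"HexNAc", "Hex", "Fuc", "NeuAc", "NeuGc", "Phospho", "Sulfo"}


def parse_composition(line):
    """Parse one line such as 'HexNAc-2_Hex-5' into {'HexNAc': 2, 'Hex': 5}.

    Assumed grammar (inferred, not an official spec): tokens joined by '_',
    each token being '<residue name>-<non-negative integer>'.
    """
    counts = {}
    for token in line.strip().split("_"):
        name, sep, count = token.rpartition("-")
        if not sep or name not in SUPPORTED or not count.isdigit():
            raise ValueError(f"cannot parse token {token!r} in line {line!r}")
        counts[name] = counts.get(name, 0) + int(count)
    return counts


def read_glyc(lines):
    """Parse an iterable of lines (e.g. an open .glyc file), skipping blanks."""
    return [parse_composition(line) for line in lines if line.strip()]
```

Running `read_glyc(open("my_glycans.glyc"))` on a candidate file (the file name is a placeholder) before a search would at least catch misspelled residue names early.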

• If I use such a *.glyc file, what happens to/with the mass offsets entered at the MSFragger tab, are they considered?

Great question - right now, MSFragger and PTM-Shepherd are NOT communicating about the mass offsets/glycans being considered. This gives some additional flexibility (i.e., you can search for a smaller glycan list to improve sensitivity in MSFragger, but use a larger list in PTM-Shepherd to improve match quality), but it is not particularly convenient. I am planning to add some sort of method to load a glycan database once and use it for the entire workflow, but haven't had a chance yet (also if you have suggestions/preferences about what kinds of external glycan database formats it would be good to support, let us know!)
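Because the two lists are maintained separately for now, it can help to compute the mass offset that corresponds to each glycan composition, so the MSFragger offsets and the .glyc entries can be compared by hand. The sketch below is an illustration only: the residue masses are standard monoisotopic values typed in here, not values read from FragPipe, and the function name is hypothetical.

```python
# Standard monoisotopic residue masses in Da (assumption: these are textbook
# values, not copied from MSFragger or PTM-Shepherd).
RESIDUE_MASS = {
    "HexNAc": 203.07937,
    "Hex": 162.05282,
    "Fuc": 146.05791,
    "NeuAc": 291.09542,
    "NeuGc": 307.09033,
    "Phospho": 79.96633,
    "Sulfo": 79.95682,
}


def composition_mass(composition):
    """Sum residue masses for a string like 'HexNAc-2_Hex-5'."""
    total = 0.0
    for token in composition.strip().split("_"):
        name, _, count = token.rpartition("-")
        total += RESIDUE_MASS[name] * int(count)
    return total


# Print one offset per composition, e.g. to paste into a mass offset list.
for comp in ["HexNAc-2_Hex-5", "HexNAc-2_Hex-5_NeuAc-1_Fuc-1"]:
    print(f"{comp}\t{composition_mass(comp):.4f}")
```

For 'HexNAc-2_Hex-5' this comes out within about a millidalton of 1216.4228, which is a quick sanity check that a composition and an offset refer to the same glycan.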

• What workflow would be compatible with the use of a *.glyc file?

The *.glyc file only applies to the glycan assignment portion of PTM-Shepherd (which is taking the mass offsets from MSFragger and matching them to a glycan composition), so it is only needed for workflows that have the "Assign Glycans with FDR" box checked on the PTM-Shepherd tab (next to section 3). If no .glyc file is specified, PTM-Shepherd will use its internal glycan list, which should be fine for mammalian N-glycans. Any other workflows for non-glyco searches, open searches, etc, do not need the .glyc file, even if running other parts of PTM-Shepherd.

• Would it be possible to make a general glyco workflow, where you can distinguish between O- and N-linked glycans by enabling or disabling the “N-glycan Mode” in section 3 (of the link)? If enabled, it is N-linked; if disabled, it is O-linked. One would also need to change the AAs for localization in section 1 (again the link) and switch to a relevant *.glyc file.

Switching between N- and O-glyco workflows typically requires changing several parameters on both the MSFragger and PTM-Shepherd tabs (depending on the sample and acquisition settings), so we have opted to make separate base workflows for each rather than a general one that can be switched back and forth. If you have a specific use case in mind though, we'd be happy to work on making something that would be more convenient - maybe send me an email directly (dpolasky [at] umich.edu)

• Would it be possible to quantify these using IonQuant? Alternatively, could they be collapsed into either all glycans of the same type throughout the dataset, and/or all glycans at the same position throughout the dataset?

Yes! IonQuant supports glycan quantification as of this release. It does require using some specific settings in PTM-Shepherd to convert the output to a format IonQuant can use, but we have a workflow called "glyco-N-LFQ" that uses IonQuant for quantification that you can use as a starting point. We have mostly been testing with TMT quant so far, so if you run into any issues, please let us know!

• Regarding section 4 (the link) how would this act on/with CID data? The “gap-masses” should be there, some of the breakdown products might be present as well, maybe even attached.

These parameters are all optimized for CID/HCD data at the moment, so they should generally work well for CID data (although if you're on an Orbitrap instrument, we only support high-resolution MS2 for glyco searches). We have seen that data taken at very high CID/HCD energy does not always perform well because the glycan is too fragmented to generate a good Y-ion series, but other than that, all the collisional activation datasets we've tested have worked well.

• Last, and probably most importantly, how am I to interpret the output? So far all I have is a global.glycoprofile.txt file. This is not really what I’m looking for. I would love for the identified glycans to be included in the general search so they could be treated as any other PTM and written in the output as such (preferably in the peptide.tsv and related output like MSstats etc.).

The glycan output is written back to the psm.tsv table at the end of the PTM-Shepherd analysis, so it is reported with the general search results like any other PTM. You should see the determined glycan in the "Observed Modifications" column as a glycan composition (like "HexNAc-2_Hex-5"), and in the "Assigned Modifications" column as a mass (like "4N(1216.4228)"). We have not yet implemented summarizing the glycans (or any open/mass offset results) up to the peptide/protein/etc. tsv tables, but that is something we would like to do.
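Until that summarization exists, a rough per-glycan tally can be made from psm.tsv directly. The following Python sketch only assumes that the file is tab-separated and has an "Observed Modifications" column as described above; the function name, the "Peptide" column, and the in-memory sample rows are made up for illustration, and real psm.tsv files contain many more columns.

```python
import csv
import io
from collections import Counter


def count_glycans(handle, column="Observed Modifications"):
    """Count PSMs per value of the given column in a tab-separated table.

    Rows where the column is missing or empty are ignored.
    """
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


# Tiny in-memory stand-in for psm.tsv (hypothetical rows, two columns only).
sample = io.StringIO(
    "Peptide\tObserved Modifications\n"
    "AAANGTK\tHexNAc-2_Hex-5\n"
    "CCCNVSR\tHexNAc-2_Hex-5\n"
    "DDDNETK\tHexNAc-4_Hex-5_NeuAc-2\n"
    "EEEK\t\n"
)
for glycan, n_psms in count_glycans(sample).most_common():
    print(glycan, n_psms)
```

For a real result, the same function can be called on `open("psm.tsv", newline="")`. Grouping by site instead of by composition would need additional columns whose names are not given in this thread.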

If you have any other questions or if any of this isn't clear, please let us know! Best, Dan example.glyc.txt

MNTsnowman commented 2 years ago

Hi Dan

Thank you so much for the details. I will give it a try later this week and let you know what happens.

Regarding the format of the .glyc file, I think what you have is doable. However, if you want to make it easier/more user friendly, maybe consider something that can easily be made in Excel (I assume that most people who use FragPipe know basic Excel). This could then be saved as a tab-separated *.txt file. Okay, I know I'm biased here since this is the system I'm used to (...). On the plus side, it is very easy to read and to manipulate in Excel. I have attached an example of the two.

GlycExample.xlsx GlycExample.txt

Yesterday I had a look at some old results where I tested different things, and I noted that the glycan masses were reported in the "HexNAc-2_Hex-5" format you described. This was the case in both the psm.tsv and the peptide.tsv files; however, the peptides carrying these had 0 intensity. Maybe I need to have a look at the workflows used here, but are they supposed to be reported with 0 intensity (assuming that the glyco-N-LFQ workflow is used)? I'll pay attention to this in my next attempt.

Good to know that these settings are optimized for CID and HCD. I know CID is not optimal for this kind of work, but it is very common. So I guess we all have to make do with what we have?

I'll let you know how my next few searches go.

Best Martin

dpolasky commented 2 years ago

Hi Martin, I actually do make my .glyc files in Excel, but transfer them to a text editor to save - you're right that skipping that step would make it a lot easier. We'll figure out a better solution overall for the next update - there are a lot of inconveniences in the current glycan loading methods.
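As a stopgap for the Excel-to-text-editor step, a tab-separated export can be converted to one-composition-per-line text with a few lines of Python. This is a sketch under assumptions: the table layout (a name column followed by one column per residue, with residue names in the header) is the one proposed earlier in this thread rather than anything FragPipe reads itself, and the function name is hypothetical.

```python
import csv
import io


def table_to_glyc_lines(handle):
    """Convert a tab-separated table into 'Residue-count' composition lines.

    Assumed layout: first column is a free-text glycan name, remaining header
    cells are residue names (e.g. HexNAc, Hex, NeuAc), and each row holds
    integer counts. Zero or empty cells are left out of the composition.
    """
    reader = csv.reader(handle, delimiter="\t")
    residues = next(reader)[1:]
    lines = []
    for row in reader:
        parts = [
            f"{residue}-{int(cell)}"
            for residue, cell in zip(residues, row[1:])
            if cell.strip() and int(cell) > 0
        ]
        if parts:
            lines.append("_".join(parts))
    return lines


# In-memory stand-in for a table saved from Excel as tab-separated text.
table = io.StringIO(
    "glycanname\tHexNAc\tHex\tNeuAc\tFuc\n"
    "glycan1\t2\t5\t0\t0\n"
    "glycan2\t2\t5\t1\t1\n"
)
print("\n".join(table_to_glyc_lines(table)))
```

Writing the returned lines to a file with a '.glyc' extension gives the one-composition-per-line layout described earlier in the thread, provided the header uses the supported residue names exactly.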

I also see a lot of 0 intensity for glycopeptides, though there are some in my analysis that have intensities reported. Given how few glycopeptides are reporting a non-zero intensity compared to other peptides, there may be an issue there - I'll look into it and get back to you.

As for CID/HCD, I actually think CID is great for N-glycopeptides (especially if you can combine a couple of different collision energies) since it's so much faster than ETD/hybrid and generates a lot of useful N-glycan fragments. It is not ideal for O-glycans though - it's often almost impossible to localize them using CID data alone, but we can at least identify the peptide and total glycan mass with good confidence.

Best, Dan

MNTsnowman commented 2 years ago

Hi Dan

Thanks a lot, I am really looking forward to seeing the "final" glycan search strategy. There is one aspect that I think would be great to include in the glycan search, but it is kind of difficult. Sometimes glycans are modified further. I don't know all of the options, but two examples are methylation and the loss of an OH group. So now we are talking about modifications on modifications (PTMs of PTMs...). I guess this is where things become quite, ehh, tricky? But if there is any way to accommodate this kind of information, and maybe even tell which monosaccharide is modified and where in the structure it could be located, it could be worth including.

Best Martin

dpolasky commented 2 years ago

Hi Martin, We can probably add glycan modifications to the composition assignment process in PTM-Shepherd without too much trouble (especially if you know of good datasets for testing that would have a large number of these modifications). Assigning the modifications to specific places in the structure would likely be very challenging - probably not something we can do now, but maybe in the future. Best, Dan

MNTsnowman commented 2 years ago

Hi Dan Sounds interesting, very interesting. However, at this point I don't think I have a great dataset for testing this (...), at least not that I know of. Also, as I said, I know there are many such modifications, a lot more than the ones I just mentioned. Sorry, but I don't have an overview of the topic; I have only encountered a few, which is why I'm aware of their existence. I'm sorry that I can't offer more in this area even though I brought it up. I'm simply curious, since I have observed some and would like to explore it further. Best Martin