Nesvilab / FragPipe

A cross-platform Graphical User Interface (GUI) for running MSFragger and Philosopher - powered pipeline for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org

Glycoproteomics_quantifications_0_intensity_values #812

Closed adamurminsky closed 3 months ago

adamurminsky commented 1 year ago

Hello MSFragger Team,

I have been trying to analyze my glycoproteomic data (Orbitrap) with MSFragger, as the identifications and their confidence look better than with other search engines. However, I am not able to get intensity values from IonQuant.

My goal is to quantitatively compare several groups of samples at the glycopeptide level. In other engines such as Byonic, a peptide modified with a glycan is considered unique and has its own intensity value, so it can be analysed further for differential expression between groups in R. From what I understand, something similar can be done in MSFragger at the peptide level: the IonQuant output, such as combined_modified_peptide.tsv, lists each unique peptide with the glycan modification as a mass offset in the modified sequence column, and you can then look up which modification (glycan) it is in the global.modsummary file from PTM-Shepherd.

So I am not sure why I am not getting intensity values. I am running the newest versions of FragPipe, MSFragger, Philosopher, and IonQuant. I used the basic workflow, glyco-N-LFQ, but my data are hybrid: a combination of HCD and EThcD fragmentation spectra. I also added mass_offsets, which are masses of glycans typical for blood serum and breast cancer, but I did not import a glycan database in the PTMs column in FragPipe, as those are the same masses as defined in the mass offsets. Another question concerns the global.modsummary file, which contains a lot of unannotated mass shifts that were found in hundreds or thousands of PSMs; I thought a glycan mass seen in such a high number of PSMs would be resolved. image

I converted the Thermo .raw files to .mzML, using only the centroid filter, and tried running the analysis both with and without zlib compression, but neither run produced intensity values. image

I read the forum and checked that I am not hitting known bugs such as "print decoys" in validation, but it is probably still something trivial that I am missing.

Here is the log from one of my attempts:

log_2022-08-26_00-09-32.txt

I am not able to share msfrager.params in this post, but I can send it somewhere if needed, as well as shepherd params.

I would be very grateful if you could confirm my understanding of glycopeptide quantification in FragPipe and help me configure FragPipe so that I eventually get intensity values for further analysis.

Thank you,

Adam

fcyu commented 1 year ago

I can't figure out why IonQuant didn't quantify anything. Maybe it has something to do with the psm.tsv files rewritten by PTM-Shepherd. Dan @dpolasky, can you take a look?

Best,

Fengchao

dpolasky commented 1 year ago

Hi Adam,

I'm also not sure why you're not seeing intensities, but have a couple of thoughts from the log. First, to clarify, PTM-Shepherd will actually write identified glycans to the Assigned Modifications column of the psm.tsv tables (and remove the delta mass/mass offset) so that IonQuant can read them, so you should expect to see the glycans in the Assigned Modifications column of the combined_modified_peptide.tsv from IonQuant (with intensities from each experiment). The global.modsummary file from PTM-Shepherd (somewhat confusingly) does not have the same annotations as the final glycans assigned to the PSM tables, and so many of the masses will not be annotated there, as you've seen - but that will not affect the quantitation. The best place to look for the identified glycans is currently in the psm.tsv tables, which will have the glycan mass and location in the Assigned Modifications column and the glycan composition in the Observed Modifications column. The combined_modified_peptide.tsv table from IonQuant will also have the same mass and location in its Assigned Modifications column, but it doesn't carry up the composition information.

As for not seeing intensities, I'm not able to reproduce your issue with hybrid data here, but I have one possible idea. It looks like your mass offset list in the MSFragger search doesn't include 0, which means that only glycopeptides are identified (no unmodified peptides), which can cause problems for calibration and PTM-Shepherd - could you try re-running with 0 included in the list of mass offsets? If that still gives no intensities in the psm.tsv tables, could you share a psm.tsv and combined_modified_peptide.tsv file from that search?

I also see you're running PTMProphet, but it will not localize glycans, so you can turn it off (unless you want to localize the Met oxidation variable mod). I don't think that would prevent IonQuant from finding intensities, but turning it off will speed up the search a lot.

Best, Dan
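
A quick way to catch the missing-0 case before launching a search is to check the offset list programmatically. The sketch below is only an illustration: it assumes the offsets are available as a single string separated by slashes, commas, or whitespace, and the function names and example masses are hypothetical, not part of FragPipe or MSFragger.

```python
def parse_mass_offsets(text):
    """Split a mass-offset string on slashes, commas, or whitespace."""
    tokens = text.replace("/", " ").replace(",", " ").split()
    return [float(token) for token in tokens]


def ensure_zero_offset(text):
    """Return the offsets as floats, with 0 prepended if it is missing."""
    offsets = parse_mass_offsets(text)
    if not any(abs(offset) < 1e-9 for offset in offsets):
        # Without a 0 offset, unmodified peptides cannot be identified.
        offsets.insert(0, 0.0)
    return offsets


# Hypothetical custom glycan masses, for illustration only.
custom_offsets = "1216.4229/1378.4757/1540.5285"
print(ensure_zero_offset(custom_offsets))
```

If the list already contains 0, it is returned unchanged, so the check is safe to run on the default glyco offsets as well.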

adamurminsky commented 1 year ago

Thank you Dan,

I added 0 to the mass offsets and now I have intensity values. I had deleted the default glyco mass offsets, replaced them with my own from a glycan library, and forgot to add 0. I suppose I now have everything I need for further analysis.

Thanks,

Adam

fcyu commented 1 year ago

Hi Adam,

Can you help to check if the glycan peptides have nonzero intensities?

Thanks,

Fengchao

adamurminsky commented 1 year ago

Hi Fengchao,

Yes, I checked the individual psm.tsv files, and the peptides with glycans (composition) in the "Observed Modifications" column have nonzero intensities. There are also nonzero intensities in the combined_modified_peptide.tsv. There are some zero intensities, but that is expected, as not all glycopeptides can be found in all samples. I attached a screenshot. image

Thanks,

Adam
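
For anyone who wants to run the same check without opening the tables by hand, here is a minimal sketch using only the Python standard library. It assumes the psm.tsv has "Observed Modifications" and "Intensity" columns and treats any row with a non-empty "Observed Modifications" value as a glycoPSM, which is a simplification; the function name and the toy rows are made up for illustration.

```python
import csv
import io


def glyco_intensity_summary(handle, mod_col="Observed Modifications",
                            intensity_col="Intensity"):
    """Return (glycoPSM count, glycoPSMs with intensity > 0) for a TSV table."""
    total = 0
    quantified = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        if not (row.get(mod_col) or "").strip():
            continue  # no glycan composition reported for this PSM
        total += 1
        if float(row.get(intensity_col) or 0) > 0:
            quantified += 1
    return total, quantified


# Toy table standing in for a real psm.tsv (values are invented).
example = io.StringIO(
    "Peptide\tObserved Modifications\tIntensity\n"
    "NGTK\tHexNAc(2)Hex(5)\t125000.0\n"
    "NGTK\tHexNAc(2)Hex(6)\t0\n"
    "LVEK\t\t98000.0\n"
)
total, quantified = glyco_intensity_summary(example)
print(f"{quantified} of {total} glycoPSMs have a nonzero intensity")
```

For a real file, pass a handle opened with `open("psm.tsv", newline="")` instead of the `io.StringIO` object.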

adamurminsky commented 1 year ago

Hello Fengchao and Dan,

I would like to ask one more thing. Is it possible to load the "glyco-N-Hybrid" workflow in FragPipe and turn on IonQuant? I have a combination of HCD and EThcD data. Basically, after an MS1 scan the precursor is fragmented with HCD and an MS2 scan is measured; if any glyco trigger mass is present in this MS2 scan, another MS2 scan with EThcD is measured for the same precursor. Correct me if I misunderstand this: if, for example, a glycopeptide is identified with both HCD and EThcD, the PSMs come from the same precursor, which should be assigned to a specific feature that is quantified. How is this quant value handled? When I run this quantification, a huge number of the identified glycopeptides are not quantified in any of the 80 samples, so I wonder whether these come from the hybrid fragmentation.

As an example, here is a histogram where the x-axis is the percentage of samples in which the glycopeptide was quantified: image

As you can see, the majority of glycopeptides are not quantified in any of the samples. For comparison, here is the same histogram from PD with the Byonic node: image

These two histograms are from exactly the same data. I wanted to use MSFragger to get more IDs, but I have a problem quantifying this type of hybrid data.

Here is the log from the MSFragger analysis: log_2022-10-18_14-47-16.txt

So I would like to ask whether I can quantify this hybrid data, or which parameters in FragPipe are wrong.

Thank you,

Adam

fcyu commented 1 year ago

Hi Adam,

With the latest PTM-Shepherd, writing the glycan as a variable modification should work. I will let Dan @dpolasky answer the details regarding the glycan assignment ;)

Best,

Fengchao

dpolasky commented 1 year ago

Hi Adam,

We don't combine the HCD and EThcD scans - each is searched and quantified independently. But there's nothing that (should) prevent scans that are matched to a glycopeptide from being quantified - each of the paired scans will point to the same precursor, and so they should have the same intensity reported if they were matched to the same glycopeptide. It looks like your settings should be okay for quant, so I'm not sure why you're not seeing glycopeptides being quantified across many samples. Is the issue that a lot of the glycoPSMs in the psm.tsv tables have 0 intensity detected, or that different peptides are getting detected in different runs?

Best, Dan

adamurminsky commented 1 year ago

Hello both,

Dan, regarding the first part of your question: in the psm.tsv of the A4 file, around 50% of the 4500 glycopeptides have intensity values (I suppose it is approximately the same in all other samples), but I am unable to check the fragmentation type of those glycopeptide spectra (whether the missing ones are from EThcD, or whether the glycopeptide was found with both HCD and EThcD) because there is no column describing fragmentation. (It is possible to check the spectra manually, but it would take time.) Regarding the second part of the question, the results are quite heterogeneous, as expected for DDA glycopeptide spectra. To gauge how heterogeneous, I made the simplest possible Venn diagram, using only the Modified.Peptide column (disregarding charge, other modifications, etc.), between the A6 psm.tsv and the A4 psm.tsv: image In IonQuant I tried setting the MBR RT tolerance to 10 minutes (the same as the PD default) to get more intensity values across samples, but without success.

Best,

Adam
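
The two-run comparison described above boils down to a set overlap. Here is a rough sketch with the Python standard library, assuming the psm.tsv column is named "Modified Peptide" (R shows it as Modified.Peptide); the function names and the toy sequences are hypothetical and only illustrate the counting.

```python
import csv
import io


def modified_peptides(handle, column="Modified Peptide"):
    """Collect the distinct non-empty values of `column` from a TSV table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def venn_counts(run_a, run_b):
    """Return the three region sizes of a two-set Venn diagram."""
    return {
        "only_a": len(run_a - run_b),
        "shared": len(run_a & run_b),
        "only_b": len(run_b - run_a),
    }


# Toy tables standing in for two psm.tsv files (sequences are invented).
run_a = io.StringIO("Modified Peptide\nPEPA\nPEPB\nPEPB\nPEPC\n")
run_b = io.StringIO("Modified Peptide\nPEPB\nPEPC\nPEPD\n")
print(venn_counts(modified_peptides(run_a), modified_peptides(run_b)))
```

Because the values go into a set, repeated PSMs of the same modified peptide count once, matching a comparison that disregards charge and other per-PSM details.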

dpolasky commented 1 year ago

The numbers seem pretty reasonable as you say - I'm not sure why the MSFragger histogram looks so bad given how the smaller comparison looks. What is going into that histogram (which table/columns)?

adamurminsky commented 1 year ago

Regarding the MSFragger histogram:

I loaded the combined_modified_peptide.tsv into R. I will explain only the histogram I uploaded, but it is the same for all of the groups. I added two columns, "B_group_count" and "Bgroup%". For each glycopeptide, I counted the non-zero values in the B group (15 samples). So, for example, if a glycopeptide was non-zero (quantified) in 10 samples, its "B_group_count" is 10. Each "B_group_count" is then divided by the number of samples in the group (15) and multiplied by 100 to give a percentage (10/15×100 in this example), which is saved to "Bgroup%". Basically, you get the presence of each glycopeptide as a percentage within the B group. From these percentages, I made the histogram I sent in the previous messages. I did this the same way for all 5 groups.

What is weird is that a huge number of glycopeptides (rows) were not quantified in any of the samples, as you can see in the second row: image

Thanks for your time,

Adam
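
The per-group calculation described in this comment was done in R; an equivalent sketch in Python is shown here for reference. It assumes each glycopeptide row has already been reduced to the list of intensities for one group, and the function names are hypothetical.

```python
def percent_quantified(intensities):
    """Percentage of samples in a group with a nonzero intensity."""
    if not intensities:
        raise ValueError("group has no samples")
    nonzero = sum(1 for value in intensities if value > 0)
    return 100.0 * nonzero / len(intensities)


def never_quantified(rows):
    """Count rows (glycopeptides) with no nonzero intensity in any sample."""
    return sum(1 for intensities in rows if percent_quantified(intensities) == 0)


# The example from the text: quantified in 10 of 15 samples of the B group.
group_b = [3.2e6] * 10 + [0.0] * 5
print(round(percent_quantified(group_b), 1))
```

For the 10-of-15 example this gives about 66.7, and `never_quantified` counts the rows that land in the 0% bin of the histogram.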

dpolasky commented 1 year ago

Thanks, that helps a lot. I tested with some hybrid data here to see if I could get something similar. The dataset I used for testing is fractionated - if I test a bunch of different fractions, I can get really low overlap from the combined_modified_peptide.tsv across all samples like you're seeing. But if I test replicates of the same or neighboring fractions, the overlap in modified peptides found across all samples is very high. So I'm not seeing any issue with reliably quantifying hybrid data, at least with the one set I tested. It's hard to guess too much more about what might be going on without knowing more about your data and whether it makes sense to see the majority of peptide-glycan combinations in all samples - maybe you can email me directly (dpolasky [at] umich.edu) with some info about the samples and the MSFragger and Byonic tables.

danielgeiszler commented 3 months ago

Is this resolved?