Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

DIA pipeline quantifies peptides at wrong retention time #1233

Open tomvaiuw opened 1 year ago

tomvaiuw commented 1 year ago

Hi, we are running some data through the DIA-SpecLib-Quant pipeline and find that several peptides are quantified at the wrong retention time (we checked the chromatograms in Skyline), as shown in this correlation plot: [image]

The x-axis is the manually curated true retention time from Skyline; the y-axis is the retention time reported in diann-output.tsv. For example, the major outlier (16 min vs. 20.5 min), peptide LSPLGEEMR, shows no peak at 20.5 min: [image]
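For anyone wanting to reproduce this kind of comparison without plotting, here is a minimal sketch that flags peptides whose reported RT deviates from the curated RT. It assumes both sets of retention times have already been loaded into dictionaries keyed by peptide sequence; the function name, the 0.5 min tolerance, and the second peptide are illustrative assumptions, not anything taken from FragPipe, DIA-NN, or Skyline.

```python
def flag_rt_outliers(curated_rt, reported_rt, tolerance_min=0.5):
    """Return (peptide, curated, reported, delta) for large RT disagreements.

    curated_rt / reported_rt: dicts mapping peptide -> retention time in minutes.
    tolerance_min is an arbitrary illustrative threshold.
    """
    outliers = []
    for peptide, curated in curated_rt.items():
        reported = reported_rt.get(peptide)
        if reported is None:
            continue  # peptide missing from the reported results
        delta = reported - curated
        if abs(delta) > tolerance_min:
            outliers.append((peptide, curated, reported, delta))
    # Largest absolute deviation first
    outliers.sort(key=lambda row: abs(row[3]), reverse=True)
    return outliers


# LSPLGEEMR values are the ones quoted in this thread;
# the second peptide is made up for illustration.
curated = {"LSPLGEEMR": 16.0, "HYPOTHETICALPEPTIDEK": 30.0}
reported = {"LSPLGEEMR": 20.5, "HYPOTHETICALPEPTIDEK": 30.1}

for peptide, cur, rep, delta in flag_rt_outliers(curated, reported):
    print(f"{peptide}: curated {cur:.1f} min, reported {rep:.1f} min ({delta:+.1f} min)")
```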

I am not sure how to explain this and correct it.

Thanks, Tomas University of Washington

anesvi commented 1 year ago

I see you built the library from GPF files. Can you check that peptide in the corresponding GPF file? The retention time in the library comes from the MS2 scan that gave the best identification score, so the GPF file should have a strong signal at that retention time. Why it is missing at that RT in the DIA-Quant file and observed in a different part of the chromatogram, I don't know. You can also try annotating the quant files as DIA, not DIA-Quant. That way all files will be used for spectral library building, and maybe you will get it with the right retention time. Alexey

tomvaiuw commented 1 year ago

It is a major peak at 16 min in the GPF. Actually, it comes up as 2+ in one file and 1+ in another, both at that retention time. But it is one of the most abundant peptides, so it is picked up over quite a long stretch. We had issues when setting the quant files as DIA, so we went with DIA-Quant. Which file has all the MS2 identification scores along with the RT for the GPFs? Thanks, Tomas

Excuse typos, writing from a smart phone.



anesvi commented 1 year ago

You should check the PSM.tsv file. It will list all scans where the peptide was identified in the GPF runs, and with what scores. I believe the scan with the best probability sets the RT shown in the library.
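A minimal sketch of that check, using only the Python standard library. The column names (`Spectrum`, `Peptide`, `Retention`, `Probability`) and the assumption that the retention time is stored in seconds are guesses that should be verified against the header of your own PSM.tsv; the rows below are made-up data, not values from this issue.

```python
import csv
import io


def psms_for_peptide(tsv_text, peptide, peptide_col="Peptide",
                     rt_col="Retention", prob_col="Probability",
                     spectrum_col="Spectrum"):
    """List every PSM of one peptide, sorted by retention time.

    Column names are assumptions; pass your own if the header differs.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        if row[peptide_col] == peptide:
            rows.append({
                "spectrum": row[spectrum_col],
                "rt_min": float(row[rt_col]) / 60.0,  # assumes seconds
                "probability": float(row[prob_col]),
            })
    rows.sort(key=lambda r: r["rt_min"])
    return rows


# Made-up example content standing in for a real PSM.tsv.
example_tsv = (
    "Spectrum\tPeptide\tRetention\tProbability\n"
    "gpf1.01000.01000.2\tLSPLGEEMR\t954\t0.9995\n"
    "gpf1.01010.01010.2\tLSPLGEEMR\t960\t0.9996\n"
    "gpf1.01300.01300.2\tLSPLGEEMR\t1230\t0.9999\n"
    "gpf1.02000.02000.2\tOTHERPEPTIDEK\t1800\t0.9900\n"
)

psms = psms_for_peptide(example_tsv, "LSPLGEEMR")
for p in psms:
    print(f'{p["spectrum"]}  RT {p["rt_min"]:.2f} min  prob {p["probability"]:.4f}')

# The single best-probability scan, which per the comment above
# is believed to set the library RT.
best = max(psms, key=lambda p: p["probability"])
print("best-scoring scan RT (min):", best["rt_min"])
```

In a real file you would read the text with `open(path, newline="").read()` instead of using the inline string.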

tomvaiuw commented 1 year ago

There are quite a few scans, and they extend all the way to the incorrect retention time. I was checking it earlier. I will have to check the probabilities, but I am pretty sure they are mostly close to 1.000, so I suspect it is a combination of high abundance and a play of probabilities that differ only at the x-th decimal. There are other highly abundant peptides that are also detected over an extended RT range but do not have this issue. So there is no attempt to find the apex when assigning the RT in the GPF processing?

Tomas

Excuse typos, writing from a smart phone.



anesvi commented 1 year ago

EasyPQP, which builds the library, does not trace peaks to determine the apex RT. Yes, better logic for selecting the RT would be helpful. Also remember that the precursor may be identified in multiple files, so even if we reset the RT value to the apex (which we can possibly do in MSFragger), there are still complications. We can discuss more with you if you have some suggestions, but it would not be an immediate fix. I hope there are just a few special cases like this.

tomvaiuw commented 1 year ago

Alexey, we've been poking around a bit, and since the differences between the library RT and the RT of the actual chromatographic peak were few (~15 out of 350 high-scoring peptides), we manually modified library.tsv and then ran only DIA-NN through FragPipe. We've seen some changes in the integrated RT after this, though, which we are looking into right now. For comparison, we also ran the same data and library through standalone DIA-NN and got somewhat different results. We are not sure why and are looking into that as well.
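For reference, a manual correction like this could be scripted roughly as follows. This is only a sketch: the column names `PeptideSequence`, `PrecursorCharge`, and `NormalizedRetentionTime` are assumptions about the library.tsv header, and the replacement value must be in whatever RT units your library actually uses (check the header and existing values first). The example rows are made up.

```python
import csv
import io


def override_library_rt(tsv_text, corrections, seq_col="PeptideSequence",
                        charge_col="PrecursorCharge",
                        rt_col="NormalizedRetentionTime"):
    """Rewrite the RT of selected precursors in a tab-separated library.

    corrections: dict mapping (peptide, charge) -> new RT value.
    Returns (new_tsv_text, number_of_rows_changed).
    Column names are assumptions; pass your own if the header differs.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    changed = 0
    for row in reader:
        key = (row[seq_col], int(row[charge_col]))
        if key in corrections:
            # Every transition row of the precursor gets the same RT
            row[rt_col] = str(corrections[key])
            changed += 1
        writer.writerow(row)
    return out.getvalue(), changed


# Made-up library fragment: one row per transition.
library_tsv = (
    "PeptideSequence\tPrecursorCharge\tProductMz\tNormalizedRetentionTime\n"
    "LSPLGEEMR\t2\t500.1\t20.5\n"
    "LSPLGEEMR\t2\t600.2\t20.5\n"
    "OTHERPEPTIDEK\t2\t700.3\t30.0\n"
)

fixed_tsv, n_changed = override_library_rt(library_tsv, {("LSPLGEEMR", 2): 16.0})
print(n_changed, "transition rows updated")
print(fixed_tsv)
```

Keeping the corrections in a dictionary keyed by peptide and charge makes it easy to treat charge states separately, which matters if different charge states of the same peptide carry different library RTs.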

As for a better way to pick the correct RT: perhaps a rolling average of the identification scores (averaging 3 scans)? Or weighting the identification scores (above a certain value, say 0.95 or higher) by the MS2 intensity (or, the other way around, weighting the intensity by the score)? Other, more elaborate approaches would, I guess, require actually looking for the true maximum/apex.
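To make these two suggestions concrete, here is a rough sketch comparing them with the best-single-score rule described earlier in the thread. It is not EasyPQP code; the function names, the tuple layout `(rt_min, probability, intensity)`, and all numbers are illustrative assumptions. The rolling average runs over consecutive identified scans sorted by RT, which is a simplification because those scans are not evenly spaced in time.

```python
def rt_best_score(psms):
    """RT of the single PSM with the highest probability."""
    return max(psms, key=lambda p: p[1])[0]


def rt_rolling_score(psms, window=3):
    """RT at the centre of the window with the highest mean probability."""
    ordered = sorted(psms, key=lambda p: p[0])
    if len(ordered) < window:
        return rt_best_score(psms)  # too few scans to average
    best_mean, best_rt = -1.0, None
    for i in range(len(ordered) - window + 1):
        chunk = ordered[i:i + window]
        mean = sum(p[1] for p in chunk) / window
        if mean > best_mean:
            best_mean, best_rt = mean, chunk[window // 2][0]
    return best_rt


def rt_intensity_weighted(psms, min_prob=0.95):
    """RT of the PSM maximising probability * intensity, among confident PSMs."""
    confident = [p for p in psms if p[1] >= min_prob]
    if not confident:
        return rt_best_score(psms)  # nothing passes the threshold
    return max(confident, key=lambda p: p[1] * p[2])[0]


# Made-up PSMs: (RT in minutes, probability, MS2 intensity).
# A strong peak near 16 min plus a weak, late scan with a marginally
# higher probability, mimicking the situation described in this issue.
psms = [
    (15.8, 0.9990, 2.0e6),
    (15.9, 0.9995, 6.0e6),
    (16.0, 0.9996, 9.0e6),
    (16.1, 0.9994, 5.0e6),
    (18.0, 0.9900, 4.0e5),
    (20.5, 0.9999, 1.0e5),
]

print("best single score:  ", rt_best_score(psms))
print("rolling mean (3):   ", rt_rolling_score(psms))
print("intensity weighted: ", rt_intensity_weighted(psms))
```

On this toy data the best-single-score rule lands on the late scan, while both alternatives stay on the main peak. Neither alternative handles a precursor identified in multiple files, which is the complication raised above.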

Thanks a lot for your help. Tomas


anesvi commented 1 year ago

Thanks! We will look into changing EasyPQP to improve the reported RT values.

tomvaiuw commented 11 months ago

Alexey, to give you a couple of examples of what we are seeing, I attach a PowerPoint with some data. We imported the FragPipe-generated library.tsv from the GPF search into Skyline (as a transition list) and then loaded into Skyline the same data the library was generated from. Examples of 3 different peptides show that we even find different retention times for different charge states of the same peptide. While in some cases there is evidence that the peptide might be at the RT given in the library, in other cases we see no reason to find that peptide/charge state at the RT indicated in the library. Hope this helps.

Best regards, Tomas
