Accuracy - Githubissues

tobiasko commented 4 years ago

Dear IonQuant developers,

I cross compared the quantification results obtained by running

a) FragPipeGUI -> MSstats b) MQ -> MSstats

on a published PASEF dataset with a priori known sample ratios (Meier et al. 2018, see Fig. 5d). In general, the results look really nice, BUT... my data suggest that IonQuant tends to systematically underestimate the expected fold changes (E. coli 1:4, Hs 1:1):

This becomes even clearer when analysing the residuals (estimated log2FC vs. expected). MQ residuals are pretty much centered on zero (as one would expect)

FragPipe residuals are shifted by half a log2 unit, too low in the mean.
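The residual check described above can be sketched as follows. This is a minimal illustration, not the R code used for the plots in this thread: the function names, the species labels, the sign convention and the toy numbers are all assumptions, and the expected ratios are the ones stated in this post (the E. coli ratio is corrected to 1:5 further down the thread).

```python
import math
from statistics import median

# Expected ratios as stated in this post. The sign convention
# (E. coli higher in the numerator condition) is an assumption.
EXPECTED_LOG2FC = {"ECOLI": math.log2(4), "HUMAN": math.log2(1)}


def residuals_by_species(estimates, expected=EXPECTED_LOG2FC):
    """Group residuals (estimated minus expected log2FC) by species.

    estimates: iterable of (species, estimated_log2fc) pairs.
    """
    grouped = {}
    for species, log2fc in estimates:
        grouped.setdefault(species, []).append(log2fc - expected[species])
    return grouped


def median_residuals(estimates, expected=EXPECTED_LOG2FC):
    """Median residual per species; values near zero mean accurate estimates."""
    grouped = residuals_by_species(estimates, expected)
    return {species: median(values) for species, values in grouped.items()}


# Made-up estimates, purely for illustration (not results from this thread).
toy_estimates = [
    ("ECOLI", 1.4), ("ECOLI", 1.5), ("ECOLI", 1.6),
    ("HUMAN", -0.5), ("HUMAN", -0.4), ("HUMAN", -0.6),
]

if __name__ == "__main__":
    print(median_residuals(toy_estimates))
```

A median residual well below zero for both species, as in the toy input, would correspond to the kind of systematic shift described above.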

I am now wondering if this effect could be explained by specific properties of the Meier et al. dataset, maybe in combination with parameter choices in FragPipe->MSstats? Your manuscript doesn't really touch on the topic of quantification accuracy and instead focusses on precision. Have you observed similar things when using ground truth datasets?

Greetings, Tobi

tobiasko commented 4 years ago

The raw files we analysed can be found at PXD010012 (https://www.ebi.ac.uk/pride/archive/projects/PXD010012).

anesvi commented 4 years ago

Tobi, Fengchao and I are looking into this. Will let you know when we have an answer for what is happening. Thanks, Alexey

tobiasko commented 4 years ago

Hi @prvst,

OK. Just let me know if you need anything (log files, code, ...), but you should be able to reproduce this without any special parameter choices.

Best, Tobi

tobiasko commented 4 years ago

Hi @anesvi , Hi @fcyu,

I got some feedback from Florian Meier. He pointed out that the preprint version of the paper contains a typo. The true ratio for Ecoli should be 1:5, not 1:4 as I stated above. The typo in the legend of Fig. 5 was corrected in the final version. Here is the results section:

"To further benchmark the quantitative accuracy of our setup, we mixed tryptic digests from HeLa and Escherichia coli in 1:1 and 1:5 ratios and measured each sample in quintuplicate 120 min single runs. Overall, we quantified 6135 protein groups (5407 HeLa; 728 E. coli) with at least one valid value for both mixing ratios. Plotting the median fold-changes yielded two distinct clouds for HeLa and E. coli proteins, which were 4.3-fold separated in abundance, slightly less than the intended 5-fold mixing ratio (Fig. 5D). Both populations were narrow (σ(HeLa) = 0.44; σ(E. coli) = 0.77) relative to the expected fold-change and they had minimal overlap. Considering only the 5686 proteins with at least two valid values for each mixing ratio (5052 HeLa, 634 E. coli), a one-sided Student's t test returned 602 significantly changing E. coli proteins at a permutation-based FDR below 0.05. This represents an excellent sensitivity of ∼95% and only 64 human proteins (1.3%) were falsely classified as changing. From these results, we conclude that the combination of TIMS and PASEF provides precise and accurate label-free protein quantification at a high level of data completeness."

But they used the distance between the median value for Hs and the median value for E. coli instead of comparing to absolute expectations (analysing residuals). The reason is the nature of the two-plex hybrid proteome, which most likely affects the assumptions used during normalisation. He pointed out that this is also visible in the maxLFQ paper:

I saw similar effects for the FragPipe->MSstats dataset. The Hs peptides are also not exactly centered on zero. So maybe one needs to use only Hs peptides during normalization in MSstats to get more accurate FC estimates.
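The idea of anchoring on the Hs population can be illustrated with a small sketch. It re-centres already estimated log2 fold changes on the median of the human proteins (expected 1:1), which is a post hoc approximation and not the MSstats normalization step itself. The function name, the species labels and the toy values are assumptions made for illustration.

```python
from statistics import median


def center_on_reference(log2fc_by_protein, species_by_protein, reference="HUMAN"):
    """Shift all log2FC values so the reference species has median zero.

    log2fc_by_protein: dict mapping protein ID -> estimated log2FC.
    species_by_protein: dict mapping protein ID -> species label.
    """
    reference_values = [
        value
        for protein, value in log2fc_by_protein.items()
        if species_by_protein[protein] == reference
    ]
    if not reference_values:
        raise ValueError(f"no proteins found for reference species {reference!r}")
    shift = median(reference_values)
    return {protein: value - shift for protein, value in log2fc_by_protein.items()}


# Made-up values, purely for illustration (not results from this thread).
toy_log2fc = {"H1": -0.5, "H2": -0.4, "H3": -0.6, "E1": 1.8}
toy_species = {"H1": "HUMAN", "H2": "HUMAN", "H3": "HUMAN", "E1": "ECOLI"}

if __name__ == "__main__":
    print(center_on_reference(toy_log2fc, toy_species))
```

In the toy input the human median is -0.5, so every value is shifted up by 0.5 and the E. coli estimate moves from 1.8 to about 2.3, closer to log2(5) ≈ 2.32.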

Cheers, Tobi

tobiasko commented 4 years ago

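An alternative dataset to assess accuracy might be PXD014777 (https://www.ebi.ac.uk/pride/archive/projects/PXD014777/private). It is a LFQbench-style triple hybrid proteome. Have you tried this one? It was used to benchmark MQ 1.6.6 (https://www.mcponline.org/content/early/2020/03/10/mcp.TIR119.001720).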

anesvi commented 4 years ago

Yes, we looked at that dataset. We see some strange things too. We are currently trying to understand some weird behavior of intensities in these data. Thanks, Alexey

tobiasko commented 4 years ago

Hi @anesvi, Hi @fcyu ,

I meanwhile managed to download and analyse the triple hybrid data from PXD014777. This is what I see using a FragPipe->MSstats workflow followed by some R code for plotting the MSstats estimates (haven't used anything else):

log2FC distribution by kernel density estimator

MA plot incl. LOESS fit

Volcano plot grouped by species

Best, Tobi

fcyu commented 4 years ago

Thanks Tobi,

We are also looking at this data and may find the reasons and solutions. We will let you know when we have made significant progress.

Best,

Fengchao

anesvi commented 4 years ago

Tobi, we think there are issues with timsTOF intensities that are hard to normalize. Do you know of similar Thermo data? Alexey

tobiasko commented 4 years ago

Hi @anesvi,

You mean a triple hybrid proteome analyzed by LC-MS on an Orbitrap in DDA mode?

Best, Tobi

anesvi commented 4 years ago

Yes

tobiasko commented 4 years ago

We (FGCZ) did this to cross compare the HF-X with the timsTOF Pro. Unfortunately, we never got the raw data from the Bruker demo lab. But the HF-X data should be in our LIMS system.

anesvi commented 4 years ago

If you can pass the HF-X data to Fengchao, it would be helpful.

jjGG commented 4 years ago

Hi @anesvi and @tobiasko, here is a recent publication that uses a dataset similar to the triple proteome, acquired on a Fusion instrument. https://www.mcponline.org/content/mcprot/early/2020/04/22/mcp.RA119.001624.full.pdf

The pride link is here: https://www.ebi.ac.uk/pride/archive/projects/PXD003881

Best regards, jonas

fcyu commented 4 years ago

Hi @tobiasko ,

Thanks in advance for your help. You may contact Alexey (nesvi@med.umich.edu) and me (yufe@umich.edu) directly if you want to share the data with us.

Best,

Fengchao

tobiasko commented 4 years ago

Sure! Let me check if I can find it.

fcyu commented 4 years ago

Hi @jjGG,

Thanks for the information. I am looking at it now.

Best,

Fengchao

fcyu commented 4 years ago

Hi @tobiasko ,

Thank you very much for checking and testing. We thoroughly investigated it and found some bugs and issues in our program. After updating to version 1.1.0, it shows good accuracy and better precision (a lower median coefficient of variation, CV). The following is the result from the three-species data (PXD014777):

Best,

Fengchao
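For readers unfamiliar with the precision metric mentioned in this comment, here is a minimal sketch of a median CV computed across proteins. How the CV was computed for the IonQuant evaluation is not stated in this thread, so the choices here (sample standard deviation, raw rather than log intensities, replicate values grouped per protein) are assumptions, as are the function names and the toy numbers.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of replicate intensities."""
    average = mean(values)
    if average == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return stdev(values) / average


def median_cv(intensities_by_protein):
    """Median CV over all proteins with at least two replicate values."""
    cvs = [
        coefficient_of_variation(values)
        for values in intensities_by_protein.values()
        if len(values) >= 2
    ]
    if not cvs:
        raise ValueError("no protein has two or more replicate values")
    return median(cvs)


# Made-up replicate intensities, purely for illustration.
toy_intensities = {
    "P1": [100.0, 110.0, 90.0],
    "P2": [200.0, 200.0, 200.0],
    "P3": [50.0, 60.0, 70.0],
    "P4": [75.0],  # skipped: only one replicate
}

if __name__ == "__main__":
    print(median_cv(toy_intensities))
```

A lower median CV means the replicate measurements of a typical protein agree more closely, which is the sense of "better precision" used here.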

tobiasko commented 4 years ago

Nice work! You should include that in your IonQuant manuscript! We have meanwhile finished our backend integration and IonQuant is really stable. So far no issues at all! Can't say this for MQ and PASEF data! Looks like you are in the lead.

Best, Tobi

fcyu commented 4 years ago

Thanks Tobi.

We have included these experiments in the manuscript. They will be available once it is published.

Best,

Fengchao

Nesvilab / IonQuant

Accuracy #3