MannLabs / directlfq

Fast and accurate label-free quantification for small and very large numbers of proteomes
https://doi.org/10.1101/2023.02.17.528962
Apache License 2.0

First QC results indicate reduced quantitative accuracy of directLFQ vs MaxLFQ #4

Closed michaelsteidel86 closed 1 year ago

michaelsteidel86 commented 1 year ago

Thanks for this very interesting and easily accessible work!

Unfortunately, my first attempt to reprocess a mixed-proteome standard (human/E. coli, 1:1 vs. 1:3), processed with DIA-NN using default options only, resulted in clearly reduced quantitative accuracy for directLFQ.

Is this actually to be expected?

Kind regards, Michael

[Attached image: Capture]
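For readers who want to reproduce this kind of check, the following is a minimal sketch of how ratio accuracy on a mixed-proteome standard can be summarized. It is not taken from directLFQ or DIA-NN. The function name `ratio_qc`, the assignment of the 1:1 ratio to human and the 1:3 ratio to E. coli, and the toy intensities are all assumptions made for illustration.

```python
import math
from statistics import median, stdev

# Assumption: human proteins are mixed 1:1 and E. coli proteins 1:3
# between condition A and condition B.
EXPECTED_FOLD = {"human": 1.0, "ecoli": 3.0}


def ratio_qc(intensities_a, intensities_b, species):
    """Summarize how far observed log2(B/A) ratios are from the expected ratio.

    intensities_a / intensities_b: per-protein intensities in conditions A and B.
    Proteins with a missing (non-positive) value in either condition are skipped.
    """
    expected = math.log2(EXPECTED_FOLD[species])
    log_ratios = [
        math.log2(b / a)
        for a, b in zip(intensities_a, intensities_b)
        if a > 0 and b > 0
    ]
    return {
        "expected_log2": expected,
        "median_error": median(log_ratios) - expected,  # accuracy
        "sd": stdev(log_ratios),                        # precision
    }


# Toy numbers, for illustration only.
human = ratio_qc([100, 200, 400], [100, 220, 380], "human")
ecoli = ratio_qc([10, 20, 40], [30, 60, 120], "ecoli")
print(human)
print(ecoli)
```

A median error close to zero indicates good accuracy, while a small standard deviation indicates good precision. The two can diverge, so it helps to report both per species.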

michaelsteidel86 commented 1 year ago

It seems to me that directLFQ extracts all reported fragment ion intensities from the DIA-NN report.

However, according to Vadim's publication, the DIA-NN default precursor quantification algorithm uses only 3 fragments per precursor to compute precursor intensities: those with the highest correlation score (a measure of how well the actual fragment elution profile fits the reference profile). Maybe this explains the observed discrepancy? In any case, this does not fit the data shown in the preprint.

ammarcsj commented 1 year ago

Hi Michael,

Thanks a lot for checking out directLFQ and providing this helpful feedback! These differences do look quite strong. The ratio-comparison data shown in the paper was processed with Spectronaut and, furthermore, was acquired on a Thermo platform. Attached are some ratio-comparison checks for DIA-NN on the paper dataset:

1) using the DIA-NN fragment intensities [image: diann_fragion_ratios]

and

2) using the DIA-NN-calculated precursor intensities for the protein intensity estimation [image: diann_precursor_ratios]

To me, version 1) looks a bit better on this data. So my guess would be that it depends on the platform on which the data was acquired. In case you can share: which platform was your data acquired on?

I have now released a new directLFQ version, 0.2.5, where I set 2) as the default option, as I guess this will be more stable across different types of data. I also added more options in the GUI (see image below). It would be really interesting to see your results with the new default (or also with the non-defaults, if you have the time to check this).

[Image: Screenshot 2023-02-22 at 15 51 47]

michaelsteidel86 commented 1 year ago

It is diaPASEF data. Have you already checked your algorithm on this platform?

michaelsteidel86 commented 1 year ago

Hi Constantin.

Indeed, at first glance the platform seems to have a big impact on the resulting quantities...

Looking into the DIA-NN paper, I noted that only the top 3 fragment ions with the highest correlation score are actually used for the default precursor quantity calculation. I quickly scripted a solution to filter report.tsv for the top 3 fragment ions and re-ran the directLFQ GUI (previous version). I did not check in much detail, but it doesn't look that bad at first glance...

Note: diaPASEF data from a timsTOF Pro 2.

[Attached image: Capture]

ammarcsj commented 1 year ago

This is very interesting, thanks a lot for checking it out! So your filtering script went through the Fragment.Quant.Corrected column and did a transformation like below (assuming they are sorted by correlation)?

379251;227167;504371;756432;0;854488;955054 -> 379251;227167;504371
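The transformation asked about here can be sketched for a single report row as follows. This is only an illustration, not the script used in this thread. The helper name `top_n_by_correlation` and the correlation values are hypothetical, and the sketch assumes that the per-fragment correlation scores are available as a second semicolon-separated string in the same fragment order as the intensities.

```python
def top_n_by_correlation(quant: str, corr: str, n: int = 3) -> str:
    """Keep the n fragment intensities with the highest correlation scores.

    quant: semicolon-separated fragment intensities of one report row.
    corr:  semicolon-separated correlation scores in the same fragment order
           (assumption: both strings list the same fragments).
    """
    # Drop empty tokens so a trailing ';' does not create a phantom fragment.
    intensities = [x for x in quant.split(";") if x]
    scores = [float(x) for x in corr.split(";") if x]
    if len(intensities) != len(scores):
        raise ValueError("intensity and correlation lists differ in length")

    # Rank fragment positions by score, take the best n, then restore the
    # original fragment order so the output stays comparable between rows.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n])
    return ";".join(intensities[i] for i in keep)


quant = "379251;227167;504371;756432;0;854488;955054"
# Hypothetical scores, already in descending order as assumed in the question.
corr_sorted = "0.99;0.97;0.95;0.80;0.10;0.70;0.60"
print(top_n_by_correlation(quant, corr_sorted))  # 379251;227167;504371
```

If the fragments are not already sorted by correlation, the same function picks the three best-scoring positions instead of simply truncating the string.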

The standard deviations look quite good. One reason could be that directLFQ also uses the MS1 information in this case, which is quite valuable for timsTOF data in my experience.

I have added this approach to the feature list and will look more into it. I hope it also looks OK with the new version.

michaelsteidel86 commented 1 year ago

In principle I did exactly this, but I computed the sum of scores across all runs, as I wanted the underlying fragments to be the same across the considered runs. I am also happy to check a sample-wise top 3.
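The run-consistent variant described here could look roughly like the sketch below. It is not the original script. It assumes that each report row is a dict with the DIA-NN-style keys `Precursor.Id`, `Fragment.Quant.Corrected` and `Fragment.Correlations`, and that a given precursor lists its fragments in the same order in every run. The function name, the precursor ID and the toy values are hypothetical.

```python
QUANT = "Fragment.Quant.Corrected"
CORR = "Fragment.Correlations"  # assumed name of the correlation column


def split_field(field):
    """Split a semicolon-separated field, ignoring a trailing ';'."""
    return [x for x in field.split(";") if x]


def filter_consistent_top_n(rows, n=3):
    """Keep the same n fragments per precursor in every run.

    Fragments are ranked by their correlation score summed over all runs.
    """
    # Pass 1: sum the correlation score of each fragment position per precursor.
    summed = {}
    for row in rows:
        scores = [float(x) for x in split_field(row[CORR])]
        acc = summed.setdefault(row["Precursor.Id"], [0.0] * len(scores))
        if len(acc) != len(scores):
            raise ValueError("fragment count differs between runs")
        for i, score in enumerate(scores):
            acc[i] += score

    # Choose the n best positions per precursor, kept in original order.
    keep = {}
    for precursor, acc in summed.items():
        ranked = sorted(range(len(acc)), key=lambda i: acc[i], reverse=True)
        keep[precursor] = sorted(ranked[:n])

    # Pass 2: rewrite both fragment columns using the shared selection.
    filtered = []
    for row in rows:
        new_row = dict(row)
        for column in (QUANT, CORR):
            parts = split_field(row[column])
            new_row[column] = ";".join(parts[i] for i in keep[row["Precursor.Id"]])
        filtered.append(new_row)
    return filtered


# Toy rows, for illustration only.
rows = [
    {"Run": "A", "Precursor.Id": "PEPTIDEK2",
     QUANT: "100;200;300;400", CORR: "0.9;0.2;0.8;0.7"},
    {"Run": "B", "Precursor.Id": "PEPTIDEK2",
     QUANT: "10;20;30;40", CORR: "0.1;0.3;0.9;0.9"},
]
result = filter_consistent_top_n(rows)
for row in result:
    print(row["Run"], row[QUANT])
```

In the toy rows, a sample-wise top 3 for run B would drop the first fragment, whereas the summed scores keep it in both runs. This is the difference between the two selection strategies discussed here.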

ammarcsj commented 1 year ago

Thanks a lot for explaining, I think it should definitely be stable over all samples, as you implemented it.

ammarcsj commented 1 year ago

Hi Michael, I have now tried 4 different processing types on data from the original diaPASEF paper, including the top 3 method you suggested. There are only subtle differences between the processing types. The differences are considerably smaller than in the data you showed, which is strange.

[Image: Screenshot 2023-02-24 at 16 14 01]

Could you maybe check how the default in the new version performs for you?

ammarcsj commented 1 year ago

Hi Michael, as the main issue seems to be resolved by now, I will close this discussion. I will let you know in case there are updates. Thanks again for the input!