MannLabs / directlfq

Fast and accurate label-free quantification for small and very large numbers of proteomes
https://doi.org/10.1101/2023.02.17.528962
Apache License 2.0

directLFQ fails to apply grouping in v0.2.17 #30

Closed GeorgWa closed 6 months ago

GeorgWa commented 6 months ago

Describe the bug

It looks like directLFQ ignores the grouping variable in the most recent version. Instead of 10,330 protein groups, all 174,000 fragments are handled like individual proteins and no output is generated.

Broken output 0.2.17

2024-02-15 22:21:38> ================ Protein FDR =================
2024-02-15 22:21:38> Unique protein groups in output
2024-02-15 22:21:38>   1% protein FDR: 10,330
2024-02-15 22:21:38> 
2024-02-15 22:21:38> Unique precursor in output
2024-02-15 22:21:38>   1% protein FDR: 112,094
2024-02-15 22:21:38> ================================================
2024-02-15 22:21:38> Building search statistics
2024-02-15 22:21:40> Writing stat output to disk
2024-02-15 22:21:40> Performing label free quantification
2024-02-15 22:21:41> Accumulating fragment data
2024-02-15 22:21:41> reading frag file for 20231212_OA1_MCT_SA_M768_AD02_HYE_200ng_quadPolON_sample3_01
...
2024-02-15 22:22:13> reading frag file for 20231212_OA1_MCT_SA_M768_AD02_HYE_200ng_quadPolON_sample4_01
2024-02-15 22:22:16> Performing label free quantification on the pg level
2024-02-15 22:22:16> Filtering fragments by quality
2024-02-15 22:22:16> Performing label-free quantification using directLFQ
2024-02-15 22:22:18> 10330 lfq-groups total
2024-02-15 22:22:39> using 8 processes
2024-02-15 22:22:43> lfq-object 0
2024-02-15 22:22:43> lfq-object 100
2024-02-15 22:22:43> lfq-object 200
2024-02-15 22:22:43> lfq-object 300
2024-02-15 22:22:43> lfq-object 400
2024-02-15 22:22:43> lfq-object 500
...
2024-02-15 22:24:08> lfq-object 173800
2024-02-15 22:24:08> lfq-object 173900
2024-02-15 22:24:08> lfq-object 174000
2024-02-15 22:25:17> Writing pg output to disk
2024-02-15 22:25:19> Writing psm output to disk

Correct output 0.2.14

2024-02-15 22:33:11> ================ Protein FDR =================
2024-02-15 22:33:11> Unique protein groups in output
2024-02-15 22:33:11>   1% protein FDR: 10,330
2024-02-15 22:33:11> 
2024-02-15 22:33:11> Unique precursor in output
2024-02-15 22:33:11>   1% protein FDR: 112,094
2024-02-15 22:33:11> ================================================
2024-02-15 22:33:11> Building search statistics
2024-02-15 22:33:13> Writing stat output to disk
2024-02-15 22:33:13> Performing label free quantification
2024-02-15 22:33:13> Accumulating fragment data
2024-02-15 22:33:13> reading frag file for 20231212_OA1_MCT_SA_M768_AD02_HYE_200ng_quadPolON_sample3_01
...
2024-02-15 22:33:46> reading frag file for 20231212_OA1_MCT_SA_M768_AD02_HYE_200ng_quadPolON_sample4_01
2024-02-15 22:33:48> Performing label free quantification on the pg level
2024-02-15 22:33:48> Filtering fragments by quality
2024-02-15 22:33:49> Performing label-free quantification using directLFQ
2024-02-15 22:33:51> 10330 prots total
2024-02-15 22:33:51> using 8 processes
2024-02-15 22:33:52> prot 0
2024-02-15 22:33:53> prot 1300
2024-02-15 22:33:53> prot 700
2024-02-15 22:33:53> prot 1000
2024-02-15 22:33:53> prot 1700
2024-02-15 22:33:53> prot 2300
2024-02-15 22:33:53> prot 400
...
2024-02-15 22:33:57> prot 9900
2024-02-15 22:33:57> prot 10200
2024-02-15 22:33:57> prot 10000
2024-02-15 22:33:57> prot 10300
2024-02-15 22:34:07> Writing pg output to disk
2024-02-15 22:34:08> Writing psm output to disk

ammarcsj commented 6 months ago

Hi Georg,

as we investigated, this issue was caused by the input table not being sorted in your use case, a requirement that was not clearly encoded in the functions. Users running the standard pipeline should not have encountered this. The new release is hopefully clearer about the sorting requirement. Thanks!
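
For readers hitting the same symptom, here is a minimal sketch of why an unsorted table can inflate the number of groups. It assumes, and this is only an assumption since the thread does not show the directLFQ internals, that groups are formed from consecutive rows sharing the same grouping key. The row data and the helper `count_consecutive_groups` are hypothetical and are not part of the directLFQ API.

```python
from itertools import groupby

# Hypothetical (protein group, fragment ion) rows; names are illustrative only.
rows = [
    ("PG1", "frag_a"),
    ("PG2", "frag_b"),
    ("PG1", "frag_c"),
    ("PG2", "frag_d"),
    ("PG1", "frag_e"),
]


def count_consecutive_groups(table):
    """Count runs of identical keys, as a consecutive-run grouping would see them."""
    return sum(1 for _key, _run in groupby(table, key=lambda row: row[0]))


# Unsorted: every change of key starts a new run, so each row becomes its own group.
unsorted_count = count_consecutive_groups(rows)

# Sorted by the grouping key: rows of the same protein group are adjacent.
sorted_count = count_consecutive_groups(sorted(rows, key=lambda row: row[0]))

print(unsorted_count)  # 5
print(sorted_count)    # 2
```

If the grouping does work this way, sorting the table by the grouping column before passing it on would restore one group per protein group, which matches the difference between the two logs above.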