bigbio / quantms

Quantitative mass spectrometry workflow. Currently supports proteomics experiments with complex experimental designs for DDA-LFQ, DDA-Isobaric and DIA-LFQ quantification.
https://quantms.org
MIT License

ProteomicsLFQ memory consumed #432

Open daichengxin opened 2 weeks ago

daichengxin commented 2 weeks ago

Description of the bug

The errors below are reported when I run the PXD001819 LFQ dataset (about 10G of mzML files). It looks like it's running out of memory, but 120G of memory is available, so I'm not sure whether this is normal.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Command used and terminal output

ProteomicsLFQ \
  -threads 2 \
  -in UPS1_12500amol_R1.mzML UPS1_12500amol_R2.mzML UPS1_12500amol_R3.mzML \
      UPS1_125amol_R1.mzML UPS1_125amol_R2.mzML UPS1_125amol_R3.mzML \
      UPS1_25000amol_R1.mzML UPS1_25000amol_R2.mzML UPS1_25000amol_R3.mzML \
      UPS1_2500amol_R1.mzML UPS1_2500amol_R2.mzML UPS1_2500amol_R3.mzML \
      UPS1_250amol_R1.mzML UPS1_250amol_R2.mzML UPS1_250amol_R3.mzML \
      UPS1_50000amol_R1.mzML UPS1_50000amol_R2.mzML UPS1_50000amol_R3.mzML \
      UPS1_5000amol_R1.mzML UPS1_5000amol_R2.mzML UPS1_5000amol_R3.mzML \
      UPS1_500amol_R1.mzML UPS1_500amol_R2.mzML UPS1_500amol_R3.mzML \
      UPS1_50amol_R1.mzML UPS1_50amol_R2.mzML UPS1_50amol_R3.mzML \
  -ids UPS1_12500amol_R1_comet_feat_perc_pep_filter.idXML UPS1_12500amol_R2_comet_feat_perc_pep_filter.idXML UPS1_12500amol_R3_comet_feat_perc_pep_filter.idXML \
       UPS1_125amol_R1_comet_feat_perc_pep_filter.idXML UPS1_125amol_R2_comet_feat_perc_pep_filter.idXML UPS1_125amol_R3_comet_feat_perc_pep_filter.idXML \
       UPS1_25000amol_R1_comet_feat_perc_pep_filter.idXML UPS1_25000amol_R2_comet_feat_perc_pep_filter.idXML UPS1_25000amol_R3_comet_feat_perc_pep_filter.idXML \
       UPS1_2500amol_R1_comet_feat_perc_pep_filter.idXML UPS1_2500amol_R2_comet_feat_perc_pep_filter.idXML UPS1_2500amol_R3_comet_feat_perc_pep_filter.idXML \
       UPS1_250amol_R1_comet_feat_perc_pep_filter.idXML UPS1_250amol_R2_comet_feat_perc_pep_filter.idXML UPS1_250amol_R3_comet_feat_perc_pep_filter.idXML \
       UPS1_50000amol_R1_comet_feat_perc_pep_filter.idXML UPS1_50000amol_R2_comet_feat_perc_pep_filter.idXML UPS1_50000amol_R3_comet_feat_perc_pep_filter.idXML \
       UPS1_5000amol_R1_comet_feat_perc_pep_filter.idXML UPS1_5000amol_R2_comet_feat_perc_pep_filter.idXML UPS1_5000amol_R3_comet_feat_perc_pep_filter.idXML \
       UPS1_500amol_R1_comet_feat_perc_pep_filter.idXML UPS1_500amol_R2_comet_feat_perc_pep_filter.idXML UPS1_500amol_R3_comet_feat_perc_pep_filter.idXML \
       UPS1_50amol_R1_comet_feat_perc_pep_filter.idXML UPS1_50amol_R2_comet_feat_perc_pep_filter.idXML UPS1_50amol_R3_comet_feat_perc_pep_filter.idXML \
  -design PXD001819.sdrf_openms_design.tsv \
  -fasta uniprot_yeast_ups_decoy.fasta \
  -protein_inference aggregation \
  -quantification_method feature_intensity \
  -targeted_only false \
  -feature_with_id_min_score 0.10 \
  -feature_without_id_min_score 0.75 \
  -mass_recalibration false \
  -Seeding:intThreshold 1000 \
  -protein_quantification unique_peptides \
  -alignment_order star \
  -psmFDR 0.01 \
  -proteinFDR 0.01 \
  -picked_proteinFDR true \
  -out_cxml PXD001819.sdrf_openms_design_openms.consensusXML \
  -out PXD001819.sdrf_openms_design_openms.mzTab \
  -out_msstats PXD001819.sdrf_openms_design_msstats_in.csv \
  -PeptideQuantification:extract:batch_size 1000 \
  -debug 0 2>&1 | tee proteomicslfq.log

Relevant files

log file: proteomicslfq.log

System information

quantms 1.3.0

jpfeuffer commented 2 weeks ago

I have analysed this dataset many times. Never had issues.

jpfeuffer commented 2 weeks ago

You can trace the memory consumption while it's running.
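
One way to do such tracing, as a minimal sketch rather than anything shipped with quantms or OpenMS: launch the tool from a small Python wrapper and poll the VmRSS field of /proc/<pid>/status while it runs. The helper names `parse_vmrss_kib` and `trace_memory` are hypothetical, the approach assumes Linux, and it only sees the resident memory of the launched process itself (threads included, child processes not).

```python
import subprocess
import sys
import time
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def trace_memory(cmd, interval=1.0):
    """Run cmd and sample its resident memory until it exits (Linux only).

    Returns (exit_code, [(seconds_since_start, rss_kib), ...]).
    """
    samples = []
    proc = subprocess.Popen(cmd)
    status_file = Path(f"/proc/{proc.pid}/status")
    start = time.monotonic()
    while proc.poll() is None:
        try:
            rss = parse_vmrss_kib(status_file.read_text())
        except OSError:
            # /proc is unavailable, or the process exited between poll() and read.
            rss = None
        if rss is not None:
            samples.append((time.monotonic() - start, rss))
        time.sleep(interval)
    return proc.returncode, samples


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    code, samples = trace_memory(sys.argv[1:], interval=5.0)
    for elapsed, rss_kib in samples:
        print(f"{elapsed:8.1f} s  {rss_kib / 1024 / 1024:8.2f} GiB")
    print(f"exit code: {code}")
```

Saved under a name of your choice (say `trace_mem.py`, also hypothetical), it would be invoked as `python trace_mem.py ProteomicsLFQ -threads 2 -in ...`. The timestamps can then be lined up with proteomicslfq.log to see which step is running when memory starts to climb.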

ypriverol commented 2 weeks ago

Me too, but the latest release, quantms 1.3.0, uses OpenMS 3.2.0. We have other ongoing problems with this version and mzTab export in ProteinQuantifier. @timosachsenberg Can you help us here?

timosachsenberg commented 2 weeks ago

The log does not indicate that it is the export. Is there a way we can find out where/when this regression was introduced?

daichengxin commented 2 weeks ago

I retried with OpenMS 3.2.0 and traced the memory. It does exceed the available memory. Why does it run out of memory? The mzML files are only 10G in total. I haven't encountered this before either.

timosachsenberg commented 2 weeks ago

Thanks for checking. This is really suspicious. Can you reproduce this with, e.g., one or two files?

daichengxin commented 2 weeks ago

I can reproduce this with two files, but a single file works. Test files: https://www.dropbox.com/scl/fi/jgbw0pvnm18cga1kwgy54/proteomicslfq.zip?rlkey=6igoyec9ffztk9p8f4uriukct&st=osldx9cp&dl=0

timosachsenberg commented 2 weeks ago

I can confirm that it uses 400 GB for two small files during feature extraction. My first guess would be that something inside, e.g., the OpenSWATH code might have changed.

ypriverol commented 1 week ago

These are the two files in PRIDE:

https://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_125amol_R1.raw
https://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_125amol_R2.raw

timosachsenberg commented 1 week ago

Likely related to a different conversion using TRFP (ThermoRawFileParser). The file works when converted with ProteoWizard msconvert.
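
To see how the two conversions of the same raw file differ, one could summarise each mzML with a short script. The sketch below uses only the Python standard library. `summarize_mzml` is a hypothetical helper. It assumes the centroid/profile cvParam sits directly on each spectrum element, which is common but not guaranteed (it can also be supplied through a referenceableParamGroup). Whether profile versus centroided data is what actually differs here is not established in this thread. It is just one obvious thing to compare.

```python
import io
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"
CENTROID = "MS:1000127"  # PSI-MS accession: centroid spectrum
PROFILE = "MS:1000128"   # PSI-MS accession: profile spectrum


def summarize_mzml(source):
    """Count spectra, data points and spectrum representations in an mzML.

    source is a file path or a binary file object. The file is streamed,
    so spectra are not all held in memory at once.
    """
    stats = {"spectra": 0, "points": 0, "centroid": 0, "profile": 0}
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MZML_NS + "spectrum":
            continue
        stats["spectra"] += 1
        # defaultArrayLength is the number of data points in the spectrum.
        stats["points"] += int(elem.get("defaultArrayLength", 0))
        accessions = {cv.get("accession") for cv in elem.findall(MZML_NS + "cvParam")}
        if CENTROID in accessions:
            stats["centroid"] += 1
        if PROFILE in accessions:
            stats["profile"] += 1
        elem.clear()  # drop the binary arrays once the spectrum is counted
    return stats


if __name__ == "__main__":
    # Tiny inline document standing in for a real mzML file.
    demo = (
        b'<mzML xmlns="http://psi.hupo.org/ms/mzml"><run><spectrumList>'
        b'<spectrum index="0" id="s0" defaultArrayLength="10">'
        b'<cvParam accession="MS:1000127"/></spectrum>'
        b'</spectrumList></run></mzML>'
    )
    print(summarize_mzml(io.BytesIO(demo)))
```

Running it on the TRFP output and on the msconvert output of the same raw file and comparing the counts should show whether one conversion carries far more data points per spectrum than the other.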