Nesvilab / DIA-Umpire

Computational analysis for mass spectrometry-based proteomics data
https://diaumpire.nesvilab.org/
GNU General Public License v3.0

DIA-Umpire crashes when processing an mzXML file (during serialization) #2

sampie closed this issue 1 year ago

sampie commented 2 years ago

Hi

I have a spectrum file (SWATH-DCIS-2-1-SWATH-CCIS2-2.wiff) from a public dataset (http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD014194); the raw file has been converted to mzXML.

Pipeline command: `java -Xms224g -Xmx224g -jar /opt/dia-umpire/DIA_Umpire_SE.jar /run-files/pxd014194-pseudo-spectra/libfree/SWATH-DCIS-2-1-SWATH-CCIS2-2.mzXML /run-files/pxd014194-pseudo-spectra/diaumpire-params.txt`

Params file: diaumpire-params.txt

Output from DIA-Umpire:

2022-04-27 23:22:13,882 INFO  [root] Processing /run-files/pxd014194-pseudo-spectra/libfree/SWATH-DCIS-2-1-SWATH-CCIS2-2.mzXML....
2022-04-27 23:22:13,897 INFO  [root] Writing DIA setting to file:/run-files/pxd014194-pseudo-spectra/libfree/SWATH-DCIS-2-1-SWATH-CCIS2-2_diasetting.ser...
2022-04-27 23:22:14,277 INFO  [root] Writing parameter to file:/run-files/pxd014194-pseudo-spectra/libfree/SWATH-DCIS-2-1-SWATH-CCIS2-2_params.ser...
2022-04-27 23:22:14,282 INFO  [root] Module A: Signal extraction
2022-04-27 23:25:01,090 INFO  [root] Writing DIA setting to file:/run-files/pxd014194-pseudo-spectra/libfree/SWATH-DCIS-2-1-SWATH-CCIS2-2_diasetting.ser...
2022-04-27 23:25:02,008 INFO  [root] Processing MS1 peak detection
2022-04-27 23:25:02,008 INFO  [root] MS1 average cycle time : 2.1650283 seconds
2022-04-27 23:25:59,802 INFO  [root] Processing all scans to detect possible m/z peak curves and
2022-04-27 23:25:59,802 INFO  [root] Smoothing detected signals......
2022-04-27 23:29:54,080 INFO  [root] 10901415 Peak curves found (Memory usage:12551MB)
2022-04-27 23:29:54,081 INFO  [root] Inclusion mz values found: 0/0
2022-04-27 23:29:54,081 INFO  [root] Grouping isotopic peak curves........
2022-04-27 23:29:54,083 INFO  [root] Building PeakCurve Mass-RT KD tree
2022-04-27 23:53:45,455 INFO  [root] No of ion clusters:4129679 (Memory usage:14073MB)
2022-04-27 23:53:46,959 INFO  [root] Writing PeakCluster serialization to file:SWATH-DCIS-2-1-SWATH-CCIS2-2_PeakCluster.serFS...
2022-04-27 23:55:03,179 ERROR [root] java.lang.NegativeArraySizeException
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at org.nustaq.serialization.util.FSTOutputStream.grow(FSTOutputStream.java:92)
        at org.nustaq.serialization.util.FSTOutputStream.ensureFree(FSTOutputStream.java:71)
        at org.nustaq.serialization.coders.FSTStreamEncoder.writePlainInt(FSTStreamEncoder.java:488)
        at org.nustaq.serialization.coders.FSTStreamEncoder.writeFFloat(FSTStreamEncoder.java:340)
        at org.nustaq.serialization.FSTObjectOutput.writeObjectFields(FSTObjectOutput.java:613)
        at org.nustaq.serialization.FSTObjectOutput.defaultWriteObject(FSTObjectOutput.java:523)
        at org.nustaq.serialization.FSTObjectOutput.writeObjectWithContext(FSTObjectOutput.java:442)
        at org.nustaq.serialization.FSTObjectOutput.writeObjectInternal(FSTObjectOutput.java:317)
        at org.nustaq.serialization.serializers.FSTArrayListSerializer.writeObject(FSTArrayListSerializer.java:49)
        at org.nustaq.serialization.FSTObjectOutput.writeObjectWithContext(FSTObjectOutput.java:452)
        at org.nustaq.serialization.FSTObjectOutput.writeObjectInternal(FSTObjectOutput.java:317)
        at org.nustaq.serialization.FSTObjectOutput.writeObject(FSTObjectOutput.java:282)
        at org.nustaq.serialization.FSTObjectOutput.writeObject(FSTObjectOutput.java:191)
        at MSUmpire.LCMSPeakStructure.LCMSPeakBase.FS_PeakClusterWrite(LCMSPeakBase.java:313)
        at MSUmpire.LCMSPeakStructure.LCMSPeakBase.WritePeakClusterSerialization(LCMSPeakBase.java:305)
        at MSUmpire.LCMSPeakStructure.LCMSPeakBase.ExportPeakCluster(LCMSPeakBase.java:296)
        at MSUmpire.LCMSPeakStructure.LCMSPeakMS1.PeakClusterDetection(LCMSPeakMS1.java:230)
        at MSUmpire.DIA.DIAPack.MS1PeakDetection(DIAPack.java:926)
        at MSUmpire.DIA.DIAPack.process(DIAPack.java:176)
        at DIA_Umpire_SE.DIA_Umpire_SE.main(DIA_Umpire_SE.java:366)
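The exception is thrown by `Arrays.copyOf` inside `FSTOutputStream.grow`, that is, while the FST serializer was enlarging its output buffer to write the peak clusters. One plausible mechanism, assumed here and not verified against FST's source, is that the new buffer length is computed as an `int` and wraps negative once the buffer passes about 1 GiB. The class `DoublingBufferDemo` and its growth rules are hypothetical; they only reproduce that failure mode and a capped alternative:

```java
import java.util.Arrays;

// Hypothetical illustration, NOT FST's FSTOutputStream code: a buffer whose
// capacity is an int and which doubles whenever it needs to grow.
class DoublingBufferDemo {
    // Conservative maximum array length; some JDK classes use this value
    // because some VMs reserve header words in arrays.
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    // Naive growth rule: int multiplication silently wraps past Integer.MAX_VALUE.
    static int doubledCapacity(int currentCapacity) {
        return currentCapacity * 2;
    }

    // Safer growth rule: compute in long, then cap at an allocatable length.
    static int cappedCapacity(int currentCapacity) {
        long doubled = 2L * currentCapacity;
        return (int) Math.min(doubled, MAX_ARRAY_LENGTH);
    }

    public static void main(String[] args) {
        int capacity = 1 << 30; // pretend the buffer already holds 1 GiB
        int next = doubledCapacity(capacity);
        System.out.println("naive next capacity:  " + next); // -2147483648
        System.out.println("capped next capacity: " + cappedCapacity(capacity));

        try {
            // A 1-byte source array triggers the same size check without needing 1 GiB of RAM.
            Arrays.copyOf(new byte[1], next);
        } catch (NegativeArraySizeException e) {
            System.out.println("Arrays.copyOf threw " + e.getClass().getSimpleName());
        }
    }
}
```

Capping only moves the failure: a Java `byte[]` cannot exceed about 2 GiB because its length is an `int`, so serialized data larger than that has to be streamed or split. Whether DIA-Umpire actually reaches that limit here is not established in this thread.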

guoci commented 2 years ago

Can you send me the mzXML file?

sampie commented 2 years ago

Sure. Here is the file: https://bioinfoshare.utu.fi/DataTransfer/pxd014194/SWATH-DCIS-2-1-SWATH-CCIS2-2.mzXML

guoci commented 2 years ago

@sampie Did it run to completion? If so, the error can be ignored.

sampie commented 2 years ago

@guoci I don't think it did. I can see SWATH-DCIS-2-1-SWATH-CCIS2-2_Q1.mgf.temp (and the Q2 and Q3 temp files), but not the final mgf files. I guess it crashed before it had a chance to write them.

guoci commented 2 years ago

@sampie Can you post the full log file?

sampie commented 2 years ago

@guoci Sure. Here is the full output from DIA-Umpire.

diaumpire-log.txt

guoci commented 2 years ago

It looks like the process was killed externally; the log contains no trace of why it stopped. Can you use the latest version and also rerun it with more memory?

sampie commented 2 years ago

So the crash itself is not related to the exception messages; the machine is running out of memory. That makes sense: the system logs show that the OOM killer was triggered when the analysis was terminated.

The machine has 228G of RAM. There is no easy way to increase it, but I'll try reducing the number of threads from 16 to 4 in the hope that it reduces memory consumption.
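A JVM process uses more memory than its heap: thread stacks, metaspace, garbage-collector data structures and direct buffers all live outside the `-Xmx` limit. With `-Xms224g -Xmx224g` on a machine with 228G of RAM, the heap alone may grow to within about 4G of physical memory, so the kernel's OOM killer can end the process before the JVM ever throws an `OutOfMemoryError`. This is a plausible reading of the thread, not a confirmed diagnosis. `HeapHeadroom` is a hypothetical helper that does the arithmetic with the reported figures and prints the running JVM's own heap ceiling:

```java
// Hypothetical helper: rough heap-vs-RAM headroom using the figures reported
// in this thread. "G" is taken as reported; both values use the same unit.
class HeapHeadroom {
    static long headroom(long physicalRam, long maxHeap) {
        return physicalRam - maxHeap;
    }

    public static void main(String[] args) {
        long ramG = 228; // machine RAM reported in the thread
        long xmxG = 224; // from -Xms224g -Xmx224g in the pipeline command
        System.out.println("Left for non-heap JVM memory and the OS: ~"
                + headroom(ramG, xmxG) + "G");

        // The running JVM's heap ceiling, approximately the -Xmx value.
        double maxHeapGiB = Runtime.getRuntime().maxMemory() / (1024.0 * 1024 * 1024);
        System.out.printf("This JVM's max heap: %.1f GiB%n", maxHeapGiB);
    }
}
```

Lowering the thread count does not change this ceiling, since it is fixed by `-Xmx`; leaving a wider margin between `-Xmx` and physical RAM is the more direct lever, with the trade-off that the JVM may then fail with its own `OutOfMemoryError` if the data genuinely needs more heap.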

sampie commented 2 years ago

It seems that reducing the thread count to 4 did not help; it still took all the memory.

@guoci What is the expected memory consumption? Why does it take over 200GB of RAM when the mzXML file is less than 7GB?

(I also tried the latest DIA-Umpire, but it gave java.lang.ClassNotFoundException. It looks like v2.1.3 is distributed as a zip package that includes other jar files, whereas later versions ship only a single DIA-Umpire jar.)
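The question of expected memory is not answered in the thread. One rough way to reason about it uses the MS1 log line `10901415 Peak curves found (Memory usage:12551MB)`: divide the logged memory by the number of peak curves. This assumes `MB` means MiB and attributes all of the logged memory to peak curves, so the result is an upper bound rather than a measured per-object size. `PeakCurveMemoryEstimate` is a hypothetical helper:

```java
// Hypothetical back-of-the-envelope helper. Assumes the log's "MB" means MiB
// and attributes ALL logged memory to peak curves, so this is an upper bound.
class PeakCurveMemoryEstimate {
    static double bytesPerItem(double usedMiB, long itemCount) {
        return usedMiB * 1024 * 1024 / itemCount;
    }

    public static void main(String[] args) {
        // From the MS1 log line: "10901415 Peak curves found (Memory usage:12551MB)"
        double perCurve = bytesPerItem(12551, 10_901_415L);
        System.out.printf("at most ~%.0f bytes per peak curve%n", perCurve); // ~1207
    }
}
```

Applying the same division to the later MS2 line (`8044738 Peak curves found (Memory usage:22421MB)`) gives roughly 2.9 KB per curve, which suggests the logged figure also covers other data held at that point; treat these numbers as orders of magnitude only.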

guoci commented 2 years ago

@sampie Can you try v2.2.8? Also, check that your Java version is >= 11.
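Running `java -version` answers this from the shell. For a check from inside Java, here is a sketch that reads the standard `java.specification.version` system property, which is `1.8` on Java 8 and `11`, `17`, and so on from Java 9 onward. The class name `JavaVersionCheck` is illustrative, and the threshold of 11 is taken from the suggestion above:

```java
// Illustrative check of the running JVM's major version.
class JavaVersionCheck {
    // Parses java.specification.version: "1.8" -> 8, "11" -> 11, "17" -> 17.
    static int majorVersion(String spec) {
        if (spec.startsWith("1.")) {
            return Integer.parseInt(spec.substring(2)); // legacy "1.x" form (Java 8 and earlier)
        }
        int dot = spec.indexOf('.');
        return Integer.parseInt(dot < 0 ? spec : spec.substring(0, dot));
    }

    public static void main(String[] args) {
        int required = 11; // minimum suggested in this thread for v2.2.8
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("Java " + major
                + (major >= required ? " meets" : " is below")
                + " the required version " + required);
    }
}
```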

sampie commented 2 years ago

@guoci It seems v2.2.8 starts properly. However, it also exhausts all the memory ("No of ion clusters:268503 (Memory usage:23779MB)").

Are there any configuration parameters that would reduce memory usage? I will try reducing threads to 1, and maybe also giving a smaller amount of memory to the -Xms and -Xmx parameters, in case that limits memory consumption.

Output before memory was exhausted:

2022-06-09 08:01:01,219 INFO [main] Writing PeakCluster serialization to file:SWATH-DCIS-2-1-SWATH-CCIS2-2_PeakCluster.ser...

2022-06-09 08:04:47,575 INFO [main] ==================================================================================
2022-06-09 08:04:47,576 INFO [main] Processing DIA MS2 (mz range):1123.7_1250.0( 1/40 )
2022-06-09 08:04:54,433 INFO [main] Processing all scans to detect possible m/z peak curves and
2022-06-09 08:04:54,434 INFO [main] Smoothing detected signals......
2022-06-09 09:04:33,732 INFO [main] 8044738 Peak curves found (Memory usage:22421MB)
2022-06-09 09:04:33,736 INFO [main] Grouping isotopic peak curves........
2022-06-09 09:04:33,736 INFO [main] Building PeakCurve Mass-RT KD tree
2022-06-09 09:10:16,774 INFO [main] No of ion clusters:268503 (Memory usage:23779MB)
2022-06-09 09:10:16,775 INFO [main] Writing PeakCluster serialization to file:SWATH-DCIS-2-1-SWATH-CCIS2-2_1123_1250_PeakCluster.serFS...
2022-06-09 09:10:20,848 INFO [main] Performing mass defect filter on fragment peaks
2022-06-09 09:10:20,848 INFO [main] No. of fragment peaks: 9430859
2022-06-09 09:10:22,017 INFO [main] No. of remaining fragment peaks: 7029918
2022-06-09 09:10:22,018 INFO [main] Building precursor-fragment pairs for MS1 features....

guoci commented 2 years ago

@sampie I am not sure what is causing it. May I suggest that you use FragPipe? There you can select specific workflows that preset the parameters for DIA-Umpire.

sampie commented 2 years ago

@guoci It looks like the run completes successfully if there is more memory. The run took 361GB of RAM. The resulting mgf files are 12GB, 30GB, and 26GB.