dp-mzml is a Java library for quickly parsing spectrum data from an mzML file via sequential iteration or random access.
Features:
Easily load your own Spectrum instances and parse only what you need.
To parse an mzML file, construct an MzMLStAXParser.
Spectrum and SpectrumHeader are provided by default in the com.digitalproteomics.parsers.mzml.model
package, and the corresponding builders for both are found in com.digitalproteomics.parsers.mzml.builders.
You can write your own handlers by implementing MzMLStAXParser.FromXMLStreamBuilder.
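The builder idea described above can be illustrated with the JDK's own StAX API. The following is a minimal sketch, not dp-mzml's actual code: the interface SpectrumElementBuilder, the class IdAndLevelBuilder, and the parse method are hypothetical names invented for this example, and the real MzMLStAXParser.FromXMLStreamBuilder signature may differ. It shows how a custom builder can pick out only the fields it needs (here the spectrum id and the "ms level" cvParam, accession MS:1000511) while the parser streams through the file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxBuilderSketch {

    /** Hypothetical stand-in for a builder interface; NOT the library's actual signature. */
    interface SpectrumElementBuilder<T> {
        /** Called for every start element from <spectrum> up to its closing tag. */
        void accept(XMLStreamReader reader);

        /** Called once when </spectrum> is reached. */
        T build();
    }

    /** Example builder that keeps only the spectrum id and the MS level. */
    static final class IdAndLevelBuilder implements SpectrumElementBuilder<String> {
        private String id;
        private String msLevel;

        @Override
        public void accept(XMLStreamReader reader) {
            String name = reader.getLocalName();
            if (name.equals("spectrum")) {
                id = reader.getAttributeValue(null, "id");
            } else if (name.equals("cvParam")
                    && "MS:1000511".equals(reader.getAttributeValue(null, "accession"))) {
                msLevel = reader.getAttributeValue(null, "value");
            }
        }

        @Override
        public String build() {
            return id + " (ms level " + msLevel + ")";
        }
    }

    /** Streams through the XML once and creates a fresh builder for each <spectrum>. */
    static <T> List<T> parse(String xml, Supplier<SpectrumElementBuilder<T>> factory)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<T> results = new ArrayList<>();
        SpectrumElementBuilder<T> current = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (reader.getLocalName().equals("spectrum")) {
                    current = factory.get();
                }
                if (current != null) {
                    current.accept(reader);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("spectrum")) {
                results.add(current.build());
                current = null;
            }
        }
        reader.close();
        return results;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny hand-written fragment; a real mzML file carries far more metadata.
        String xml = "<spectrumList count=\"2\">"
                + "<spectrum id=\"scan=1\"><cvParam accession=\"MS:1000511\" value=\"1\"/></spectrum>"
                + "<spectrum id=\"scan=2\"><cvParam accession=\"MS:1000511\" value=\"2\"/></spectrum>"
                + "</spectrumList>";
        List<String> summaries = parse(xml, IdAndLevelBuilder::new);
        summaries.forEach(System.out::println);
    }
}
```

Because the builder ignores everything except two attributes, no peak arrays are decoded. That is presumably the kind of saving behind the gap between the SpectrumHeader-only and full Spectrum rows in the benchmarks.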
Benchmarks were run on a single thread on an Ubuntu 16.04 system with an Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz and 32 GB of RAM.
The datasets tested were indexedMzML files generated by msconvert.
The benchmarks were run against jmzml 1.7.8 and pymzml 0.7.8 on the same desktop.
Code | Language | UPS1_50000amol_R3.mzML time (Minutes:Seconds) | UPS1_50000amol_R3.mzML max memory (KB) | small.pwiz.1.1.mzML time (Minutes:Seconds) |
---|---|---|---|---|
mzmlparser (SpectrumHeader only) | java | 0:04.64, 0:04.68, 0:04.59, 0:04.59, 0:04.76 | 266784, 249424, 239664, 267432, 254616 | 0:00.47, 0:00.44, 0:00.44, 0:00.54, 0:00.52 |
mzmlparser (Spectrum) | java | 0:17.81, 0:17.82, 0:18.39, 0:17.99, 0:17.77 | 3535408, 3553576, 3544656, 3533604, 3535884 | 0:00.57, 0:00.51, 0:00.58, 0:00.59, 0:00.55 |
pymzml | python | 0:35.68, 0:35.53, 0:35.83, 0:35.64, 0:35.09 | 146060, 146612, 146192, 145952, 145916 | 0:01.15, 0:01.05, 0:00.88, 0:01.04, 0:01.04 |
jmzml | java | 1:45.10, 1:43.87, 1:45.38, 1:44.03, 1:41.48 | 5851512, 6095684, 5644776, 5635264, 5836852 | 0:01.44, 0:01.39, 0:01.48, 0:01.63, 0:01.54 |
Code | Language | UPS1_50000amol_R3.mzML time (Minutes:Seconds) for 10000 random indices | UPS1_50000amol_R3.mzML max memory (KB) for 10000 random indices | UPS1_50000amol_R3.mzML time (Minutes:Seconds) for 5000 random indices |
---|---|---|---|---|
mzmlparser (SpectrumHeader only) | java | 0:02.35, 0:02.57, 0:02.31, 0:02.34, 0:02.48 | 392392, 385660, 383504, 388208, 386984 | 0:01.66, 0:01.57, 0:01.51, 0:01.58, 0:01.57 |
mzmlparser (Spectrum) | java | 0:05.50, 0:05.68, 0:05.46, 0:05.37, 0:05.33 | 1379292, 1381508, 1388480, 1389672, 1386696 | 0:03.20, 0:03.02, 0:03.16, 0:03.09, 0:03.19 |
pymzml | python | 0:06.60, 0:06.73, 0:06.57, 0:06.69, 0:06.67 | 99792, 99480, 99416, 99392, 99740 | 0:04.01, 0:03.90, 0:03.82, 0:03.96, 0:03.93 |
jmzml | java | 1:22.41, 1:23.47, 1:23.18, 1:21.68, 1:23.05 | 3032636, 3029300, 3029692, 3028772, 3032184 | 1:18.57, 1:19.36, 1:16.57, 1:18.97, 1:23.21 |
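Each cell in the tables above lists five individual runs. To compare libraries on a single number, the runs can be averaged. The helper below is a small sketch for doing so; the class and method names (BenchmarkTimes, toSeconds, meanSeconds) are made up for this example and are not part of dp-mzml.

```java
import java.util.Arrays;

public class BenchmarkTimes {

    /** Converts one "M:SS.ss" entry, as used in the tables, to seconds. */
    static double toSeconds(String minutesSeconds) {
        String[] parts = minutesSeconds.trim().split(":");
        return Integer.parseInt(parts[0]) * 60 + Double.parseDouble(parts[1]);
    }

    /** Averages a comma-separated list of "M:SS.ss" runs. */
    static double meanSeconds(String commaSeparatedRuns) {
        return Arrays.stream(commaSeparatedRuns.split(","))
                .mapToDouble(BenchmarkTimes::toSeconds)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no runs given"));
    }

    public static void main(String[] args) {
        // Sequential parse of UPS1_50000amol_R3.mzML, copied from the first table.
        double dpMzml = meanSeconds("0:17.81, 0:17.82, 0:18.39, 0:17.99, 0:17.77");
        double jmzml = meanSeconds("1:45.10, 1:43.87, 1:45.38, 1:44.03, 1:41.48");
        System.out.printf("mzmlparser (Spectrum): %.2f s, jmzml: %.2f s, ratio: %.1fx%n",
                dpMzml, jmzml, jmzml / dpMzml);
    }
}
```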
Note: the library's dependencies are installed automatically if you are using Maven.
To use this library, add the following dependency to your pom.xml file:
<dependency>
  <groupId>com.digitalproteomics</groupId>
  <artifactId>dp-mzml</artifactId>
  <version>1.1.0-RELEASE</version>
</dependency>
Apache License 2.0
Copyright 2019 Digital Proteomics, LLC