abterrabio / dp-mzml

A Java-based streaming parser for mzML files

A parse-what-you-need mzML parser library

dp-mzml is a Java library for quickly parsing spectrum data from an mzML file, either by sequential iteration or by random access.
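To illustrate what "parse what you need" can mean for sequential iteration, here is a minimal sketch that uses the JDK's standard StAX API (`javax.xml.stream`). It is not dp-mzml's API: the class `SpectrumIdScanner` and the method `spectrumIds` are hypothetical names chosen for this example. The sketch streams through the document and collects only the `id` attribute of each `spectrum` element, skipping everything else without building a tree in memory.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of dp-mzml.
final class SpectrumIdScanner {

    /** Streams through mzML-like XML and returns the id of every spectrum element. */
    static List<String> spectrumIds(Reader source) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(source);
        List<String> ids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Only start tags named "spectrum" are inspected; all other
                // events (binary data, parameters, text) are skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "spectrum".equals(reader.getLocalName())) {
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }
}
```

Because the reader only ever holds the current event, memory use stays roughly constant regardless of file size, which is the general appeal of a streaming design.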

Features:

Benchmarks

All benchmarks ran on a single thread on an Ubuntu 16.04 system with an Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz and 32 GB of RAM.

The datasets tested were indexedMzML files generated by msconvert.

The benchmarks were run against jmzml 1.7.8 and pymzml 0.7.8 on the same desktop. Each cell lists the results of five runs.

Sequential file parsing

| Code | Language | UPS1_50000amol_R3.mzML time (minutes:seconds) | UPS1_50000amol_R3.mzML max memory (KB) | small.pwiz.1.1.mzML time (minutes:seconds) |
| --- | --- | --- | --- | --- |
| mzmlparser (SpectrumHeader only) | java | 0:04.64, 0:04.68, 0:04.59, 0:04.59, 0:04.76 | 266784, 249424, 239664, 267432, 254616 | 0:00.47, 0:00.44, 0:00.44, 0:00.54, 0:00.52 |
| mzmlparser (Spectrum) | java | 0:17.81, 0:17.82, 0:18.39, 0:17.99, 0:17.77 | 3535408, 3553576, 3544656, 3533604, 3535884 | 0:00.57, 0:00.51, 0:00.58, 0:00.59, 0:00.55 |
| pymzml | python | 0:35.68, 0:35.53, 0:35.83, 0:35.64, 0:35.09 | 146060, 146612, 146192, 145952, 145916 | 0:01.15, 0:01.05, 0:00.88, 0:01.04, 0:01.04 |
| jmzml | java | 1:45.10, 1:43.87, 1:45.38, 1:44.03, 1:41.48 | 5851512, 6095684, 5644776, 5635264, 5836852 | 0:01.44, 0:01.39, 0:01.48, 0:01.63, 0:01.54 |

Random access file parsing with indexed mzML

| Code | Language | UPS1_50000amol_R3.mzML time (minutes:seconds), 10000 random indices | UPS1_50000amol_R3.mzML max memory (KB), 10000 random indices | UPS1_50000amol_R3.mzML time (minutes:seconds), 5000 random indices |
| --- | --- | --- | --- | --- |
| mzmlparser (SpectrumHeader only) | java | 0:02.35, 0:02.57, 0:02.31, 0:02.34, 0:02.48 | 392392, 385660, 383504, 388208, 386984 | 0:01.66, 0:01.57, 0:01.51, 0:01.58, 0:01.57 |
| mzmlparser (Spectrum) | java | 0:05.50, 0:05.68, 0:05.46, 0:05.37, 0:05.33 | 1379292, 1381508, 1388480, 1389672, 1386696 | 0:03.20, 0:03.02, 0:03.16, 0:03.09, 0:03.19 |
| pymzml | python | 0:06.60, 0:06.73, 0:06.57, 0:06.69, 0:06.67 | 99792, 99480, 99416, 99392, 99740 | 0:04.01, 0:03.90, 0:03.82, 0:03.96, 0:03.93 |
| jmzml | java | 1:22.41, 1:23.47, 1:23.18, 1:21.68, 1:23.05 | 3032636, 3029300, 3029692, 3028772, 3032184 | 1:18.57, 1:19.36, 1:16.57, 1:18.97, 1:23.21 |
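For readers unfamiliar with why an index makes random access fast: an indexedMzML file stores, near its end, the byte offset of each `<spectrum` start tag, so a reader can seek directly to one spectrum instead of scanning the whole file. The sketch below shows that idea with the JDK's `RandomAccessFile`. It is an illustration under stated assumptions, not dp-mzml's implementation: `SpectrumSeek` and `readSpectrumAt` are hypothetical names, the offset is assumed to have been read from the index already, and the byte-at-a-time loop favors clarity over speed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of dp-mzml.
final class SpectrumSeek {
    private static final byte[] END_TAG = "</spectrum>".getBytes(StandardCharsets.UTF_8);

    /** Returns the raw XML of the spectrum element that starts at the given byte offset. */
    static String readSpectrumAt(String path, long offset) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset); // jump straight to the indexed position
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int matched = 0; // number of END_TAG bytes matched so far
            int b;
            while ((b = file.read()) != -1) {
                out.write(b);
                if (b == END_TAG[matched]) {
                    matched++;
                } else {
                    // '<' occurs only at the start of END_TAG, so a mismatch
                    // can restart the match at most at position 1.
                    matched = (b == END_TAG[0]) ? 1 : 0;
                }
                if (matched == END_TAG.length) {
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
            throw new IOException("No closing </spectrum> tag found after offset " + offset);
        }
    }
}
```

The returned fragment could then be handed to any XML parser. A production reader would buffer its reads rather than fetch one byte per call.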

Requirements

Note: these are installed automatically if you use Maven.

Install with Maven

To use this library, add the following dependency to your pom.xml file:

<dependency>
  <groupId>com.digitalproteomics</groupId>
  <artifactId>dp-mzml</artifactId>
  <version>1.1.0-RELEASE</version>
</dependency>

License

Apache License 2.0

Copyright 2019 Digital Proteomics, LLC