compomics / ThermoRawFileParser

Thermo RAW file parser that runs on Linux/Mac and all other platforms that support Mono
Apache License 2.0

mzDB output #83

Open dominik-kopczynski opened 4 years ago

dominik-kopczynski commented 4 years ago

Hi folks,

would it be possible to add a further output format to the ThermoRawFileParser, namely the mzDB file format [1]? Thanks to its indexing strategy and database structure, mzDB files are much faster to read than mzML files, which have to be parsed.

Cheers, Dominik

[1] https://pubmed.ncbi.nlm.nih.gov/25505153/

ypriverol commented 4 years ago

It should be easy, and it matches the original idea of the library; we have the Parquet file export.

caetera commented 4 years ago

Hi @dominik-kopczynski, I think our good friend David has some plans for mzDB and ThermoRawFileParser.

david-bouyssie commented 4 years ago

It should indeed be possible to implement the whole conversion logic inside the ThermoRawFileParser library. However, I decided to test a different solution for the mzDB conversion, which should allow me to reuse existing Java/Scala code. If we are happy with this experiment, it should also help other folks working in C++, R, and so on to use the ThermoRawFileParser library. It might be useful for data visualization on Linux, for instance.

I have forked the current project and made some changes that enable the embedding: https://github.com/david-bouyssie/ThermoRawFileParser/commit/bf1e6f33901fdb3a86c00447285051f2cc76685c

In parallel I have forked the Embeddinator-4000 project and created some Windows Docker files to simplify the build of the fork from sources: https://github.com/david-bouyssie/e4k-dockers

I'm using Embeddinator-4000 to generate the glue code (a C-like library wrapping the C# one and a JAR file containing the wrapper). Now I'm working on the integration into the mzdb4s project: https://github.com/mzdb/mzdb4s. I already have a prototype that works on Windows. The next step is to make it work on Linux, but that should not be a big problem.

Feedback is welcome ;)

david-bouyssie commented 4 years ago

Here is a first pre-release including two converters (raw->mzDB and mzDB->MGF): https://github.com/mzdb/mzdb4s/releases/download/0.2/mzdb-conversion-tools_0.2.zip

Note that thermo2mzDB is a native executable targeting Ubuntu Linux. I could also deliver a Java program if needed.