dominik-kopczynski opened this issue 4 years ago (status: Open)
It should be easy, and it matches the original idea of the library; we already have the Parquet file export.
Hi @dominik-kopczynski, I think our good friend David has some plans for mzDB and ThermoRawFileParser.
It should indeed be possible to implement the whole conversion logic inside the ThermoRawFileParser library. However, I decided to test a different solution for the mzDB conversion implementation, which should allow me to reuse existing Java/Scala code. If we are happy with this experiment, it should also help other folks working in C++, R, and so on to use the ThermoRawFileParser library. It might be useful for data visualization on Linux, for instance.
I have forked the current project and made some changes to enable the embedding: https://github.com/david-bouyssie/ThermoRawFileParser/commit/bf1e6f33901fdb3a86c00447285051f2cc76685c
In parallel, I have forked the Embeddinator-4000 project and created some Windows Docker files to simplify building the fork from source: https://github.com/david-bouyssie/e4k-dockers
I'm using Embeddinator-4000 to generate the glue code (a C-like library wrapping the C# one and a JAR file containing the wrapper). Now I'm working on the integration in the mzdb4s project (https://github.com/mzdb/mzdb4s). I already have a prototype that works on Windows. The next step is to make it work on Linux, but that should not be a big problem.
Feedback is welcome ;)
Here is a first pre-release including two converters (raw->mzDB and mzDB->MGF): https://github.com/mzdb/mzdb4s/releases/download/0.2/mzdb-conversion-tools_0.2.zip
Note that thermo2mzDB is a native executable targeting Ubuntu Linux. I could also deliver a Java program if needed.
Hi folks,
would it be possible to add a further output format to ThermoRawFileParser, namely the mzDB file format [1]? Thanks to its indexing strategy and database structure, an mzDB file is way faster to read than an mzML file is to parse.
Cheers, Dominik
[1] https://pubmed.ncbi.nlm.nih.gov/25505153/