bigbio / pmultiqc

A library for QC reports based on the MultiQC framework
GNU General Public License v3.0

Move parsing of mzML to a step in the workflow #63

Closed: jpfeuffer closed this issue 1 year ago

jpfeuffer commented 2 years ago

Parsing the mzML files just takes too long during MultiQC execution. It could be done in the mzML indexing step instead. That step can write out a table:

spec_id, peaks, base int, ...

jpfeuffer commented 2 years ago

The step can use the pyopenms container/conda
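A minimal sketch of what such a step could look like, assuming pyopenms is available in the container. The function names, the script layout, and the column name `base_int` are illustrative and not taken from pmultiqc or the workflow; only the three columns named above are written.

```python
import csv


def summarize_peaks(spec_id, intensities):
    """Build one table row from a spectrum id and its peak intensities."""
    intensities = list(intensities)
    return {
        "spec_id": spec_id,
        "peaks": len(intensities),
        # Base peak intensity: the largest intensity, 0.0 for an empty spectrum.
        "base_int": float(max(intensities)) if intensities else 0.0,
    }


def iter_rows(mzml_path):
    """Yield one summary row per spectrum in an mzML file (requires pyopenms)."""
    import pyopenms  # imported lazily so the helpers above work without it

    exp = pyopenms.MSExperiment()
    pyopenms.MzMLFile().load(mzml_path, exp)
    for spec in exp.getSpectra():
        _mz, intensity = spec.get_peaks()
        yield summarize_peaks(spec.getNativeID(), intensity)


def write_table(rows, handle):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["spec_id", "peaks", "base_int"])
    writer.writeheader()
    writer.writerows(rows)
```

In a workflow step this could be called as `write_table(iter_rows("sample.mzML"), open("sample_spectra.csv", "w", newline=""))`, where both file names are placeholders. MultiQC would then only need to read the small CSV instead of the full mzML.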

WangHong007 commented 2 years ago

An example is here: mzml_dataframe_file_part. I extracted information about MS1, MS2, and MS3 (if they exist) from the mzML files; for MS1 and MS3, only whether they exist is recorded. The step collects all mzML files and extracts the information into a single dataframe, without computing any statistics. The final CSV file is passed to pmultiqc for processing.

jpfeuffer commented 2 years ago

Wouldn't it be smarter to have a single MSlevel column with the values 1, 2, 3, ...?
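To illustrate the suggestion, here is a small sketch of how a table with one MSlevel column could be consumed. The sample rows and all column names other than `MSlevel` are made up for the example and do not come from the actual CSV.

```python
import csv
import io
from collections import Counter

# Hypothetical table contents, one row per spectrum.
TABLE = """spec_id,MSlevel,peaks,base_int
scan=1,1,120,5400.0
scan=2,2,35,880.5
scan=3,2,41,1020.0
scan=4,3,12,150.0
"""


def count_by_ms_level(handle):
    """Count spectra per MS level from a table that has an MSlevel column."""
    return Counter(int(row["MSlevel"]) for row in csv.DictReader(handle))


counts = count_by_ms_level(io.StringIO(TABLE))
for level in sorted(counts):
    print(f"MS{level}: {counts[level]} spectra")
```

With one row per spectrum and the level stored as a value, a level that is absent from the data simply has no rows, and per-level statistics become a plain group-by rather than a set of separate existence flags.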