Open ypriverol opened 6 years ago
Yeah, that would be nice and give it a proper name :)
I like 'parquet', as it is pretty clear what library to use to open it.
Regarding column names, I had a few thoughts:
'Mass' or 'Masses' is technically wrong, as these are m/z values. Or do you convert the m/z values into masses internally?
'Intensities' could be 'Intensity', even if it is an array.
'RetentionTime' was used in mzXML files; in mzML files I have seen it as 'ScanTime', which is a bit more general and may be more accurate. It would not imply that a chromatographic step was used.
Things like 'TIC' are maybe convenient, but also somewhat redundant. It could be calculated easily in one line of code if the data were in long format:
df_long.groupby('scan_time_min')['intensity'].sum().plot()
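To make the long-format point a bit more concrete, here is a minimal sketch with made-up numbers. The column names 'scan_time_min' and 'intensity' come from the one-liner above; the 'mz' column, the `df_wide` frame and the values in it are assumptions for illustration only, not the layout the project actually uses. Exploding several columns at once needs pandas 1.3 or newer.

```python
import pandas as pd

# Hypothetical wide layout: one row per scan, peaks stored as arrays.
df_wide = pd.DataFrame({
    "scan_time_min": [0.5, 1.0],
    "mz": [[100.0, 200.0], [100.0, 150.0, 300.0]],
    "intensity": [[10.0, 20.0], [5.0, 15.0, 30.0]],
})

# Long layout: one row per (scan, peak). Exploding both array columns
# in a single call keeps each m/z value paired with its intensity.
df_long = df_wide.explode(["mz", "intensity"], ignore_index=True)

# explode() leaves object dtype behind, so cast back to floats.
df_long = df_long.astype({"mz": "float64", "intensity": "float64"})

# Total ion current per scan: sum of all intensities at each scan time.
tic = df_long.groupby("scan_time_min")["intensity"].sum()
print(tic)
```

With the data in this shape, a stored 'TIC' column would only duplicate what the groupby already gives, which is the redundancy I meant.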
I am quite new to metabolomics/proteomics, though. I am looking at the problem more from a data science, Python-biased perspective.
We need to standardize the Parquet layout so that other people can understand the file format.