Closed tommasomari closed 2 years ago
Hi,
Thanks for reporting. Indeed, the array shapes do not match, as we have multiple fragments for each scan. We use the indices to keep track of which peaks belong to which scan.
Often, we then use np.searchsorted to create a lookup index of the same size, like so: np.searchsorted(indices_, np.arange(len(mass_data)), side='right') - 1.
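To make the lookup concrete, here is a minimal sketch on toy data. The values are made up, and it assumes that indices_ stores the offset of the first peak of each scan in the flat array, with a final entry equal to the total number of peaks; the actual layout in the file may differ.

```python
import numpy as np

# Toy data (hypothetical values): three scans with 2, 3 and 1 peaks,
# stored back to back in one flat array.
mass_data = np.array([100.1, 200.2, 150.0, 250.5, 300.3, 120.7])

# Assumed layout: indices_[i] is the offset of the first peak of scan i,
# and the last entry equals len(mass_data).
indices_ = np.array([0, 2, 5, 6])

# For every peak position, find the scan whose offset range contains it.
# side='right' makes a position equal to an offset count as the start of
# that scan; subtracting 1 turns the insertion point into a scan index.
scan_of_peak = np.searchsorted(indices_, np.arange(len(mass_data)), side='right') - 1

print(scan_of_peak)  # one scan index per peak, same length as mass_data
```

The result has the same length as mass_data, so it can be stored next to the masses and intensities as a per-peak scan column.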
Below is a code snippet to get from ms_data to a dataframe with scans.
One could potentially add this as a method for the ms_data.hdf file to read it directly as a dataframe, e.g. .read_DDA_ms1_df(). Would this be what you are looking for?
Thanks for the quick reply. I hadn't realised the indices were structured this way; the structure of the HDF file is clearer now.
Describe the bug Looking at the structure of an .ms_data.hdf file, it seems the information on which peaks in MS1_scans or MS2_scans belong to which scan is lost.
To Reproduce Example from my analysis
Output:
And similarly for MS2_scans:
Output:
Having int_list_ms and mass_list_ms as flat arrays of m/z and intensity values makes it impossible to tell which scan each value belongs to.
Expected behavior int_list_ms and mass_list_ms could be reported as arrays of arrays, one for each scan.
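For illustration, a per-scan view of this kind can be derived from flat arrays when per-scan offsets are available. The sketch below uses made-up values and a hypothetical offsets array named indices_ (start offset of each scan plus a final end entry); it is not taken from the file format itself.

```python
import numpy as np

# Toy flat arrays (hypothetical values): three scans with 2, 3 and 1 peaks.
mass_list_ms = np.array([100.1, 200.2, 150.0, 250.5, 300.3, 120.7])
int_list_ms = np.array([10.0, 20.0, 5.0, 7.0, 9.0, 3.0])

# Hypothetical offsets: start of each scan, plus the total peak count at the end.
indices_ = np.array([0, 2, 5, 6])

# np.split cuts the flat array at the interior offsets, giving one
# sub-array per scan.
mass_per_scan = np.split(mass_list_ms, indices_[1:-1])
int_per_scan = np.split(int_list_ms, indices_[1:-1])

for scan_idx, (mz, inten) in enumerate(zip(mass_per_scan, int_per_scan)):
    print(scan_idx, mz, inten)
```

Keeping the data flat and splitting on demand avoids ragged arrays on disk, which is one plausible reason for storing offsets rather than arrays of arrays.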
Screenshots
Example of output when trying to build a dataframe from the HDF5 group to assign the peaks to each scan.
Version (please complete the following information):