Closed jspaezp closed 1 week ago
Dear @jspaezp , I will need some time to process this PR as this is slightly bigger and some of the ideas I also started working on locally already. Apologies, but I promise I will look into this sooner than later!
With regards to your questions/comments:
Unknown
or None
) in that example, means they are not being read from the binary. This speeds up things since roughly half the data is MS1 and the other half is MS2. Notice that this is implemented comparibly for read_all_ms2_frames
. The option to read all frames is still possible in case you definatlye will need both. To get to your actual question: by using Empty frames there is no search needed anymore, because it is indexed directly and thus accessible in O(1). RAM concerns of empty frames is irrelevant given the amount of data per (non-empty) frame.Dear @jspaezp , I will need some time to process this PR as this is slightly bigger and some of the ideas I also started working on locally already. Apologies, but I promise I will look into this sooner than later!
I see! are you planning on having some form of "public roadmap" or "help wanted issues"? I would be glad to help implement some things and it would be great to be in the same page.
- The fact that frames are kept as default (i.e.
Unknown
orNone
) in that example, means they are not being read from the binary. This speeds up things since roughly half the data is MS1 and the other half is MS2. Notice that this is implemented comparibly forread_all_ms2_frames
. The option to read all frames is still possible in case you definatlye will need both. To get to your actual question: by using Empty frames there is no search needed anymore, because it is indexed directly and thus accessible in O(1). RAM concerns of empty frames is irrelevant given the amount of data per (non-empty) frame.
That makes a lot of sense! (I wonder performance-wise how it would compare with a hashmap[index -> frame], since it would not have all the empty frames).
I will work on generating the test data for DIA.
Took me long enough, but most should be accessible now in the dev branch. Still need to update docs and do proper error propagation, but essentials are there
@jspaezp . Timsrust 0.3.0 is now available.
FrameReader
should give you all info you need. Note that it now has a rather convenient parallel_filter
function (https://github.com/MannLabs/timsrust/blob/main/src/io/readers/frame_reader.rs#L77), that should provide very efficient access to frames without much wasted CPU/RAM usage.Unknown
frames coming of the parallel filter anymore, so reindexing based on frame.index
might be needed.MetadataReader
should give quick access to all converters.
This PR attempts to improve what can be achieved with the library on the DIA space.
It adds the frame msms window struct, which is meant to represent a section of frame (I envision it being used for all scans that share an isolation window but there is no reason to not use it in other cases).
It also exposes some of the converters to be used externally (RN the frame has very little value, since one cannot convert externally the tof indices to mz values, it would be nice to have access to the implementation inside timsrust, instead of having to re-implement it)
TODO:
Questions:
I noticed that
read_all_ms1_frames
is defined like this:is there any reason why the frames that are not MS1 are kept as default frames? (in contrast to first filtering the indices and returning a vec of only MS1 frames?)
LMK what you think!