MannLabs / timsrust

Apache License 2.0
20 stars 9 forks source link

Exposing traits that allow mz conversion and added selective DIA reading #7

Closed jspaezp closed 1 week ago

jspaezp commented 9 months ago

This PR attempts to improve what can be achieved with the library on the DIA space.

It adds the frame msms window struct, which is meant to represent a section of frame (I envision it being used for all scans that share an isolation window but there is no reason to not use it in other cases).

It also exposes some of the converters to be used externally (RN the frame has very little value, since one cannot convert externally the tof indices to mz values, it would be nice to have access to the implementation inside timsrust, instead of having to re-implement it)

TODO:

  1. Testing, I could add a dia acquisition file we use for testing, which comes from just recording idle flow for ~5 seconds using a DIA method.
  2. Documentation, let me know if you would like more documentation comments to be added.

Questions:

I noticed that read_all_ms1_frames is defined like this:

    fn read_all_ms1_frames(&self) -> Vec<Frame> {
         (0..self.tdf_bin_reader.size())
             .into_par_iter()
             .map(|index| match self.frame_types[index] {
                FrameType::MS1 => self.read_single_frame(index),
                _ => Frame::default(),
            })
            .collect()
     }

is there any reason why the frames that are not MS1 are kept as default frames? (in contrast to first filtering the indices and returning a vec of only MS1 frames?)

LMK what you think!

sander-willems-bruker commented 8 months ago

Dear @jspaezp , I will need some time to process this PR as this is slightly bigger and some of the ideas I also started working on locally already. Apologies, but I promise I will look into this sooner than later!

sander-willems-bruker commented 8 months ago

With regards to your questions/comments:

jspaezp commented 8 months ago

Dear @jspaezp , I will need some time to process this PR as this is slightly bigger and some of the ideas I also started working on locally already. Apologies, but I promise I will look into this sooner than later!

I see! are you planning on having some form of "public roadmap" or "help wanted issues"? I would be glad to help implement some things and it would be great to be in the same page.

  • The fact that frames are kept as default (i.e. Unknown or None) in that example, means they are not being read from the binary. This speeds up things since roughly half the data is MS1 and the other half is MS2. Notice that this is implemented comparibly for read_all_ms2_frames. The option to read all frames is still possible in case you definatlye will need both. To get to your actual question: by using Empty frames there is no search needed anymore, because it is indexed directly and thus accessible in O(1). RAM concerns of empty frames is irrelevant given the amount of data per (non-empty) frame.

That makes a lot of sense! (I wonder performance-wise how it would compare with a hashmap[index -> frame], since it would not have all the empty frames).

I will work on generating the test data for DIA.

sander-willems-bruker commented 1 week ago

Took me long enough, but most should be accessible now in the dev branch. Still need to update docs and do proper error propagation, but essentials are there

sander-willems-bruker commented 1 week ago

@jspaezp . Timsrust 0.3.0 is now available.