Closed singjc closed 2 years ago
@jcharkow @alhigaylan can you try and test this out?
I fixed the version dependencies on my end. @singjc please let me know if this breaks anything on your end.
@jcharkow Great, thanks for the fix! It still seems to work on my end in a new environment (python 3.9). I do run into issues in a clean environment with python 3.10, specifically version conflicts when installing pyopenms, but that is a separate issue.
Thanks for the singularity definition!
I added the option of processing cached mzMLs using SpectrumAccessOpenMSCached OSSpectrum, but it actually seems slower (100-fold) than the OnDiskExperiment MSSpectrum. :thinking:
I compared OnDiskExperiment to SpectrumAccessOpenMSCached on MS1-level data only: an MS1-filtered mzML for the former and an MS1 cached mzML for the latter. I tested using 54 precursors and extracted 1208 spectra. Each point in the plot represents the time in milliseconds (measured with Python's time.time()) it takes to extract the m/z, intensity and ion mobility arrays from the corresponding Spectrum object for a single spectrum.
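A per-spectrum measurement like the one described above could be sketched roughly as follows. This is only an illustration: `time_each_ms`, `extract_arrays` and the dictionary-based `fake_spectra` are hypothetical stand-ins, not diapysef or pyopenms code, and the real benchmark would call the array getters of the actual Spectrum objects instead.

```python
import time


def time_each_ms(items, extract):
    """Return the wall-clock time in milliseconds spent in extract() per item."""
    durations = []
    for item in items:
        start = time.time()
        extract(item)
        durations.append((time.time() - start) * 1000.0)
    return durations


# Hypothetical stand-in for a spectrum: plain lists instead of a Spectrum object.
fake_spectra = [
    {"mz": [100.0, 200.0], "intensity": [1.0, 2.0], "im": [0.9, 1.1]}
    for _ in range(5)
]


def extract_arrays(spec):
    # In the real comparison this would pull the m/z, intensity and
    # ion mobility arrays out of an MSSpectrum or OSSpectrum.
    return spec["mz"], spec["intensity"], spec["im"]


timings_ms = time_each_ms(fake_spectra, extract_arrays)
print(f"timed {len(timings_ms)} spectra")
```

Each entry of `timings_ms` would correspond to one point in the plot. For very short operations, `time.perf_counter()` may give a finer resolution than `time.time()`.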
However, the OnDiskExperiment does take longer to load the data and meta-data initially, 24.7840 sec, while the SpectrumAccessOpenMSCached only takes 0.2491 sec to load the data and meta-data.
Overall, they are pretty close in execution time for reducing the spectra for 54 precursors, 92.5028 sec for OnDisk and 102.7164 sec for Cache.
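To make the comparison concrete, the load and reduction times quoted above can be added up. This small sketch assumes the two numbers for each reader come from the same run and that total time is simply load plus reduce; the variable names are only illustrative.

```python
# Timings in seconds, copied from the comments above.
load_s = {"ondisk": 24.7840, "cached": 0.2491}
reduce_s = {"ondisk": 92.5028, "cached": 102.7164}

# Assumption: end-to-end time is the sum of the load and reduce steps.
total_s = {mode: load_s[mode] + reduce_s[mode] for mode in load_s}

for mode, total in total_s.items():
    print(f"{mode}: {total:.4f} s")
```

Under that assumption the cached reader comes out slightly ahead end to end, because its much shorter load time outweighs the slower reduction step.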
Cached spectra processing is now 10-fold faster than ondisk when using hroest/OpenMS/tree/feature/drift_time_os_spec_2.
It also now takes only 21.0601 sec to reduce the spectra from cache, vs 74.5139 sec to reduce the spectra from ondisk.
diapysef targeted-extraction --in /media/justincsing/ExtraDrive1/Documents2/Roest_Lab/Github/PTMs_Project/synthetic_pool_timstoff/data/raw/IPP_M10_DIA-PaSEF_60min_Bruker10_400nL_1ul-inj-redo2_Slot2-25_1_2151_MS1.mzML --coords peptides.pkl --readOptions ondisk --verbose 1 --mslevel [1] --mz_tol 20 --rt_window 40 --im_window 0.08
Found Bruker sdk. Access to the raw data is possible.
[2022-09-30 13:18:20] INFO: Loading data...
[2022-09-30 13:20:34] INFO: Reducing spectra using targeted coordinates...
INFO: Processing..YVC(UniMod:4)EGPSHGGLPGAS(UniMod:21)SEK_3: 100%|███████████████████████████████████████████████████████████| 54/54 [01:14<00:00, 1.38s/it]
[2022-09-30 13:21:48] INFO: Finished extracting targeted spectra!
diapysef targeted-extraction --in /media/justincsing/ExtraDrive1/Documents2/Roest_Lab/Github/PTMs_Project/synthetic_pool_timstoff/data/raw/cached/20220928_171403_179508ef404e_1_1_ms1.mzML --coords peptides.pkl --readOptions cached --verbose 1 --mslevel [1] --mz_tol 20 --rt_window 40 --im_window 0.08
Found Bruker sdk. Access to the raw data is possible.
[2022-09-30 13:41:11] INFO: Loading data...
[2022-09-30 13:41:11] INFO: Reducing spectra using targeted coordinates...
INFO: Processing..YVC(UniMod:4)EGPSHGGLPGAS(UniMod:21)SEK_3: 100%|███████████████████████████| 54/54 [00:21<00:00, 2.57it/s]
[2022-09-30 13:41:32] INFO: Finished extracting targeted spectra!
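The per-stage times can also be read back from the timestamps in the two logs above. The following is a rough sketch, not part of diapysef: `stage_seconds` is a hypothetical helper, and the log strings are abbreviated copies of the output above with the progress-bar line left out.

```python
import re
from datetime import datetime

# Matches the three timestamped INFO messages that appear in the logs above.
STAMP = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] INFO: "
    r"(Loading data|Reducing spectra|Finished extracting)"
)


def stage_seconds(log_text):
    """Return seconds spent loading and reducing, based on log timestamps."""
    stamps = {
        label: datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        for ts, label in STAMP.findall(log_text)
    }
    load = stamps["Reducing spectra"] - stamps["Loading data"]
    reduce_ = stamps["Finished extracting"] - stamps["Reducing spectra"]
    return {"load": load.total_seconds(), "reduce": reduce_.total_seconds()}


ondisk_log = (
    "[2022-09-30 13:18:20] INFO: Loading data... "
    "[2022-09-30 13:20:34] INFO: Reducing spectra using targeted coordinates... "
    "[2022-09-30 13:21:48] INFO: Finished extracting targeted spectra!"
)
cached_log = (
    "[2022-09-30 13:41:11] INFO: Loading data... "
    "[2022-09-30 13:41:11] INFO: Reducing spectra using targeted coordinates... "
    "[2022-09-30 13:41:32] INFO: Finished extracting targeted spectra!"
)

ondisk = stage_seconds(ondisk_log)
cached = stage_seconds(cached_log)
print("ondisk:", ondisk)
print("cached:", cached)
```

Because the log timestamps have one-second resolution, the values are coarser than the timings quoted in the comments, but the reduce step is consistent with them (about 74 s for ondisk and 21 s for cached).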
@jcharkow I tested on Windows, and everything works. Can we merge before the changes get too large?
It looks like some of my comments may have included a bigger code block than intended, but if you have any questions let me know.
Great, thanks for the comments. I addressed them and made the changes.
Looks good, I think it's ready to merge!
Changes
Example
Conversion of TDF to mzML
Targeted Data Extraction
Generating peptide coordinates for targeted raw data extraction
Targeted Extraction of the Raw diaPASEF mzML data
Exporting reduced targeted mzML for easier data manipulation and plotting
Generating a report of RT and IM Heatmap plots
Docker image available
I have a Docker image available on Docker Hub: singjust/modibik
Using the image
Singularity
Some issues I ran into using pyopenms during implementation
Error pickling MSSpectrum
Error setting Floating Data Arrays