compomics / ThermoRawFileParser

Thermo RAW file parser that runs on Linux/Mac and all other platforms that support Mono
Apache License 2.0

Conversion to support complex filter expressions? #142

Open keesh-elucid opened 2 years ago

keesh-elucid commented 2 years ago

Hello,

We are evaluating ThermoRawFileParser as a replacement for an old XRawfile2_x64.dll-based C++ program. Do you have any plans to support msconvert filter expressions like the following during RAW to mzML or MGF conversion?

--filter "chargeStatePredictor maxMultipleCharge=4 minMultipleCharge=2 singleChargeFractionTIC=0.9"
--filter "threshold absolute 0.00000000001 most-intense"
...
--filter "titleMaker test1 scan (Exploris1_PooledLambda_20220101_AmyloidVer5X_yyy9....dta)"

thx keesh@ieee.org

caetera commented 2 years ago

Hi @keesh-elucid,

thank you for your interest in TRFP. The honest (and unfortunate) answer is that there are no plans to implement all MSConvert filters. Some simple filters, such as an intensity threshold or a scan range, might be implemented in the future (no firm timeline, though), but there is not enough capacity to implement chargeStatePredictor, non-vendor peakPicking, or similar. Any help is appreciated.

As a workaround on non-Windows platforms, you can use TRFP to convert from the vendor format to mzML and then apply the MSConvert filters to the mzML file. On Windows, MSConvert can convert vendor files directly (MSConvert and TRFP use the same libraries to process Thermo RAW files).
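The two-step workaround above can be sketched as a small Python helper that only builds the two command lines. This is a minimal sketch under stated assumptions: the function name `build_workaround_commands` is hypothetical, `-f=1` is taken to select mzML output as described in the TRFP README, the mzML file is assumed to be named after the RAW file stem, and an `msconvert` executable is assumed to be reachable on the non-Windows machine. Check the flags against each tool's help output before relying on them.

```python
from pathlib import PurePosixPath


def build_workaround_commands(raw_file, out_dir, msconvert_filters):
    """Return (trfp_cmd, msconvert_cmd) argument lists for the two-step workaround.

    Hypothetical helper: it only assembles the commands and does not run them.
    POSIX paths are used because the workaround targets non-Windows platforms.
    """
    raw_file = PurePosixPath(raw_file)
    out_dir = PurePosixPath(out_dir)

    # Step 1: vendor RAW -> mzML with ThermoRawFileParser, run through Mono.
    # Assumption: -f=1 selects mzML output (see the TRFP README).
    trfp_cmd = [
        "mono", "ThermoRawFileParser.exe",
        f"-i={raw_file}",
        f"-o={out_dir}",
        "-f=1",
    ]

    # Assumption: TRFP names its output after the input file stem.
    mzml_file = out_dir / (raw_file.stem + ".mzML")

    # Step 2: apply the msconvert filters to the intermediate mzML file.
    msconvert_cmd = [
        "msconvert", str(mzml_file),
        "--mzML",
        "-o", str(out_dir / "filtered"),
    ]
    for expression in msconvert_filters:
        msconvert_cmd += ["--filter", expression]

    return trfp_cmd, msconvert_cmd


if __name__ == "__main__":
    commands = build_workaround_commands(
        "/data/run1.raw",
        "/data/out",
        ["threshold absolute 0.00000000001 most-intense"],
    )
    for command in commands:
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the paths and executables match the local setup.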

keesh0 commented 2 years ago

Thanks, your workaround seems fine. I am still a newbie with respect to the ThermoFisher.CommonCore assemblies. Regarding filtering, I think I saw some support in ThermoFisher.CommonCore.Data.Business? Do you happen to have a good link to the ThermoFisher.CommonCore API documentation?

caetera commented 2 years ago

To the best of my knowledge, some filtering is supported out of the box: slicing by scan number or retention time (--filter "scanNumber xx-xx" and --filter "scanTime xx-xx" in msconvert notation); filtering by filter string, similar to what the filters in FreeStyle or QualBrowser do (for example, the filter d cv=-40 returns only dependent scans with a FAIMS CV of -40 V; see the FreeStyle or QualBrowser help pages for details); averaging and subtracting scans (support in TRFP is on the to-do list, see #135); and slicing individual scans, i.e. getting only the peaks in the m/z range (x_1, x_2).
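To make the two slicing operations concrete, here is a minimal pure-Python sketch. It does not use the ThermoFisher.CommonCore API; the `Scan` class and both function names are hypothetical, and treating the range bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scan:
    """Hypothetical in-memory scan: a scan number plus (m/z, intensity) peaks."""
    number: int
    peaks: List[Tuple[float, float]]


def slice_by_scan_number(scans: List[Scan], first: int, last: int) -> List[Scan]:
    """Keep scans whose number lies in [first, last], like a scan-number range filter."""
    return [scan for scan in scans if first <= scan.number <= last]


def slice_peaks_by_mz(scan: Scan, mz_low: float, mz_high: float) -> Scan:
    """Return a copy of the scan with only the peaks in [mz_low, mz_high]."""
    kept = [(mz, intensity) for mz, intensity in scan.peaks if mz_low <= mz <= mz_high]
    return Scan(scan.number, kept)


if __name__ == "__main__":
    scans = [
        Scan(1, [(150.0, 10.0), (450.5, 30.0)]),
        Scan(2, [(200.0, 5.0), (800.2, 12.0)]),
        Scan(3, [(300.0, 8.0)]),
    ]
    selected = slice_by_scan_number(scans, 2, 3)
    windowed = [slice_peaks_by_mz(scan, 100.0, 500.0) for scan in selected]
    print(windowed)
```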

The documentation and examples that I am aware of (as well as the assemblies themselves) can be obtained as described in this presentation: https://www.analyteguru.com/t5/Scientific-Library/Raw-File-Reader/ta-p/8870. Essentially, you will need to write to Jim Shofstahl to get access to the shared folder in OneDrive (that is the most "official" way).