MannLabs / alphatims

An open-source Python package for efficient accession and visualization of Bruker TimsTOF raw data from the Mann Labs at the Max Planck Institute of Biochemistry.
https://doi.org/10.1016/j.mcpro.2021.100149
Apache License 2.0

[Question] Centroiding of TimsTOF class objects #276

Closed: triston-groff closed this issue 1 year ago

triston-groff commented 1 year ago

Very useful software you have created! For the most part I have become proficient at using alphatims to accomplish various tasks, but I have been struggling to figure out how I can centroid my data.

There is the function alphatims.bruker.centroid_spectra(), but from what I can tell its purpose is to centroid fragment ions. I could be wrong about this, as it isn't very clear how to use it properly. A required argument is a spectrum_indptr array (which can be obtained using alphatims.bruker.TimsTOF.index_precursors()), but it is unclear exactly what this array describes. It is also not clear what the difference between an index and an indptr is. A spectrum_counts array is also required, but I do not know how to obtain it.

My goal is to perform centroiding on all MS1 and MS2 level data before doing any additional slicing/processing. It would be very useful if the TimsTOF object itself could be centroided upon import so that the data contained in all the TimsTOF class attributes/methods reflect centroided values.

Any suggestions or advice would be greatly appreciated.

sander-willems-bruker commented 1 year ago

Dear @triston-groff. Thanks for the kind words, happy you like it! The alphatims.bruker.centroid_spectra() function is used within the index_precursors function, which itself is used to aggregate multiple ddaPASEF scans and provide traditional MS2 spectra for (proteomic) identification (e.g. it is called in the export_mgf function). I fully agree that those functions could benefit from serious refactoring and that they are not well documented or easily understood, for which I apologize.

The centroid_spectra function should probably be considered an internal function of alphatims and not be exposed like this through the API, as it only has a single use case and indeed requires very specific arguments.
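
On the index versus indptr question, a rough illustration may help. The sketch below uses plain NumPy and made-up toy arrays, not alphatims itself. It assumes that spectrum_indptr follows the common "index pointer" convention known from compressed sparse row matrices, where entry i and entry i + 1 mark where spectrum i starts and ends in flat peak arrays. The names tof_indices, intensities, get_spectrum and counts are hypothetical, and whether counts matches what centroid_spectra expects as spectrum_counts is an assumption rather than something confirmed here.

```python
import numpy as np

# Hypothetical toy data: three spectra stored back-to-back in flat arrays.
tof_indices = np.array([10, 11, 12, 40, 41, 7, 8, 9, 50])
intensities = np.array([5, 9, 4, 3, 6, 2, 8, 1, 7])

# An indptr array has one more entry than there are spectra. Spectrum i
# occupies the half-open slice indptr[i]:indptr[i + 1] of the flat arrays.
indptr = np.array([0, 3, 5, 9])


def get_spectrum(i):
    """Return the (tof_indices, intensities) pair of spectrum number i."""
    start, end = indptr[i], indptr[i + 1]
    return tof_indices[start:end], intensities[start:end]


# In this layout, the number of peaks per spectrum is simply the difference
# between consecutive pointers (assumption: not checked against alphatims).
counts = np.diff(indptr)
```

Under that convention, an index is just the number of a spectrum (0, 1, 2 here), while the indptr array tells you where each of those spectra lives in the flat data.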

Alphatims purposefully does not do any data processing (with the exception of exporting mgf files and the required steps to do so) and is intended to handle the raw data directly. However, I recognize your need to centroid MS1 and MS2 data. For MS1 data, you should be able to use Bruker software (e.g. their feature finder). For MS2, this is more difficult as DDA and DIA provide different data and there is also the quadrupole to consider. That said, there might soon be some open-source Python packages released that should be able to do peak picking/centroiding in diaPASEF data...;)
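
To make "centroiding" concrete in the meantime, here is a minimal sketch of one very simple approach for a single spectrum that has already been extracted as m/z and intensity arrays. It is not the method used inside alphatims or in Bruker's software. The function name centroid_profile and the max_gap parameter are hypothetical, and it assumes strictly positive intensities. Neighbouring points closer than max_gap in m/z are merged into one peak at their intensity-weighted mean m/z, with their intensities summed.

```python
import numpy as np


def centroid_profile(mz, intensity, max_gap):
    """Merge neighbouring points into centroids (toy gap-based grouping).

    Points whose m/z values differ by at most max_gap from their neighbour
    end up in the same group. Each group is reported as its
    intensity-weighted mean m/z and its summed intensity.
    Assumes all intensities are strictly positive.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    if mz.size == 0:
        return mz, intensity

    # Sort by m/z so that neighbours in the array are neighbours in m/z.
    order = np.argsort(mz)
    mz, intensity = mz[order], intensity[order]

    # A new group starts wherever the gap to the previous point is too large.
    breaks = np.flatnonzero(np.diff(mz) > max_gap) + 1
    starts = np.concatenate(([0], breaks))

    # Sum within each group, then divide to get the weighted mean m/z.
    summed = np.add.reduceat(intensity, starts)
    weighted = np.add.reduceat(mz * intensity, starts)
    return weighted / summed, summed


centroid_mz, centroid_intensity = centroid_profile(
    [100.00, 100.01, 100.02, 200.00, 200.01],
    [1, 2, 1, 3, 3],
    max_gap=0.05,
)
```

A real implementation for TimsTOF data would also have to account for the ion mobility dimension and, for MS2, the quadrupole isolation windows mentioned above, which this sketch ignores.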