
cbroeckl / RAMClustR

Assigning precursor-product ion relationships in indiscriminate MS/MS data

MIT License


Where are the collapse option variables used? #32

Closed: jsaintvanne closed this issue 1 year ago

jsaintvanne commented 1 year ago

Hi,

Sorry, it's me again! I'm wondering where the results of the collapse option operations are used afterwards: https://github.com/cbroeckl/RAMClustR/blob/master/R/ramclustR.R#L784

The ramclustObj$SpecAbund object (or ramclustObj$SpecAbundAve) does not seem to be used anywhere after that point.

Could you enlighten me, please?

cbroeckl commented 1 year ago

@jsaintvanne - that is the last place the collapse option is used. It only controls whether feature intensities are collapsed into compound (spectrum) intensities. The SpecAbund dataset is returned in the ramclustObj$SpecAbund slot. If any of the provided sample names are identical, the ramclustObj$SpecAbundAve slot is also created, with signal intensities averaged over identical sample names.
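The collapse step described above can be sketched in Python as follows. This is a hedged illustration, not RAMClustR's code (the package is written in R): the function name `collapse_features`, the list-of-rows data layout, treating cluster id 0 as "unclustered", and the intensity-weighted mean used to combine a cluster's features are all assumptions made for this example; check ramclustR.R for the exact aggregation rule.

```python
from collections import defaultdict


def collapse_features(ms_data, featclus):
    """Collapse a samples x features matrix into a samples x clusters matrix.

    ms_data:  list of rows, one per sample, each a list of feature intensities.
    featclus: cluster id for each feature (column); 0 marks an unclustered
              feature and is skipped (an assumption for this sketch).
    Returns (cluster_ids, spec_abund), where spec_abund[i][k] is the
    collapsed intensity of cluster cluster_ids[k] in sample i.
    """
    n_features = len(featclus)
    # Assumed weighting: each feature's total intensity across all samples,
    # so dominant features contribute more to the cluster value.
    weights = [sum(row[j] for row in ms_data) for j in range(n_features)]

    # Group feature (column) indices by their cluster id.
    members = defaultdict(list)
    for j, cl in enumerate(featclus):
        if cl != 0:
            members[cl].append(j)
    cluster_ids = sorted(members)

    spec_abund = []
    for row in ms_data:
        out = []
        for cl in cluster_ids:
            cols = members[cl]
            w_total = sum(weights[j] for j in cols)
            if w_total == 0:
                # All member features are zero everywhere: report zero.
                out.append(0.0)
            else:
                out.append(sum(row[j] * weights[j] for j in cols) / w_total)
        spec_abund.append(out)
    return cluster_ids, spec_abund
```

For example, with two samples and three features where features 0 and 1 form cluster 1, `collapse_features([[10, 30, 5], [50, 10, 15]], [1, 1, 2])` returns `([1, 2], [[18.0, 5.0], [34.0, 15.0]])`: one column per cluster instead of one per feature.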

jsaintvanne commented 1 year ago

Thanks again for your fast answer, @cbroeckl!

A follow-up question, to check that I understood your explanation: after all the steps you described, the ramclustObj$SpecAbundAve slot is not used as output in the MSP file, right? And neither is ramclustObj$SpecAbund?

cbroeckl commented 1 year ago

@jsaintvanne

Correct. I store the feature m/z values in $fmz, and the MS and MS/MS intensities in $msint and $msmsint, respectively.
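To illustrate how per-feature m/z values and intensities end up in a .msp spectrum, here is a minimal Python sketch that formats one cluster as a generic MSP text record. It assumes `$fmz` holds the feature m/z values and `$msint` (or `$msmsint`) the matching intensities for the features of one cluster; the function `msp_record` and the header fields (`NAME`, `RETENTIONTIME`, `Num Peaks`) are hypothetical choices, not RAMClustR's exact output layout.

```python
def msp_record(name, fmz, intensities, retention_time=None):
    """Format one spectrum as a generic MSP text record.

    fmz / intensities play the roles of $fmz and $msint (or $msmsint) for
    the features of a single cluster. The field set is a minimal,
    hypothetical subset, not RAMClustR's exact .msp layout.
    """
    if len(fmz) != len(intensities):
        raise ValueError("fmz and intensities must have the same length")
    lines = [f"NAME: {name}"]
    if retention_time is not None:
        lines.append(f"RETENTIONTIME: {retention_time}")
    # Peaks sorted by m/z, written as one "mz intensity" pair per line.
    peaks = sorted(zip(fmz, intensities))
    lines.append(f"Num Peaks: {len(peaks)}")
    lines.extend(f"{mz} {inten}" for mz, inten in peaks)
    # A trailing blank line separates consecutive records in the file.
    return "\n".join(lines) + "\n\n"
```

For example, `msp_record("C1", [201.1, 105.0], [300, 1000], 12.5)` returns `"NAME: C1\nRETENTIONTIME: 12.5\nNum Peaks: 2\n105.0 1000\n201.1 300\n\n"`. Sorting peaks by m/z keeps the record identical regardless of the order in which features were stored.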

jsaintvanne commented 1 year ago

Thanks again for your answer!

Maybe I'm missing something, but why have a collapse option at all if the MSP file output reuses the ramclustObj$MSdata values from data1, which are not computed by the collapse option?

If I've misunderstood something, I'd be happy to have an explanation.

cbroeckl commented 1 year ago

MSdata - feature intensities by sample and feature (mz+rt), MS1-level intensities.
MSMSdata - same as above, but for MS/MS (MSe) intensities.
SpecAbund - MSdata collapsed by feature cluster. This is the dataset that should be used for statistical analysis:
  - reduced dimensionality, and therefore:
    - reduced statistical false discovery rate burden
    - reduced analytical variance
    - reduced feature redundancy
  - statistical analysis of compounds (spectra) rather than features
SpecAbundAve - SpecAbund, but collapsed by replicate injections, if available.
.msp output - contains spectra suitable for annotation workflows.
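The SpecAbundAve step, averaging replicate injections that share a sample name, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `average_by_sample_name` is hypothetical, and a column-wise arithmetic mean is assumed to be what "averaged" means here.

```python
def average_by_sample_name(sample_names, spec_abund):
    """Average rows of spec_abund that share a sample name.

    sample_names: one name per row of spec_abund (replicates repeat a name).
    spec_abund:   list of rows, one per injection, one value per cluster.
    Returns (unique_names, averaged_rows), with names in first-seen order.
    """
    # name -> row indices; dicts keep insertion order in Python 3.7+.
    groups = {}
    for i, name in enumerate(sample_names):
        groups.setdefault(name, []).append(i)

    unique_names = list(groups)
    averaged = []
    for name in unique_names:
        rows = [spec_abund[i] for i in groups[name]]
        n = len(rows)
        # Column-wise arithmetic mean over the replicate injections.
        averaged.append([sum(col) / n for col in zip(*rows)])
    return unique_names, averaged
```

For example, `average_by_sample_name(["QC", "A", "QC"], [[2.0, 4.0], [1.0, 1.0], [4.0, 8.0]])` returns `(["QC", "A"], [[3.0, 6.0], [1.0, 1.0]])`. Keeping names in first-seen order preserves the original sample ordering as far as possible.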

Each section serves a different role. MSdata and MSMSdata are records of features and are used as input to the ramclustR clustering algorithm. The output of the clustering algorithm is a set of linkages, which we use to generate SpecAbund, SpecAbundAve, and the .msp spectra, all of which have value in different downstream workflows.
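To make the step from linkages to clusters concrete, here is a deliberately simplified Python sketch that turns pairwise feature links into cluster labels by taking connected components with a union-find structure. This is a swapped-in technique for illustration only: RAMClustR itself builds a hierarchical clustering tree and cuts it, so the function name `clusters_from_links` and its input format (a list of linked feature-index pairs) are hypothetical. The resulting labels play the role of a per-feature cluster assignment, which is what both collapsing into SpecAbund and grouping features into .msp spectra need.

```python
def clusters_from_links(n_features, links):
    """Turn pairwise feature links into cluster labels (connected components).

    links: iterable of (i, j) feature-index pairs judged to belong together.
    Returns one label per feature, numbered 1..k in order of first
    appearance; unlinked features get their own label (a simplification).
    """
    parent = list(range(n_features))

    def find(x):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every linked pair.
    for i, j in links:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Relabel component roots as consecutive cluster ids.
    labels, root_to_id = [], {}
    for f in range(n_features):
        r = find(f)
        if r not in root_to_id:
            root_to_id[r] = len(root_to_id) + 1
        labels.append(root_to_id[r])
    return labels
```

For example, `clusters_from_links(5, [(0, 2), (2, 3)])` returns `[1, 2, 1, 1, 3]`: features 0, 2 and 3 share a cluster, while features 1 and 4 each stay on their own.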