cbroeckl / RAMClustR

Assigning precursor-product ion relationships in indiscriminant MS/MS data
MIT License

PrecursorMz #47

Closed: Slycopersicum closed this issue 9 months ago

Slycopersicum commented 1 year ago

Hi, I use the function write.msp to get the full dataset into a Spectra object. I notice that the precursor given to each compound differs from the precursor calculated by do.findmain. In general the exported precursor is a higher m/z from the same MS1 group, but with lower intensity.

cbroeckl commented 1 year ago

can you provide a detailed example? Also, please provide the details of your R and RAMClustR versions using sessionInfo()

Slycopersicum commented 1 year ago

For example, if I request the precursor of c001 (RC$precursor.mz[1]) I get this mass: 609.3411, but in the c001 file created by the function do.findmain the precursor is 293.1752. So I checked RC[["M.ann"]][[1]]:

293.1752 100 3 0 1 [M-H]- NA [M-H]-
609.3411 12.33356 6 0 1 [2M+Na-2H]- 2.430297 [2M+Na-2H]-

Also, there was an issue writing the MSP with one file only: a space is needed after the ":" in some terms before the file can be imported into a Spectra object. I don't know if there is an easier way to connect RAMClustR to the Spectra package.
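The missing space after the colon can be patched in the MSP text before import. Below is a minimal sketch in Python (used only because matchms, suggested later in this thread, is a Python tool). It assumes header lines have the form `Key:value` and that peak lines contain no colon. `normalize_msp_headers` is a hypothetical helper, not part of RAMClustR or Spectra, and the field names in the usage note are generic MSP fields rather than the exact ones RAMClustR writes.

```python
import re

# A header line "Key:value" where the first colon is directly followed by
# a non-space character (i.e. the separating space is missing).
_HEADER = re.compile(r"^([^:\s][^:]*):(?=\S)")


def normalize_msp_headers(text: str) -> str:
    """Insert one space after the first colon of each header line.

    Lines that already have the space, and peak lines without a colon,
    are returned unchanged. (Hypothetical helper for illustration.)
    """
    fixed = [_HEADER.sub(r"\1: ", line, count=1) for line in text.splitlines()]
    return "\n".join(fixed)
```

For instance, a line `Num Peaks:2` would become `Num Peaks: 2`, while `293.1752 100` is left as it is.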

I use this option to choose the precursor in the write.msp function (I don't know if it's right, but now I get the same m/z): ramclustObj[["M.ann"]][[i]][["mz"]][[which.max(ramclustObj[["M.ann"]][[i]][["int"]])]]
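The rule in that expression (take the m/z of the most intense annotated ion) can be sketched outside R as follows. This assumes the first two values of each M.ann row quoted above are `mz` and `int`. The function name `most_intense_mz` is hypothetical, and this only illustrates the selection rule, not how write.msp picks the precursor.

```python
def most_intense_mz(rows):
    """Return the m/z of the row with the highest intensity.

    On ties the first row wins, which mirrors R's which.max().
    """
    best = max(rows, key=lambda row: row["int"])
    return best["mz"]


# The two annotated ions reported for c001 (mz and int columns only).
rows = [
    {"mz": 293.1752, "int": 100.0},
    {"mz": 609.3411, "int": 12.33356},
]
print(most_intense_mz(rows))  # 293.1752
```

With these two rows the rule selects 293.1752, the [M-H]- ion, rather than 609.3411.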

Thanks for your time

R version 4.3.0 RAMClustR_1.2.4

cbroeckl commented 1 year ago

@Slycopersicum - i like the github username ;-) sly tomato....

I will start by saying that this findmain annotation process has been fairly fluid - i am trying to find the most reliable approach, and in doing so, change code and ramclustObj structure too readily. I am also more commonly using .mat format output than MSP, so it is possible i have missed updating code for the msp output format. This is a symptom of me not being a real computer scientist, rather a bioanalytical chemist who codes. My apologies.

I think that what you describe is actually normal.

I built two alternative scoring approaches into this version. Both use the raw output from the interpretMSSpectrum::findmain() function, but how that output is ranked may differ between the two. The two sets of results are here:

> RC$M.findmain[7]
[1] 754.3322
> RC$M.ramclustr[7]
[1] 160.125

This is an example from a spectrum where the two scoring formulas generated different results. By default i was using the ramclustr score when they differed. This was a decision made largely on results i saw from my instrument and conditions. you can see a record of which scoring method is used to select the 'best' match by looking at the RC$use.findmain slot. Every value which is listed as TRUE is using the findmain scoring best match, while every value that is FALSE is using the ramclustr version.

This may explain why you are seeing what you see. i.e. the precursor mass being exported is correct. In the case from the example listed above:

> RC$use.findmain[7]
[1] TRUE
> RC$precursor.mz[7]
[1] 777.3206
> RC$precursor.type[7]
[1] "[M+Na]+"

If i look at a compound where the best-ranked results from the two scoring methods agree:


> cmpd <- 5;  RC$M.findmain[cmpd]; RC$M.ramclustr[cmpd]
[1] 983.4735
[1] 983.4735
> RC$use.findmain[cmpd]
[1] FALSE
> RC$precursor.mz[cmpd]
[1] 1006.451
> RC$precursor.type[cmpd]
[1] "[M+Na]+"

Does this clarify things, or are your ramclust objects still inconsistent with the MSP output?

cbroeckl commented 1 year ago

forgot to add: you can use interpretMSSpectrum scoring by default by setting the option scoring = 'imss'

Slycopersicum commented 1 year ago

I think I understand now; sorry, I am relatively new to metabolomics. Also, MSe data isn't easy to work with, and I only found MS-DIAL in addition to RAMClustR. I have a question about the intensity signal: does it refer to the precursor or to the group of MS1 features in that compound? And finally, is there a way to export the data to GNPS? Thank you

cbroeckl commented 1 year ago

Looks like we would need an MGF or mzML export. i will have to look into this more. @hechth - any experience writing these out?

hechth commented 1 year ago

> Looks like we would need an MGF or mzML export. i will have to look into this more. @hechth - any experience writing these out?

You can convert the file to MGF using matchms. The conversion tool is also hosted on UMSA Galaxy as "matchms convert". I hope this helps you use the spectra in GNPS. Please let me know if this works for you!
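To show what such a conversion involves, here is a simplified, hand-rolled sketch in Python. It is not matchms's implementation; with matchms itself the equivalent would presumably go through its `load_from_msp` and `save_as_mgf` functions. The sketch assumes records are separated by blank lines, one whitespace-separated peak per line, and header fields named `Name` and `PrecursorMZ`. `msp_to_mgf` is a hypothetical helper.

```python
import re


def msp_to_mgf(msp_text: str) -> str:
    """Convert MSP-formatted text to a minimal MGF string (illustrative only)."""
    out = []
    # Assumption: one record per blank-line-separated block.
    for block in re.split(r"\n\s*\n", msp_text.strip()):
        meta, peaks = {}, []
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            if ":" in line:
                # Header line, e.g. "PrecursorMZ: 293.1752"
                key, value = line.split(":", 1)
                meta[key.strip().lower()] = value.strip()
            else:
                # Peak line: m/z followed by intensity
                mz, intensity = line.split()[:2]
                peaks.append((mz, intensity))
        out.append("BEGIN IONS")
        if "name" in meta:
            out.append("TITLE=" + meta["name"])
        if "precursormz" in meta:
            out.append("PEPMASS=" + meta["precursormz"])
        out.extend(f"{mz} {intensity}" for mz, intensity in peaks)
        out.append("END IONS")
        out.append("")  # blank line between spectra
    return "\n".join(out)
```

Only the title, precursor m/z and peak list are carried over. Other header fields are dropped, so check which fields GNPS expects before relying on output like this.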

hechth commented 1 year ago

@Slycopersicum did you figure out how to do the conversion? You can use this tool https://umsa.cerit-sc.cz/root?tool_id=toolshed.g2.bx.psu.edu/repos/recetox/matchms_convert/matchms_convert/0.20.0+galaxy0

Slycopersicum commented 1 year ago

Sorry I didn't answer before. Yes, the tool works perfectly. However, I noticed that version 1.3 differs from version 1.2.4 in how it works with xcms. In the previous version you could preprocess MS1 and MS2 from the same mzML file and then call the ramclustR function like this:

rc <- ramclustR(xcmsObj = xdata, ExpDes = experiment, taglocation = "pheno", MStag = info$sample_names, idMSMStag = info$sample_names)

In the new version, this fails with an error that refers to the difference between MS and MSMS files. I think the problem is in the grep, but I am not sure. Thanks for your time

hechth commented 1 year ago

@Slycopersicum can you describe this in a bit more detail? This seems to be an unintended side effect and we will try to fix it.

By the way you can also use RAMClustR on Galaxy together with XCMS.

Slycopersicum commented 1 year ago

Hi, it looks like it fails only when the MS and MSMS information is in the same file. The error says that I have 1 idMSMS file and n MS files, whatever the number of files added (n). I still work with the previous version, so I can't say much more about the error. Sorry