BojarLab / CandyCrunch

Predicting glycan structure from LC-MS/MS data
MIT License

Glycan mass mismatch? #2

Closed: daichengxin closed this issue 5 months ago

daichengxin commented 6 months ago

Thanks for making it. After downloading your scripts and datasets, I found that the glycan label and precursor mass don't seem to match. An example from the full dataset xlsx file: the precursor m/z and charge are 657.746032714844 and 1, which does not match the glycan mass (1298.4759603335401) within a 0.5 Da tolerance. Sorry, I am a newcomer. Looking forward to your reply.

| reducing mass | glycan | filename | GlycoPost_ID |
| --- | --- | --- | --- |
| 657.746032714844 | GlcNAc(b1-2)Man(a1-?)[GlcNAc(b1-4)][Man(a1-?)]Man(b1-4)GlcNAc(b1-4)GlcNAc | \0000_180801_CM90_C_NG.mzML | GPST000030 |

Bribak commented 6 months ago

Hi! Thanks for engaging with our work. First of all, the reducing_mass column name is an unfortunate misnomer (it should be fixed long-term, but that has a few downstream consequences): the column actually holds m/z, so charge = 1 cannot be assumed. In this case the ion is doubly charged. Further, we do consider adducts, so the m/z may sometimes include an adduct. Lastly, it can't be ruled out that the originally reported m/z values were incorrect in some cases. It is probably a good idea to go over the dataset at some point and fix those cases.
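
To illustrate why the charge state matters here, the following is a minimal sketch of the standard m/z to neutral mass conversion, applied to the value quoted above. The function name `neutral_mass`, the signed-charge convention, and the `adduct_mass` parameter are hypothetical and not part of CandyCrunch. The ion mode of the original file is not stated in the thread, so both polarities are shown.

```python
PROTON_MASS = 1.007276466812  # Da, mass of a proton

def neutral_mass(mz, charge, adduct_mass=0.0):
    """Estimate the neutral mass M from an observed m/z.

    Assumes (hypothetically) an ion of the form [M + adduct + z*H]^z,
    where a negative z means deprotonation. adduct_mass is the mass of
    any additional non-proton adduct and defaults to none.
    """
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return abs(charge) * mz - charge * PROTON_MASS - adduct_mass

observed_mz = 657.746032714844       # value from the dataset row above
reported_mass = 1298.4759603335401   # glycan mass quoted in the question

# Compare singly and doubly charged assumptions in both polarities.
for z in (1, 2, -1, -2):
    m = neutral_mass(observed_mz, z)
    print(f"z={z:+d}: neutral mass = {m:.4f} Da, "
          f"difference to reported mass = {m - reported_mass:+.4f} Da")
```

With a charge of 1 the neutral mass is roughly half the glycan mass, while a charge of 2 lands in the same range. Any remaining offset would depend on adducts and on how the reference mass was computed, neither of which this sketch tries to resolve.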

daichengxin commented 6 months ago

Thanks for your reply. I matched the m/z values against the mzML file. Only one precursor was found, but its charge state is 1, which confuses me. [image]

Bribak commented 6 months ago

Could absolutely be that it's an extraction mistake in this case. Depending on the vendor format / user, we don't always have precursor charge information, so we don't use it for extraction, in order to keep a streamlined format. Thus, a singly charged precursor can sometimes be mistakenly extracted as an assumed doubly charged ion plus adduct, etc. During prediction, we filter those cases out (e.g., if no peak is larger than the m/z but the assumed charge is 2, then the prediction is filtered out).
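
The filtering rule described above could look roughly like the sketch below. This is not the CandyCrunch implementation: the function name `is_charge_plausible` and its parameters are hypothetical, and it only encodes the single heuristic mentioned in the comment, namely that a multiply charged precursor is expected to produce at least one fragment peak above the precursor m/z.

```python
def is_charge_plausible(precursor_mz, fragment_mzs, assumed_charge):
    """Return False when the assumed charge contradicts the spectrum.

    Heuristic (assumption based on the comment above): fragments of a
    multiply charged precursor can carry fewer charges and therefore
    appear at a higher m/z than the precursor. If the assumed charge
    is 2 or more but no fragment peak lies above the precursor m/z,
    the charge assumption is treated as implausible.
    """
    if abs(assumed_charge) < 2:
        # Singly charged assumptions are not checked by this rule.
        return True
    return any(mz > precursor_mz for mz in fragment_mzs)

# Hypothetical peak lists, for illustration only.
candidates = [
    {"precursor_mz": 657.75, "fragments": [204.09, 366.14, 528.19], "charge": 2},
    {"precursor_mz": 657.75, "fragments": [204.09, 366.14, 951.35], "charge": 2},
]

# Keep only the predictions whose assumed charge passes the check.
kept = [c for c in candidates
        if is_charge_plausible(c["precursor_mz"], c["fragments"], c["charge"])]
print(len(kept))
```

A check of this kind only uses the fragment peaks, so it works even when the vendor format provides no precursor charge annotation.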