bittremieux / falcon

Large-scale tandem mass spectrum clustering using fast nearest neighbor searching.
BSD 3-Clause "New" or "Revised" License

Similar MS/MS did not make it into clusters output #16

Closed by mwang87 2 years ago

mwang87 commented 2 years ago

These spectra are nearly identical, but they do not appear in the clustered result set - Link.

Some examples to make it clearer:

mzspec:GNPS:TASK-f5094e83a4f042e88f0d423dcb52b11c-query_results/extracted/extracted_mzML/extracted_53.mzML:scan:2678
mzspec:GNPS:TASK-f5094e83a4f042e88f0d423dcb52b11c-query_results/extracted/extracted_mzML/extracted_59.mzML:scan:2544

[Image: mirror plot of the two spectra]

Note that in the output from falcon there is no precursor m/z in this mass range - Link

bittremieux commented 2 years ago

The issue here is that certain scans are missing from the output?

This looks odd at first sight, but if falcon was run with the default (proteomics-centric) settings, those spectra were probably discarded as "low-quality." Specifically, by default only spectra whose peaks span an m/z range of at least 250 m/z are retained, and the spectra in the example do not seem to meet that threshold.

I can change these defaults, along with the minimum number of peaks (5) and the minimum m/z to consider (101 m/z), to be more metabolomics-friendly, although it's challenging to find defaults that work well in all cases.
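
To make the effect of these thresholds concrete, here is a minimal Python sketch of such a quality filter. It is not falcon's actual code: the function name `passes_quality_filter` and its parameter names are hypothetical, the peak lists are made-up illustrative values, and it ignores any other preprocessing falcon may apply. It only assumes the three defaults mentioned above (at least 5 peaks, peaks below 101 m/z ignored, m/z span of at least 250).

```python
from typing import Sequence


def passes_quality_filter(
    mz: Sequence[float],
    min_peaks: int = 5,          # assumed default: minimum number of peaks
    min_mz: float = 101.0,       # assumed default: peaks below this are ignored
    min_mz_range: float = 250.0  # assumed default: minimum m/z span
) -> bool:
    """Return True if a spectrum would be kept under the assumed thresholds."""
    # Drop peaks below the minimum m/z before applying the other checks.
    kept = [m for m in mz if m >= min_mz]
    # Too few remaining peaks: treat the spectrum as low-quality.
    if len(kept) < min_peaks:
        return False
    # The remaining peaks must span a wide enough m/z range.
    return max(kept) - min(kept) >= min_mz_range


# Made-up small-molecule-like spectrum: enough peaks, but a narrow m/z span.
small_molecule = [105.07, 121.06, 149.02, 177.05, 195.06, 213.07]
# Made-up peptide-like spectrum: fragments spread over a wide m/z span.
peptide = [147.11, 260.20, 375.22, 488.31, 617.35, 730.43]

print(passes_quality_filter(small_molecule))  # discarded with the defaults
print(passes_quality_filter(peptide))         # retained with the defaults
print(passes_quality_filter(small_molecule, min_mz_range=50.0))
```

With the proteomics-centric defaults the small-molecule-like spectrum is dropped purely because its peaks span only about 108 m/z, which is the behavior described in this issue. Lowering the span threshold lets it through.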