bittremieux / ANN-SoLo

Spectral library searching using approximate nearest neighbor techniques.
Apache License 2.0

Feature Request: Accept Separate Input files for First/Second Search Stages #9

Closed bjreisman closed 4 years ago

bjreisman commented 4 years ago

Hi ANN-SoLo developers,

I'm attempting to use ANN-SoLo to identify peptides that have been chemically modified with a large adduct. We know the adducted peptides have a characteristic neutral-loss fragment, and we have set up the instrument to fragment only peptides that have been modified, so the MS2 scans should not contain unmodified peptides. We also have datasets from the same samples acquired with traditional data-dependent acquisition (DDA), which should contain the unmodified spectra.

Would it be possible to run ANN-SoLo with the first stage on one dataset (DDA) and then search for the modified peptides on a second dataset (targeted scan)?

Apologies if this is confusing. If it sounds like this is something ANN-SoLo is not designed for, I'd also understand. Thanks for your help! -Ben

bittremieux commented 4 years ago

Hi Ben, you can use ANN-SoLo to do either a standard search or an open search. The relevant parameters are precursor_tolerance_mass (standard search) and precursor_tolerance_mass_open (open search).

To run your first standard search, you'd do something like this:

ann_solo --precursor_tolerance_mass 20 --precursor_tolerance_mode ppm --fragment_mz_tolerance 0.02 spectral_library.splib spectra_unmodified.mgf output_unmodified.mztab

By only specifying the precursor mass tolerance for the standard search you will only run the first stage of the cascade search.

Then, to identify modified peptides you just run ANN-SoLo to do an open search:

ann_solo --precursor_tolerance_mass 20 --precursor_tolerance_mode ppm --precursor_tolerance_mass_open 500 --precursor_tolerance_mode_open Da --fragment_mz_tolerance 0.02 spectral_library.splib spectra_modified.mgf output_modified.mztab

Here you specify a precursor mass tolerance for both the standard search and the open search, doing both phases of the cascade search.
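As a rough sketch, the two runs above could be wrapped in a small shell script so that both datasets are searched with the same settings. The flags, tolerance values, and file names are copied from the commands above. The `run_search` helper and the `DRY_RUN` variable are hypothetical conveniences of this script and not part of ANN-SoLo. By default the script only prints the commands it would run.

```shell
#!/bin/sh
set -eu

# File names taken from the example commands above; adjust to your data.
LIBRARY="spectral_library.splib"
DDA_SPECTRA="spectra_unmodified.mgf"       # data-dependent run
TARGETED_SPECTRA="spectra_modified.mgf"    # targeted run

# Options shared by both runs (values copied from the commands above).
COMMON_OPTS="--precursor_tolerance_mass 20 --precursor_tolerance_mode ppm --fragment_mz_tolerance 0.02"

# Extra options that enable the second (open) stage of the cascade search.
OPEN_OPTS="--precursor_tolerance_mass_open 500 --precursor_tolerance_mode_open Da"

# run_search MODE QUERY_FILE OUTPUT_FILE
# MODE is "standard" (first stage only) or "open" (both stages).
# Hypothetical helper: prints the command, and executes it only if DRY_RUN=0.
run_search() {
  mode="$1"; query="$2"; out="$3"
  opts="$COMMON_OPTS"
  if [ "$mode" = "open" ]; then
    opts="$opts $OPEN_OPTS"
  fi
  echo "ann_solo $opts $LIBRARY $query $out"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    # $opts is left unquoted on purpose so it splits into separate arguments.
    ann_solo $opts "$LIBRARY" "$query" "$out"
  fi
}

# Standard search on the DDA data, then cascade (standard + open) search
# on the targeted data.
run_search standard "$DDA_SPECTRA" output_unmodified.mztab
run_search open "$TARGETED_SPECTRA" output_modified.mztab
```

Running the script with `DRY_RUN=0` in the environment would execute the two searches one after the other. Because the options are stored as plain strings, file names containing spaces are not handled by this sketch.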

This may not be exactly what you described, because it still runs the standard search as well, but I don't expect that to be a major issue. There will probably still be some unmodified peptides in your second dataset, and the standard search allows those spectra to be identified too.

Unfortunately, at the moment you can't run ANN-SoLo in open mode only. Setting a very large precursor mass tolerance for the first step of the cascade search and running only that step is not a workable substitute: the first step is not optimized for open searching, so it would take a lot of time, and its FDR calculation is different as well (standard FDR versus group FDR).

Let me know if that's useful for you. I'm happy to help you troubleshoot further so that it fits your use case.

bittremieux commented 4 years ago

No activity, closing.

Feel free to re-open the issue or make a new issue if you have further questions.