Open wfondrie opened 3 years ago
One thing I envisioned being really useful with this PR is the ability to use Prosit libraries with ANN-SoLo. However, there are a couple of hiccups in doing so:
Would it be out of scope for ANN-SoLo to also contain a few utility functions to prepare a FASTA file for Prosit? For (1), I would propose adding a function to generate this CSV file from a FASTA file, similar to the functionality already provided by EncyclopeDIA. To solve (2), I think there are a couple of options:
decoy_spectral_library_filename that specifies decoy peptide spectra, implying that spectral_library_filename only defines targets.
What are your thoughts? Generating the CSV and annotating a DLIB could alternatively be handled by another package.
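To make the FASTA-to-CSV proposal concrete, here is a rough sketch of what such a utility could look like. It is not code from ANN-SoLo or EncyclopeDIA. The function names are hypothetical, the column names (modified_sequence, collision_energy, precursor_charge) are what I understand the Prosit input CSV to expect, and the digestion rules, length limits, collision energy, and charge states are illustrative defaults rather than recommendations.

```python
import csv
import io

# Assumption: only the 20 standard amino acids are accepted downstream.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tryptic_peptides(sequence, min_len=7, max_len=30):
    """Yield fully tryptic peptides with no missed cleavages.

    Cleaves after K or R unless the next residue is P.
    """
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1:i + 2]
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and next_aa != "P"):
            peptide = sequence[start:i + 1]
            start = i + 1
            if min_len <= len(peptide) <= max_len:
                yield peptide


def write_prosit_csv(fasta_handle, out_handle, collision_energy=30, charges=(2, 3)):
    """Write one CSV row per unique peptide and charge; return the row count."""
    peptides = {}  # dict keeps insertion order and removes duplicates
    for _header, sequence in read_fasta(fasta_handle):
        for peptide in tryptic_peptides(sequence.upper()):
            if set(peptide) <= STANDARD_AA:
                peptides[peptide] = None

    writer = csv.writer(out_handle)
    # Assumed Prosit input header; verify against the Prosit documentation.
    writer.writerow(["modified_sequence", "collision_energy", "precursor_charge"])
    n_rows = 0
    for peptide in peptides:
        for charge in charges:
            writer.writerow([peptide, collision_energy, charge])
            n_rows += 1
    return n_rows


if __name__ == "__main__":
    fasta = io.StringIO(">example\nMKPEPTIDEKAAARGGGGGGGK\n")
    out = io.StringIO()
    write_prosit_csv(fasta, out)
    print(out.getvalue())
```

Missed cleavages, variable modifications, and decoy generation are deliberately left out here; a real utility would need to decide how to handle each.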
Yes, I totally agree. Prosit compatibility has been on my wish list / TODO list for quite some time.
My preference would be an end-to-end solution. Rather than having manual steps in between (generating a CSV to submit to the Prosit web interface, and then converting the output from there again), it would be nicer if ANN-SoLo had the option to generate a spectral library (and its index) directly from a FASTA file using built-in Prosit.
Prosit is open source, so it should be possible, although it might complicate the installation instructions further, and they're already a bit advanced.
That is a good goal, but yikes, that does complicate installation! Do you know whether they have a programmatic API for their webserver? That might be an alternative way to go if they do.
Either way, I'll probably make a small separate package to handle these things for now.
This pull request adds a module for parsing the ELIB and DLIB spectral libraries, src/ann_solo/sqlite_parsers.py. These are SQLite3 formats from EncyclopeDIA and are defined here. The PR also changes the logging level to INFO.
This module should be easy to expand in the future to also parse BLIB libraries from Bibliospec (as requested in #2).
I'm still working on benchmarking, but it seems good so far.
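For readers unfamiliar with these formats, the following is a minimal sketch of reading spectra out of such a SQLite library. It is not the code in src/ann_solo/sqlite_parsers.py. It assumes an entries table with PeptideModSeq, PrecursorCharge, PrecursorMz, MassArray, and IntensityArray columns, with m/z values stored as zlib-compressed big-endian doubles and intensities as zlib-compressed big-endian floats; these details are my understanding of the format and should be checked against the format definition. The function names are hypothetical.

```python
import sqlite3
import struct
import zlib


def decode_array(blob, type_char, item_size):
    """Decompress a blob and unpack it as big-endian numbers of one type."""
    raw = zlib.decompress(blob)
    count = len(raw) // item_size
    return list(struct.unpack(f">{count}{type_char}", raw))


def read_library_entries(path):
    """Yield one dict per library spectrum.

    Assumption: table and column names follow the ELIB/DLIB layout described
    in the lead-in; adjust the query if the actual schema differs.
    """
    conn = sqlite3.connect(path)
    try:
        cursor = conn.execute(
            "SELECT PeptideModSeq, PrecursorCharge, PrecursorMz, "
            "MassArray, IntensityArray FROM entries"
        )
        for mod_seq, charge, precursor_mz, mz_blob, intensity_blob in cursor:
            yield {
                "peptide": mod_seq,
                "charge": charge,
                "precursor_mz": precursor_mz,
                # Assumed encoding: 8-byte doubles for m/z values.
                "mz": decode_array(mz_blob, "d", 8),
                # Assumed encoding: 4-byte floats for intensities.
                "intensity": decode_array(intensity_blob, "f", 4),
            }
    finally:
        conn.close()
```

Because the function is a generator, entries are decoded lazily, which keeps memory use low for large predicted libraries.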