Noble-Lab / casanovo

De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
https://casanovo.readthedocs.io
Apache License 2.0
102 stars 37 forks source link

Migrating PeptideMass, PeptideDecoder, and PeptideEncoder from depthcharge v0.2.3 to casanovo #337

Open CCranney opened 4 months ago

CCranney commented 4 months ago

Hi all,

I noticed that casanovo cannot utilize versions of depthcharge above v0.2.3, largely because several key classes were removed between v0.2.3 and v0.3 of depthcharge (specifically the PeptideMass, PeptideDecoder, and PeptideEncoder classes). I assume there has been some exchange on the separation between casanovo and depthcharge, but have not been able to find it in the github discussions or issues pages. If that discussion exists and this is a redundant request, please let me know.

Correct me if I'm wrong, but possible future developments - such as #321 - would require casanovo to utilize higher versions of depthcharge, so this discrepancy would need to be addressed eventually. I see one possible solution being the migration of those lost classes from depthcharge v0.2.3 to casanovo so they can continue to be maintained independently of depthcharge. I'm making this issue to ask if that would be a desirable outcome, as well as to track the necessary updates leading up to a pull request if it is.

I tinkered with migrating the requisite files while maintaining their respective commit histories, and managed to do so here (specifically files peptide_transformers.py and masses.py). The location and file names were chosen arbitrarily as I experimented with the migration process, but I'd imagine you would want them in specific locations.

If this is a feature you would like to implement, I can make the requisite changes (including migrating the unit tests) and create a pull request. Before doing that I would primarily need to know how you would like them saved, as changing the location of the files in a pull request while maintaining commit history is difficult. Perhaps something like the following?

.
├── data
└── denovo
└── peptides
    └── masses.py
    └── transformers.py

Let me know, Caleb

bittremieux commented 4 months ago

You're right that Casanovo is currently limited to DepthCharge v0.2.x. This is currently by design.

DepthCharge has undergone several breaking changes in v0.3 and v0.4, which will require significant refactoring of Casanovo. However, these changes are all based on making DepthCharge more powerful and more flexible. So there's not really functionality lost in DepthCharge, if anything, there's more functionality now (i.e. support for small molecules).

One of our major tasks in the next few weeks/months will be to upgrade the DepthCharge version used in Casanovo and then build on this new functionality to improve Casanovo. This will take significant effort under the hood, but should be transparent for the user at first, then resulting in improved performance later.

CCranney commented 4 months ago

Understood - I did a reread of depthcharge code after your comments, thank you for pointing out my misunderstanding. I see now that the PeptideDecoder and PeptideEncoder classes were not deleted, but moved to transformers -> analytes.py and renamed to AnalyteTransformerDecoder and AnalyteTransformerEncoder (to enable flexibility beyond peptides, as you said).

Also thank you for pointing out that issue milestone page in your "improve Casanovo" hyperlink. I was sure there was some discussion on the depthcharge/casanovo version discrepancy, but couldn't find it at the time. Thanks for this!

I read through some of the issues on the Depthcharge v0.4 Milestone, and none of them directly address this. Is that an easy, implied fix, and you'd prefer this issue was closed? My question has been answered, so am fine with closing it. But it could serve as a small reminder for specifically updating the peptide encoding/decoding process.

bittremieux commented 4 months ago

It's far from an easy fix, there will need to be significant refactoring. But because it deals with the internals of Casanovo that users are typically not exposed to, we hadn't created an explicit issue yet. Instead, we were tracking progress on the DepthCharge updates in our regular team meetings, although this is indeed a bit opaque for external developers.