biotite-dev / biotite

A comprehensive library for computational molecular biology
https://www.biotite-python.org
BSD 3-Clause "New" or "Revised" License

Computation of Masses #484

Open t0mdavid-m opened 1 year ago

t0mdavid-m commented 1 year ago

Currently, masses can only be computed from AtomArray objects (and objects of the related classes) or from residue names in the Chemical Component Dictionary, using the biotite.structure.info.mass() function. However, especially for proteomics applications, it is useful to compute the masses of sequences. Therefore, I would like to propose adding a similar function to the sequence package (i.e. biotite.sequence.mass() or biotite.sequence.info.mass()).

Masses could be obtained from the biotite.structure.info.mass() function using the residue name for each monomer, while accounting for the mass difference resulting from the linkage.
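To illustrate the idea, here is a minimal sketch, not an existing Biotite API. The function name sequence_mass, the small lookup table and the rounded mass values are assumptions made for illustration only. It also assumes that each monomer mass refers to the free (unlinked) molecule and that each linkage releases one water molecule, as in a peptide bond.

```python
# Approximate average masses (Da) of free amino acids.
# Illustrative, rounded values; in Biotite these could instead come from
# biotite.structure.info.mass(res_name).
FREE_MONOMER_MASS = {"GLY": 75.067, "ALA": 89.094, "SER": 105.093}

# Approximate average mass (Da) of the water lost per linkage (assumption:
# condensation reaction, e.g. peptide bond formation).
WATER_MASS = 18.015


def sequence_mass(residue_names, monomer_mass=FREE_MONOMER_MASS.__getitem__):
    """Hypothetical helper: mass of a linear chain of linked monomers.

    `monomer_mass` is any callable mapping a residue name to the mass of
    the free monomer.
    """
    residue_names = list(residue_names)
    if not residue_names:
        return 0.0
    total = sum(monomer_mass(name) for name in residue_names)
    # A linear chain of n monomers contains n - 1 linkages.
    return total - (len(residue_names) - 1) * WATER_MASS


dipeptide_mass = sequence_mass(["GLY", "ALA"])
```

Passing the lookup as a callable keeps the linkage correction separate from the source of the monomer masses, so the same function would work with average or monoisotopic values.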

Furthermore, I would like to suggest adding an option for monoisotopic masses. While these cannot be obtained from the Chemical Component Dictionary, they could be looked up in the PubChem database using the SMILES strings (available in the Chemical Component Dictionary) or calculated on the fly.
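The "calculated on the fly" variant could look roughly like the following sketch. The function name monoisotopic_mass and the dictionary-based formula input are hypothetical, and the isotope masses are rounded values for the most abundant isotope of each element. How the elemental composition would be derived (e.g. from the atoms of a Chemical Component Dictionary entry) is left open here.

```python
# Rounded masses (Da) of the most abundant isotope of each element
# (1H, 12C, 14N, 16O, 32S). Only a small subset is listed for illustration.
MONOISOTOPIC_ELEMENT_MASS = {
    "H": 1.007825,
    "C": 12.0,
    "N": 14.003074,
    "O": 15.994915,
    "S": 31.972071,
}


def monoisotopic_mass(composition):
    """Hypothetical helper: monoisotopic mass from an elemental composition.

    `composition` maps element symbols to atom counts,
    e.g. {"C": 2, "H": 5, "N": 1, "O": 2} for glycine.
    """
    return sum(
        MONOISOTOPIC_ELEMENT_MASS[element] * count
        for element, count in composition.items()
    )


# Glycine, C2H5NO2
glycine_mass = monoisotopic_mass({"C": 2, "H": 5, "N": 1, "O": 2})
```

For sequences, the water removed per linkage would then also need its monoisotopic mass rather than the average one.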

padix-key commented 1 year ago

For protein sequences there is ProteinSequence.get_molecular_weight(), so for now this functionality is already implemented. However, I think your proposal of a sequence.info subpackage would be more consistent with the structure.info package if the need for more functionality related to sequence information (masses, isoelectric points, etc.) arises. Therefore, I would suggest revisiting this idea when such features are planned.