
amorehead / alphafold3-pytorch-lightning-hydra

Implementation of AlphaFold 3 in PyTorch Lightning + Hydra

MIT License

21 stars, 6 forks

Fix the author chain ID / author residue ID issue that causes `parse()`d `Biomolecule` objects to have missing residues (e.g., residues `1-10` within chain `B` of `100d.cif`) filled in with dummy residues when an mmCIF file is exported from a `Biomolecule` object. This happens because authors of mmCIF files sometimes number residues so that the indices increase monotonically from the first chain to the last (e.g., residue indices 1-10 in chain A and 11-20 in chain B). When this happens, the AlphaFold 2-borrowed logic currently in place treats residues 1-10 in chain B as "missing" and adds padding residues for them. This breaks later re-parsing of these (filtered) mmCIF files, since the residue sequences (e.g., in chain B) will be incorrect from then on. #2

Closed by amorehead 2 months ago
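
To make the failure mode concrete, here is a minimal toy sketch of how padding from residue 1 can invent dummy residues when author numbering continues across chains. It is not the repository's code: the function names `pad_from_one` and `pad_from_chain_start` and the `observed` dictionary are hypothetical, and the real export logic may differ. It only assumes that chain A holds residues 1-10 and chain B holds residues 11-20, as in the example above.

```python
from typing import Dict, List

# Hypothetical toy input: author residue numbers observed per chain.
# Numbering continues from chain A into chain B, as in the issue's example.
observed: Dict[str, List[int]] = {
    "A": list(range(1, 11)),   # residues 1-10
    "B": list(range(11, 21)),  # residues 11-20
}


def pad_from_one(residue_numbers: List[int]) -> List[int]:
    """Return the residue numbers that would be padded if every chain
    is assumed to start at residue 1 (the problematic assumption)."""
    seen = set(residue_numbers)
    return [n for n in range(1, max(residue_numbers) + 1) if n not in seen]


def pad_from_chain_start(residue_numbers: List[int]) -> List[int]:
    """Return the residue numbers that would be padded if only gaps
    between the chain's first and last observed residues are filled."""
    seen = set(residue_numbers)
    first, last = min(residue_numbers), max(residue_numbers)
    return [n for n in range(first, last + 1) if n not in seen]


for chain_id, numbers in observed.items():
    print(chain_id, pad_from_one(numbers), pad_from_chain_start(numbers))
```

Under these assumptions, chain B gets ten spurious padding residues (1-10) with the first function and none with the second, while genuine internal gaps (e.g., a missing residue 13 between 12 and 14) are still detected by both.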