ViennaRNA / forgi

An RNA manipulation library.
GNU General Public License v3.0

How to handle pdb-files with missing ATOM records? #18

Closed Bernhard10 closed 5 years ago

Bernhard10 commented 7 years ago

Currently we pretend that the missing residues are not there. Is there a better way to do this?

Bernhard10 commented 7 years ago

This might affect the fragment library generation in ernwin if we generate fragments from nr_lists.

Bernhard10 commented 7 years ago

See also: https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/primary-sequences-and-the-pdb-format

Bernhard10 commented 7 years ago

Version 1.0 introduces ftug.get_incomplete_elements and cg.incomplete_elements. The latter is a list of the names of all elements in which at least one residue is missing.

ghost commented 6 years ago

Could the implementation allow for both?

E.g., one switch to report missing coordinates (as warnings) and another to "just proceed anyway without notifying the user"?

Either way, it would be useful to document this more thoroughly, including what is expected of users. I assume the ideal case would be that all .pdb files are error-free and that users only use error-free .pdb files, even if that is unrealistic.
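
To make the suggestion concrete, here is a minimal sketch of such a switch. It is not forgi code: the function name `find_missing` and the `on_missing` parameter are hypothetical, and residues are represented by plain integers only to keep the example short.

```python
import warnings


def find_missing(expected_ids, observed_ids, on_missing="warn"):
    """Return the residue ids that are expected but were not observed.

    Hypothetical helper, not part of forgi. `on_missing` selects the
    behaviour when residues are missing:
      "warn"   - emit a warning and continue
      "ignore" - continue silently
      "raise"  - raise a ValueError
    """
    if on_missing not in ("warn", "ignore", "raise"):
        raise ValueError("unknown on_missing mode: {!r}".format(on_missing))

    observed = set(observed_ids)
    # Keep the order of expected_ids so the report is easy to read.
    missing = [res_id for res_id in expected_ids if res_id not in observed]

    if missing:
        message = "{} residue(s) have no coordinates: {}".format(
            len(missing), missing
        )
        if on_missing == "warn":
            warnings.warn(message)
        elif on_missing == "raise":
            raise ValueError(message)
    return missing


# Silent mode: the caller still gets the list, but nothing is reported.
gaps = find_missing([1, 2, 3, 4], [1, 4], on_missing="ignore")
```

With this shape the default would notify the user, and anyone who wants to "just proceed anyway" could opt out explicitly.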

Bernhard10 commented 6 years ago

I am currently working on this. I have submitted a pull request to Biopython to add parsing of "REMARK 465" lines from PDB files: https://github.com/biopython/biopython/pull/1237

Forgi will then be able to act as if the missing residues had never existed, but can also output them when desired. Work on this is in the feature/2.x/missing_residues branch.
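
For illustration only, a rough standalone sketch of what reading "REMARK 465" lines can look like. This is neither the Biopython pull request nor forgi's code; the function name `parse_missing_residues` and the returned dictionary keys are made up for this example, and the regular expression is a simplification that assumes the usual "model (optional), residue name, chain, sequence number, insertion code" layout of these records.

```python
import re

# Simplified pattern for data lines of REMARK 465 (an assumption, not the
# full column-based specification): optional model number, residue name,
# chain identifier, sequence number, optional insertion code.
_MISSING_RESIDUE = re.compile(
    r"^REMARK 465\s+(?:(\d+)\s+)?([A-Za-z0-9+]{1,3})\s+(\S)\s+(-?\d+)([A-Za-z]?)\s*$"
)


def parse_missing_residues(pdb_text):
    """Collect the residues listed as missing in REMARK 465 lines."""
    missing = []
    for line in pdb_text.splitlines():
        match = _MISSING_RESIDUE.match(line)
        if match is None:
            # Header and explanation lines of the remark do not match.
            continue
        model, res_name, chain, seq_num, icode = match.groups()
        missing.append(
            {
                "model": int(model) if model is not None else None,
                "res_name": res_name,
                "chain": chain,
                "seq_num": int(seq_num),
                "icode": icode,
            }
        )
    return missing


example = """\
REMARK 465
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465       G A     1
REMARK 465       C A     2
"""

missing_residues = parse_missing_residues(example)
```

Only the two data lines at the end of `example` are picked up; the descriptive lines of the remark are skipped because they do not end in a chain identifier followed by a sequence number.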

Bernhard10 commented 5 years ago

The Sequence class in forgi 2.0 now solves this issue.

You can use rna.seq.with_missing to get the sequence including the missing residues, or just rna.seq for the sequence without them.
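
The idea of keeping two views of one sequence can be sketched with a toy class. This is not forgi's Sequence class: `ToySequence` is hypothetical, residues are keyed by plain integers, and the real `with_missing` may well return something other than a plain string.

```python
class ToySequence:
    """Toy stand-in (not forgi's Sequence) holding observed and missing residues."""

    def __init__(self, observed, missing):
        # Both arguments map residue number -> nucleotide letter.
        self._observed = dict(observed)
        self._missing = dict(missing)

    def __str__(self):
        # Default view: only residues that have coordinates.
        return "".join(self._observed[i] for i in sorted(self._observed))

    @property
    def with_missing(self):
        # Extended view: observed and missing residues, ordered by number.
        merged = {**self._missing, **self._observed}
        return "".join(merged[i] for i in sorted(merged))


seq = ToySequence(
    observed={1: "G", 2: "C", 5: "G", 6: "C"},
    missing={3: "A", 4: "U"},
)
short_view = str(seq)          # "GCGC"
full_view = seq.with_missing   # "GCAUGC"
```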