Open memoryleak47 opened 2 years ago
I believe there are a handful of structures that only contain alpha-carbon information. If you inspect the RCSB entry, you'll find this is the case for this structure. You can also see the pattern of (N, Calpha, C) in the tertiary data, where N and C are missing.
Hopefully Mohammed can correct me if I am mistaken, but I hope my comment can help for now.
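For anyone who wants to check a record for this pattern programmatically, here is a minimal sketch. It assumes the tertiary block has already been parsed into a 3 x 3N array (rows x, y, z; columns ordered N, Calpha, C for each residue) and that the mask is a string of "+" and "-". The function name `is_calpha_only` is hypothetical and not part of ProteinNet.

```python
import numpy as np

def is_calpha_only(tertiary, mask):
    """Return True if every '+' residue has a placed Calpha but N and C at (0, 0, 0).

    Assumed layout (not confirmed by the thread): tertiary is a (3, 3N) array,
    rows are x/y/z, columns are ordered N, CA, C per residue.
    """
    # Regroup into (residue, atom, xyz).
    coords = np.asarray(tertiary, dtype=float).T.reshape(-1, 3, 3)
    resolved = np.array([m == "+" for m in mask], dtype=bool)
    # An atom counts as "at origin" only if all three coordinates are zero.
    at_origin = np.all(coords == 0.0, axis=-1)
    n_missing, ca_missing, c_missing = at_origin[resolved].T
    return bool(
        resolved.any()
        and n_missing.all()
        and c_missing.all()
        and not ca_missing.any()
    )
```

A record for which this returns True would match the "(N, Calpha, C) with N and C missing" pattern described above.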
I see! So sometimes individual atoms can be missing in spite of a "+" mask.
But can we assume that each (0, 0, 0) atom is in fact just missing data? Or is there some other procedure to know which atoms are valid?
Correct. I believe the mask is on the residue level and not the atom level.
Yes, I would think it is reasonable to assume that, and it is most likely described somewhere in the documentation here.
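If you go with that assumption, one way to get an atom-level mask is to combine the residue-level mask with a check for (0, 0, 0) coordinates. This is only a sketch under the assumptions discussed in this thread, not something the ProteinNet documentation specifies. It assumes a parsed 3 x 3N tertiary array (rows x, y, z; columns ordered N, Calpha, C per residue), and `atom_level_mask` is a hypothetical helper name.

```python
import numpy as np

def atom_level_mask(tertiary, mask):
    """Return an (N, 3) boolean array: True where a backbone atom looks valid.

    Assumptions (labeled, not confirmed): the residue mask uses '+' for
    resolved residues, and an atom at exactly (0, 0, 0) is missing data.
    """
    # Regroup the (3, 3N) array into (residue, atom, xyz).
    coords = np.asarray(tertiary, dtype=float).T.reshape(-1, 3, 3)
    residue_ok = np.array([m == "+" for m in mask], dtype=bool)
    # An atom is treated as present if any of its coordinates is nonzero.
    atom_present = np.any(coords != 0.0, axis=-1)
    # Valid only if the residue is unmasked and the atom is not at the origin.
    return residue_ok[:, None] & atom_present
```

The resulting array can be used to exclude missing atoms from a loss or distance computation. The caveat is that a real atom sitting exactly at the origin would be dropped as well, which seems unlikely in practice but is not ruled out.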
Ah, true!
If I'm not overlooking something, it doesn't seem to be mentioned in the documentation at https://github.com/aqlaboratory/proteinnet/blob/master/docs/proteinnet_records.md, nor anywhere else in this GitHub repository.
Is there some external resource where I could read up on this?
I'm afraid I don't have more information. I'm not affiliated with ProteinNet, though I use the provided data and dataset splits in my own research.
When implementing an RGN for a university project, we stumbled upon a few apparent irregularities in the text-based CASP7 dataset provided here. Specifically, quite a few atoms in the tertiary data were positioned at (0, 0, 0) even though the mask was "+", i.e. the atom was considered to be 'valid'.
Example taken from CASP7/validation.
In this example two thirds of the atoms are positioned at (0, 0, 0). Is this a bug, or am I simply misinterpreting the given data somehow?
Thanks in advance!