Closed by dudimarcus 7 years ago
Working on this now...
Great! thanks Ian
Okay. Progress.
Should be finished by the time I get home - I'll commit the new lookup file and all should work correctly.
That's great, thanks Ian! Looking forward to testing it tomorrow; the previous version already showed great results.
Okay, this should be fixed now.
As a summary:
Can you confirm this works and close the ticket if you're happy?
Yay, it's fixed, all misalignment examples are corrected now.
Thanks Ian!
Great.
For the record - there were 4 domains (of 383,628) where the domain boundaries of our internal files do not match the results of the original algorithm that was used to generate them.
I'm not yet sure how this happened, but just to let you know: if you happen to hit these domain ids (they are all identical sequences), the script will throw an exception.
$ grep 'Error' data/domain-sequence-numbering.v4_1_0.txt
# Error: 4ikpA01 has mismatching domain boundaries
# Error: 4ikpB01 has mismatching domain boundaries
# Error: 4ikpC01 has mismatching domain boundaries
# Error: 4ikpD01 has mismatching domain boundaries
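If you'd rather skip these ids up front than catch the exception, something like the following rough sketch could pull them out of the lookup file. It assumes only the `# Error: <domain id> has mismatching domain boundaries` line shape shown in the grep output above; the function name `find_error_domains` is made up for illustration and is not part of the script.

```python
import re

# Line shape assumed from the grep output above; nothing else about the file format is assumed.
ERROR_LINE = re.compile(r"^# Error: (\S+) has mismatching domain boundaries")

def find_error_domains(lines):
    """Return the domain ids flagged with a mismatching-boundaries error (hypothetical helper)."""
    ids = []
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            ids.append(match.group(1))
    return ids

# Example using lines copied from the grep output.
sample = [
    "# Error: 4ikpA01 has mismatching domain boundaries",
    "# Error: 4ikpB01 has mismatching domain boundaries",
]
print(find_error_domains(sample))
```

To run it over the real file, pass an open file handle instead of the sample list, e.g. `find_error_domains(open("data/domain-sequence-numbering.v4_1_0.txt"))`.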
Hi, I noticed that when the sequence starts at a position greater than 1, there is always a misalignment with the PDB SEQRES, which suggests the start-stop fix may not be calibrated correctly.
for example:
good: correctly starts at 1
bad: should actually start at 24, not 8
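For anyone wanting to reproduce this kind of check, here is a minimal sketch of how the expected start can be sanity-checked against SEQRES. It assumes the domain is a single contiguous segment that occurs only once in the chain; the function name `seqres_start` and the toy sequences are made up for illustration and are not the sequences from the examples above.

```python
def seqres_start(seqres, domain_seq):
    """Return the 1-based position where domain_seq first occurs in seqres, or None if absent."""
    index = seqres.find(domain_seq)  # str.find returns -1 when there is no match
    return None if index < 0 else index + 1

# Toy data only: a made-up chain and a made-up domain segment taken from inside it.
toy_seqres = "MKTAYIAKQRQISFVKSHFSRQ"
toy_domain = "QRQISFVK"
print(seqres_start(toy_seqres, toy_domain))
```

If the start reported in the lookup file differs from this value, the numbering is off for that domain. Discontinuous domains or repeated segments would need a proper alignment instead of a substring search.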