douweschulte / pdbtbx

A library to open/edit/save (crystallographic) Protein Data Bank (PDB) and mmCIF files in Rust.
https://crates.io/crates/pdbtbx
MIT License
53 stars 16 forks source link

Atom wraparound #71

Closed DocKDE closed 3 years ago

DocKDE commented 3 years ago

I just realized that the same issue that arises due to residue serial numbers above 9999 also applies to atom serial numbers above 99999. I added preliminary support for reading this from file in the same spirit as you did for the residues. When it comes to implementing support for saving this again, I'm afraid I need some help because I don't understand how you did that for the residue issue.

douweschulte commented 3 years ago

Nice good work, looks like it should work,. Could you just to make sure that it really works and keep working in the future add a unit test like I added for #66 tests/wrapping_residue_number.rs? The saving should just work as it is now. For #66 I made it so that every single field would be clipped in saving. But making sure it works that way would be a sensible thing to do in a test. I would propose to load a specific line from the file (after reading + saving) and checking if the number is indeed clipped (just simple string comparison the what you know should be the line). If you cannot get a test working I could also add it in. Otherwise good work!

DocKDE commented 3 years ago

I meant to do a test after getting your input on it. I'll write one next week and add it to the PR


29.10.2021 17:30:00 D.Schulte @.***>:

Nice good work, looks like it should work,. Could you just to make sure that it really works and keep working in the future add a unit test like I added for #66[https://github.com/nonnominandus/pdbtbx/issues/66] tests/wrapping_residue_number.rs? The saving should just work as it is now. For #66[https://github.com/nonnominandus/pdbtbx/issues/66] I made it so that every single field would be clipped in saving. But making sure it works that way would be a sensible thing to do in a test. I would propose to load a specific line from the file (after reading + saving) and checking if the number is indeed clipped (just simple string comparison the what you know should be the line). If you cannot get a test working I could also add it in. Otherwise good work!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub[https://github.com/nonnominandus/pdbtbx/pull/71#issuecomment-954835631], or unsubscribe[https://github.com/notifications/unsubscribe-auth/ALWXRSWIZEPSUEYQUAC7RXDUJLDYNANCNFSM5G7K2QNA]. Triage notifications on the go with GitHub Mobile for iOS[https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675] or Android[https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub]. [###24x24:true###][Verfolgungsbild][https://github.com/notifications/beacon/ALWXRSVTXT43RBM6PUTA65DUJLDYNA5CNFSM5G7K2QNKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOHDU2FLY.gif]

douweschulte commented 3 years ago

Great! I will look forward to it.

DocKDE commented 3 years ago

Ok, I wrote a test for atom wraparound and clipping which both pass. However, the read_write_pdbs test does not pass with the new PDB file I provided and I'm not yet sure why that is. Also I changed the save_pdb function to accept a &PDB as input instead of PDB. I stumbled across this a couple of times and didn't see a reason why the struct needed to be passe by value. If you disagree, feel free to change it back.

DocKDE commented 3 years ago

The failing test is due to the fact that the PDB file I added for this issue does not have a chain ID. This leads to the open_mmcif function failing because it expects one. Of course, it would be possible to add a chain ID to the PDB file by hand but it's quite common for PDB files to not have one so I don't think this should be a breaking error. How do you want to handle this?

douweschulte commented 3 years ago

Good work, I agree with the change to &PDB although that means that the update with this PR will be a major update (8.0) as this will break existing code. With the failing test I would suggest to change the open_mmcif function, as you say it is common to not have a chain ID. Would just setting it to A suffice? Or maybe something like an automatic counter increasing with every TER record (counting in alphabet of course AB...)?

DocKDE commented 3 years ago

Implementing a counter that increases after every TER record seems like a good idea to me.

douweschulte commented 3 years ago

Great then that will be the way forward. If you like you could add that to this PR otherwise I will see when I get around to it. I think it is best to merge once the fix is in place so that the code is always working. You can just ignore the code coverage, that one just started to fail since yesterday for no apparent reason.

DocKDE commented 3 years ago

I'll see what I can do :)

DocKDE commented 3 years ago

I'm trying to make sense of the mmcif parser. Can you tell me where or how it would detect a TER record?

douweschulte commented 3 years ago

There are no TER records in mmCIF. For now I think only the PDB parser has to be changed, it will output valid mmCIF so the tests should be able to run.

DocKDE commented 3 years ago

Ok, I changed the PDB parser to add a chain ID from A to Z if none is present. However, after these 24 IDs have been exhausted the next assigned one will be invalid. This is not ideal but I haven't yet figured out how to implement this more elegantly or wrap around to the beginning. Tests all pass which is nice.

DocKDE commented 3 years ago

Ok, done. For some reason I wrote 24 earlier, the alphabet obviously has 26 letters... Anyway, now the assigned chain IDs will wrap around after reaching "Z". I also took the liberty to bump the version number and make a few changes in the changelog. Feel free to change that or add to it!

douweschulte commented 3 years ago

Great work! Looks all good to me. I will merge it and update on crates.io in a couple of days so that objections can be raised in the meantime. Small point: I will change the changelog to specify that the chain ID change is concerning PDB files.