douweschulte / pdbtbx

A library to open/edit/save (crystallographic) Protein Data Bank (PDB) and mmCIF files in Rust.
https://crates.io/crates/pdbtbx
MIT License
50 stars 13 forks source link

Customize `StrictnessLevel` #127

Open rvhonorato opened 4 months ago

rvhonorato commented 4 months ago

I'm using pdbtbx to validate input PDBs in our new frontend via wasm.

While implementing it I've came acrross the StrictnessLevel which is a nice way to be more or less permissive in the format. For our application in specific, some of the things that are marked as InvalidatingError can actually be handled by the molecular engine (CNS).

Before I write something to automatically fix these so that they pass the pdbtbx validation I think its best to ask first if; is there a way to customize the strictness to allow for certain errors?

douweschulte commented 4 months ago

There currently is not. But with PR #126 more fine grained control is added to the opening of files, if something like this is useful we could think about a way of allowing control here.

The point of invalidation errors though is that I found them to be so bad I could not unambiguously parse the file, this judgement is just my own though so changes can be made. Maybe the easiest way would be to add a strictnesslevel in between two already present and allow some of the warnings there. Or potentially if we want to go the full control via OpenOptions route we could give a list of errors/warnings to ignore there.

OWissett commented 4 months ago

I think that this is possible, we could potentially change how we use the strictness level, and make it work more like a bitfield options approach that you'd use in C/C++. I'm not sure how idiomatic bitfields are in Rust, I think fine for internal usage?

The idea would be that we keep the current StrictnessLevels, but make it so that that type actually sits on top of a bitfield which controls which types of errors are raised. So the current levels will have certain default bits set, but we also then expose an API for constructing custom StrictnessLevels, where in some contexts people want certain checks but not others.

The next thing I wanted to ask, is would it make sense to have it so that PDBTBX can automatically fix simple issues with PDBs? We could have that as a setting on the new ReadOptions mechanism, e.g. ReadOptions::auto_fix()

douweschulte commented 4 months ago

Bitfields do not have built in support but there are crates for it. Keeping it internal sounds good.

I like the sound of auto_fix but some fine grained control might be needed here as well. Right now I mainly have the renumber function which could become part of the auto_fix.

OWissett commented 4 months ago

My lab has a PDB fixing script in python which performs lots of corrections for PDB files, it might be interesting to add some of them to PDBTBX?

Again, I think this comes down to what do we want from PDBTBX? I think that this is also potential feature creep (I know I am the one who suggested the auto_fix function).

OWissett commented 4 months ago

I do think that having PDB validation and fixing tools would make sense for this crate, but it might be a good idea to add them behind a feature flag, depending on how much code it takes.

rvhonorato commented 4 months ago

+1 for auto_fix!

OWissett commented 4 months ago

I will checkout a branch to have a look at implementing it. @rvhonorato are you able to give me a list of which checks you care most about and want changing?

OWissett commented 4 months ago

I am currently blocked by #126 since I will need to use what this adds.

rvhonorato commented 4 months ago

I will checkout a branch to have a look at implementing it. @rvhonorato are you able to give me a list of which checks you care most about and want changing?

The ones I've found more frequently are: