chemfiles / chemfiles

Library for reading and writing chemistry files
http://chemfiles.org
BSD 3-Clause "New" or "Revised" License
164 stars 47 forks source link

PDB format: write as models #103

Closed pgbarletta closed 7 years ago

pgbarletta commented 7 years ago

I've been using a modified write function for a while. It came handy to let visualization programs know they were reading a trajectory: void PDBFormat::write_model(const Frame& frame, const size_t model_n);

The only difference being that each time the first and last lines would be: fmt::print(*file_, "MODEL {:4d}\n", model_n); fmt::print(*file_, "ENDMDL\n\n");

The result is a PDB format without an END line, but that doesn't cause any trouble, and if necessary, it's easy to fix. I know it's a scrappy implementation, but I think it's a useful feature. To make it nicer, I was thinking of overloading the write function for the PDB format:

void PDBFormat::writel(const Trajectory& trajectory);

What do you think @Luthaf?

Luthaf commented 7 years ago

It came handy to let visualization programs know they were reading a trajectory:

I initially used the END record, because this is what VMD does for trajectories in PDB format. I don't know about the MODEL record, is it intended for use with trajectories ?

Do you have an example of a visualization program supporting the MODEL record, but not multiple END record? As long as this change does not break VMD (which is widely used in the biomolecular community), I am fine with it!

If we implement this for writing PDB, we should also add support for it when reading PDB files.


The idea would be to replace

CRYST1 ...
ATOM ...
END
CRYST1 ...
ATOM ...
END
CRYST1 ...
ATOM ...
END

by something like this:

MODEL 1
CRYST1 ...
ATOM ...
ENDMDL
MODEL 2
CRYST1 ...
ATOM ...
ENDMDL
MODEL 3
CRYST1 ...
ATOM ...
ENDMDL

Am I right ?

If so, we don't need to pass an additional modeln parameter: we can store it in the PDBFormat class itself (it is called step in some other formats), or use the Frame::step value for this.

The result is a PDB format without an END line, but that doesn't cause any trouble, and if necessary, it's easy to fix.

Yeah, we could add a Format::finalize function and reflect it to Trajectory if needed. Or just call it from the Trajectory destructor.

pgbarletta commented 7 years ago

I initially used the END record, because this is what VMD does for trajectories in PDB format. I don't know about the MODEL record, is it intended for use with trajectories ?

It was meant for RMN experiments that sampled different conformations. That's why they call it MODEL instead of, for example STEP. They see it as alternative structures instead of dynamics.

Do you have an example of a visualization program supporting the MODEL record, but not multiple END record?

Yes, Pymol stops reading as soon as it reads an END line.

As long as this change does not break VMD (which is widely used in the biomolecular community), I am fine with it!

Yeap. I'm reading these files with both VMD and Pymol.

Good then, I'll start looking into it.

Luthaf commented 7 years ago

Is this fixed or do we need to do something else here?

pgbarletta commented 7 years ago

Yeap. Done.

pgbarletta commented 7 years ago

Sorry, found something new. With the last mods, Chemfiles only recognizes trajectories if there's an END record after each frame. That's why the test works, but a properly formatted PDB file (with ENDMDL records aftear each frame and only one END record at the end of the file) with numerous frames is read as a 1 frame file.

As a consequence, Chemfiles can write a whole trajectory in a PDB file and read that same file and only get the first frame.

I can't find the bug in the source code.

As an example: https://gist.github.com/pgbarletta/fb63b8263ca898914af3ba409bcaf90d