chemfiles / chemfiles

Library for reading and writing chemistry files
http://chemfiles.org
BSD 3-Clause "New" or "Revised" License
159 stars 47 forks source link

Handling of unit cell parameters when they are missing from the file #419

Open shoubhikraj opened 2 years ago

shoubhikraj commented 2 years ago

I am reading a DCD file (generated with NAMD) with chemfiles. In NAMD, the simulation was run without any periodic boundary conditions, however, it seems that NAMD sets the unit cell lengths to 1 Angstrom and angles to 90 degrees each when there is no boundary.

The chemfiles documentation mentions that the unit cell information is not read from the DCD file format. So, I am not sure if chemfiles is setting those values by default, or whether it is actually reading them in from the DCD file. (The VMD molfile plugins do read the unit cell information from DCD.)

Either this is a bug in chemfiles, or the documentation has not been updated.

Also, would it be possible/reasonable to recognize that the 1 Angstrom value of unit cell was set because there is no unit cell (because 1 Angstrom is unlikely to be the size of an unit cell in any practical simulation), and then set it to infinity (i.e. no unit cell)?

This would eliminate issues when reading a trajectory without PBC.

The DCD file, and the corresponding PSF and PDB files are attached: testfiles.zip.

Luthaf commented 2 years ago

Since we are currently using the VM molfile reader for DCD files (until #370 is implemented), we should update the documentation! This part of the doc is generated directly from the code here https://github.com/chemfiles/chemfiles/blob/b6b75c3780120628bd8dd29e174d361bd48fc9bd/src/formats/Molfile.cpp#L342

@ShoubhikRaj if you have time to send a pull request it would be very appreciated, otherwise I'll do it when I'll find the time!

Detecting cells with 1A/90° as infinite cells is definitively something we should do, but is a bit more work. I think this would apply to PDB, mmCIF, DCD format at least, maybe there are other formats following this convention?

shoubhikraj commented 2 years ago

@Luthaf Thank you, I have opened a pull request. (#420)

I am not sure about other formats. For example, I think gromacs XTC files hold unit cell data. However, the newest versions of gromacs (>v2020) completely disallow simulations without PBC. Testing with earlier versions will reveal what gromacs outputs when there is no PBC.

I don't think PDB files always have to have a unit cell. If the CRYST1 record is missing then it does not have any unit cell information. So, I am not sure if it will always apply to PDB. I have never used mmCIF files.

shoubhikraj commented 2 years ago

@Luthaf I have found out that the problem with unit cell parameters is coming from the VMD molfile plugin itself, the DCD file does not have unit cell data if the simulation was run without PBC in NAMD.

When the DCD file without PBC is opened with MDAnalysis, it says that there is no unit cell data in the trajectory file. MDAnalysis does not use molfile plugin to read DCD, it uses it's own dcd reader written in cython (https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/lib/formats/libdcd.pyx). So, it seems that VMD molfiles automatically puts those values when it cannot read the unit cell data from DCD file, and since chemfiles depends on molfiles, it gets those data. So, it should be solved by #389.

ezavod commented 2 years ago

Sorry for chiming in this late.

Detecting cells with 1A/90° as infinite cells is definitively something we should do

I feel like this would introduce rather unpredictable behavior. After all, this is a completely valid orthorhombic cell. Software like Lammps can have all sorts of different units which means that a unit cell with lengths 1 might be also physically sensible. AFAIK Lammps is also able to produce DCD files.

However, the newest versions of gromacs (>v2020) completely disallow simulations without PBC. Testing with earlier versions will reveal what gromacs outputs when there is no PBC.

Unfortunately, I was unable to easily test the behavior on older versions because of all sorts of weird crashes. If anybody is interested in trying, here are more details about vacuum simulations. Anyway, I believe that Gromacs simply dumps a zero matrix (judging from this function that is used when reading some input files). This means that XTC and TRR files should handle infinite cell correctly. Setting everything to zero seems to be the norm in general. I guess all the other formats are therefore fine too?

I have found out that the problem with unit cell parameters is coming from the VMD molfile plugin itself

This insight makes fixing this much easier. The offending part is here. Should we simply change the default to 0.0? I can do a PR, if you think that this is an acceptable fix.

Luthaf commented 2 years ago

I feel like this would introduce rather unpredictable behavior. After all, this is a completely valid orthorhombic cell. Software like Lammps can have all sorts of different units which means that a unit cell with lengths 1 might be also physically sensible. AFAIK Lammps is also able to produce DCD files.

I agree that we don't want to do this for all formats!

I'm only suggesting to do it for formats where this kind of 1A cell is used to show missing PBC. For example, some experimental PDB files (from cryo-EM) use this convention when they don't know the cell size: https://github.com/MDAnalysis/mdanalysis/issues/2599. I also mentioned mmCIF here since that's the other format used a lot in wwwPDB. I'll dig a bit deeper on which formats this should apply to, but PDB/mmCIF/MMTF would be the main suspects.


It looks like this is not a format issue for DCD files but rather a molfile problem, so fixing the plugin is a way better solution indeed!

ezavod commented 2 years ago

Thanks for pointing me to the MDA issue. I wasn't aware of this rather weird convention. If it's an accepted convention - as it seems to be - then I'm totally fine with doing this for impacted formats.

Luthaf commented 2 years ago

Let's keep this open to track similar changes for PDB & friends