MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.32k stars 651 forks source link

Cannot load LAMMPS NetCDF trajectory files due to "header issue" in scipy.io.netcdf_file #4444

Open bbye98 opened 9 months ago

bbye98 commented 9 months ago

Expected behavior

I have run a LAMMPS simulation of a simple electrolyte–solvent systems confined between two planar dielectric electrodes. During the simulation, the topology is written to LAMMPS dump file cg_lj_coul_reduced_lammps.topology and the trajectory is written to a NetCDF file cg_lj_coul_reduced_lammps.nc.

I am attempting to load both files to a MDAnalysis.Universe object using

Python 3 code import os import MDAnalysis as mda os.chdir("/mnt/e/research/test/poisson_potential/data") fname = "cg_lj_coul_reduced_lammps" universe = mda.Universe(f"{fname}.topology", f"{fname}.nc", topology_format="DATA", atom_style="id type q x y z vx vy vz")

Actual behavior

Instead, I get the following ValueError stemming from a header check in scipy.io.netcdf_file:

Python 3 stack trace --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[35], line 7 5 os.chdir("/mnt/e/research/test/poisson_potential/data") 6 fname = "cg_lj_coul_reduced_lammps" ----> 7 universe = mda.Universe(f"{fname}.topology", f"{fname}.nc", topology_format="DATA", 8 atom_style="id type q x y z vx vy vz") File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py:375, in Universe.__init__(self, topology, all_coordinates, format, topology_format, transformations, guess_bonds, vdwradii, fudge_factor, lower_bound, in_memory, in_memory_step, *coordinates, **kwargs) 370 coordinates = _resolve_coordinates(self.filename, *coordinates, 371 format=format, 372 all_coordinates=all_coordinates) 374 if coordinates: --> 375 self.load_new(coordinates, format=format, in_memory=in_memory, 376 in_memory_step=in_memory_step, **kwargs) 378 if transformations: 379 if callable(transformations): File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py:580, in Universe.load_new(self, filename, format, in_memory, in_memory_step, **kwargs) 577 # supply number of atoms for readers that cannot do it for themselves 578 kwargs['n_atoms'] = self.atoms.n_atoms --> 580 self.trajectory = reader(filename, format=format, **kwargs) 581 if self.trajectory.n_atoms != len(self.atoms): 582 raise ValueError("The topology and {form} trajectory files don't" 583 " have the same number of atoms!\n" 584 "Topology number of atoms {top_n_atoms}\n" (...) 588 fname=filename, 589 trj_n_atoms=self.trajectory.n_atoms)) File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/lib/util.py:2553, in store_init_arguments..wrapper(self, *args, **kwargs) 2551 else: 2552 self._kwargs[key] = arg -> 2553 return func(self, *args, **kwargs) File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py:446, in NCDFReader.__init__(self, filename, n_atoms, mmap, **kwargs) 443 super(NCDFReader, self).__init__(filename, **kwargs) 445 # ensure maskandscale is off so we don't end up double scaling --> 446 self.trjfile = NCDFPicklable(self.filename, 447 mmap=self._mmap, 448 maskandscale=False) 450 # AMBER NetCDF files should always have a convention 451 try: File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:279, in netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale) 276 self._attributes = {} 278 if mode in 'ra': --> 279 self._read() File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:610, in netcdf_file._read(self) 608 # Read file headers and set data. 609 self._read_numrecs() --> 610 self._read_dim_array() 611 self._read_gatt_array() 612 self._read_var_array() File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:620, in netcdf_file._read_dim_array(self) 618 header = self.fp.read(4) 619 if header not in [ZERO, NC_DIMENSION]: --> 620 raise ValueError("Unexpected header.") 621 count = self._unpack_int() 623 for dim in range(count): ValueError: Unexpected header.

I get the same error when I try loading the trajectory file directly using scipy.io.netcdf_file, but not using netCDF4.Dataset:

Python 3 code from netCDF4 import Dataset from scipy.io import netcdf_file print(Dataset(f"{fname}.nc", "r")) print(netcdf_file(f"{fname}.nc", "r"))
Python 3 output + stack trace root group (NETCDF3_64BIT_DATA data model, file format NETCDF3): Conventions: AMBER ConventionVersion: 1.0 program: LAMMPS programVersion: 3 Nov 2022 dimensions(sizes): frame(1101), atom(1576), cell_spatial(3), cell_angular(3), label(10), spatial(3) variables(dimensions): |S1 spatial(spatial), |S1 cell_spatial(spatial), |S1 cell_angular(spatial, label), float32 time(frame), float32 cell_origin(frame, cell_spatial), float32 cell_lengths(frame, cell_spatial), float32 cell_angles(frame, cell_angular), int32 id(frame, atom), int32 type(frame, atom), float32 q(frame, atom), float32 coordinates(frame, atom, spatial), int32 ix(frame, atom), int32 iy(frame, atom), int32 iz(frame, atom), float32 velocities(frame, atom, spatial), float32 forces(frame, atom, spatial) groups: --------------------------------------------------------------------------- --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[39], line 5 2 from scipy.io import netcdf_file 4 print(Dataset(f"{fname}.nc", "r")) ----> 5 print(netcdf_file(f"{fname}.nc", "r")) File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:279, in netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale) 276 self._attributes = {} 278 if mode in 'ra': --> 279 self._read() File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:610, in netcdf_file._read(self) 608 # Read file headers and set data. 609 self._read_numrecs() --> 610 self._read_dim_array() 611 self._read_gatt_array() 612 self._read_var_array() File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:620, in netcdf_file._read_dim_array(self) 618 header = self.fp.read(4) 619 if header not in [ZERO, NC_DIMENSION]: --> 620 raise ValueError("Unexpected header.") 621 count = self._unpack_int() 623 for dim in range(count): ValueError: Unexpected header.

Code to reproduce the behavior

This is the LAMMPS script used to generate the topology and trajectory files:

LAMMPS script # ==================================================================== # # PARAMETERS AND SETTINGS # # ==================================================================== # shell cd /mnt/e/research/test/poisson_potential/data log cg_lj_coul_reduced_lammps.log variable N equal 1000 variable T_reduced equal 1 variable mass_reduced equal 1 variable epsilon_reduced equal 1 variable sigma_reduced equal 1 variable x equal 0.5 variable N_ion equal v_x*v_N/2 variable q_reduced equal 13.625994230719545 variable varepsilon_r equal 78 variable L_x_reduced equal 8 variable L_y_reduced equal 7.794228634059947 variable L_z_reduced equal 20.046884346862004 variable sigma_ion_wall_reduced equal v_sigma_reduced/2 variable q_wall_reduced equal -0.19030322569642577 variable unit_cell_x_reduced equal v_sigma_ion_wall_reduced/2 variable unit_cell_y_reduced equal v_sigma_ion_wall_reduced*sqrt(3)/2 variable unit_cell_z_reduced equal v_sigma_ion_wall_reduced*sqrt(6)/3 variable every equal 1000 variable timesteps equal 1100000 variable rng1 equal 12798 variable rng2 equal 9990 variable rng3 equal 22263 variable rng4 equal 101763 # ==================================================================== # # SYSTEM AND PARTICLES # # ==================================================================== # boundary p p f atom_style charge region box block 0 ${L_x_reduced} 0 ${L_y_reduced} 0 ${L_z_reduced} units box create_box 5 box mass * ${mass_reduced} region electrolyte block 0 ${L_x_reduced} 0 ${L_y_reduced} ${sigma_ion_wall_reduced} $(v_L_z_reduced-v_sigma_ion_wall_reduced) units box create_atoms 1 random ${N_ion} ${rng1} electrolyte create_atoms 2 random ${N_ion} ${rng2} electrolyte create_atoms 3 random $(v_N-2*v_N_ion) ${rng3} electrolyte set type 1 charge ${q_reduced} set type 2 charge -${q_reduced} set type 3 charge 0 group electrolyte type 1 2 3 lattice hcp $((v_unit_cell_z_reduced/0.916486)^-3) region left_wall block 0 ${L_x_reduced} 0 ${L_y_reduced} 0 ${unit_cell_z_reduced} units box create_atoms 4 region left_wall lattice hcp $((v_unit_cell_z_reduced/0.916486)^-3) origin 0 0 0.552307414567 region right_wall block 0 ${L_x_reduced} 0 ${L_y_reduced} $(v_L_z_reduced-v_unit_cell_z_reduced) ${L_z_reduced} units box create_atoms 5 region right_wall set type 4 charge 0 set type 5 charge 0 group electrodes type 4 5 # ==================================================================== # # FORCE FIELD # # ==================================================================== # dielectric ${varepsilon_r} kspace_style pppm 1e-4 kspace_modify slab 3 pair_style lj/cut/coul/long $(2^(1/6)*v_sigma_reduced) pair_coeff *3 *3 ${epsilon_reduced} ${sigma_reduced} pair_coeff 4* 4* 0 0 pair_coeff *3 4* $(100*v_epsilon_reduced) ${sigma_ion_wall_reduced} $(2^(1/6)*v_sigma_ion_wall_reduced) pair_modify shift yes velocity electrolyte create ${T_reduced} ${rng4} velocity electrodes set 0 0 0 fix nvt electrolyte nvt temp ${T_reduced} ${T_reduced} $(100*dt) fix electrodes electrodes setforce 0 0 0 # ==================================================================== # # ENERGY MINIMIZATION # # ==================================================================== # minimize 1e-6 1e-6 1000 1000 reset_timestep 0 set type 4 charge ${q_wall_reduced} set type 5 charge $(-v_q_wall_reduced) # ==================================================================== # # SIMULATION # # ==================================================================== # write_data cg_lj_coul_reduced_lammps.topology dump dump all netcdf ${every} cg_lj_coul_reduced_lammps.nc id type q x y z ix iy iz vx vy vz fx fy fz thermo ${every} thermo_style custom step spcpu cpuremain temp press ke evdwl ecoul elong run ${timesteps}

I do not believe this error is related to the NetCDF format, as the NetCDFFile/NetCDFReporter I wrote for OpenMM also uses the same format (NETCDF3_64BIT_DATA), but is recognized correctly by both netCDF4.Dataset and scipy.io.netcdf_file, and thus MDAnalysis.

Current version of MDAnalysis

IAlibay commented 8 months ago

Thanks for raising this issue @bbye98, unfortunately we've not had time to look into this yet.

Is there any way you could share a small version of a non-working netcdf file? I personally don't readily have access to (or the knowledge to use) LAMMPS and wouldn't be able to reproduce the file easily.

bbye98 commented 8 months ago

@IAlibay Apologies for the late response! I've attached a sample topology and trajectory file below (from a slab LJ-Coulomb system):

example.zip

Some information on package versions:

❯ lmp
LAMMPS (21 Nov 2023)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task

❯ python -c "import MDAnalysis as mda; print(mda.__version__)"
2.7.0

❯ python -c "import scipy; print(scipy.__version__)"
1.12.0

Reading the LAMMPS trajectory file using MDAnalysis.Universe:

>>> u = mda.Universe("example.nc")
Traceback (most recent call last):
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 64, in parse
    n_atoms = kwargs['n_atoms']
KeyError: 'n_atoms'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 110, in _topology_from_file_like
    topology = p.parse(**kwargs)
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 67, in parse
    n_atoms = reader.parse_n_atoms(self.filename, **kwargs)
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py", line 621, in parse_n_atoms
    with scipy.io.netcdf_file(filename, mmap=None) as f:
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
    self._read()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
    self._read_dim_array()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
    raise ValueError("Unexpected header.")
ValueError: Unexpected header.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 356, in __init__
    topology = _topology_from_file_like(self.filename,
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 125, in _topology_from_file_like
    raise ValueError(
ValueError: Failed to construct topology from file example.nc with parser <class 'MDAnalysis.topology.MinimalParser.MinimalParser'>.
Error: Unexpected header.

Reading the LAMMPS trajectory file using scipy.io.netcdf_file:

>>> u = mda.Universe("example.nc")
Traceback (most recent call last):
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 64, in parse
    n_atoms = kwargs['n_atoms']
KeyError: 'n_atoms'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 110, in _topology_from_file_like
    topology = p.parse(**kwargs)
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 67, in parse
    n_atoms = reader.parse_n_atoms(self.filename, **kwargs)
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py", line 621, in parse_n_atoms
    with scipy.io.netcdf_file(filename, mmap=None) as f:
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
    self._read()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
    self._read_dim_array()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
    raise ValueError("Unexpected header.")
ValueError: Unexpected header.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 356, in __init__
    topology = _topology_from_file_like(self.filename,
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 125, in _topology_from_file_like
    raise ValueError(
ValueError: Failed to construct topology from file example.nc with parser <class 'MDAnalysis.topology.MinimalParser.MinimalParser'>.
Error: Unexpected header.
>>> from scipy.io import netcdf_file
>>> nc = netcdf_file("example.nc", "r")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
    self._read()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
    self._read_dim_array()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
    raise ValueError("Unexpected header.")
ValueError: Unexpected header.

Reading the LAMMPS trajectory file using netCDF4.Dataset:

>>> import netCDF4 as nc
>>> nc_ = nc.Dataset("example.nc")
>>> nc_
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT_DATA data model, file format NETCDF3):
    Conventions: AMBER
    ConventionVersion: 1.0
    program: LAMMPS
    programVersion: 21 Nov 2023
    dimensions(sizes): frame(11), atom(1576), cell_spatial(3), cell_angular(3), label(10), spatial(3)
    variables(dimensions): |S1 spatial(spatial), |S1 cell_spatial(spatial), |S1 cell_angular(spatial, label), float32 time(frame), float32 cell_origin(frame, cell_spatial), float32 cell_lengths(frame, cell_spatial), float32 cell_angles(frame, cell_angular), int32 id(frame, atom), int32 type(frame, atom), float32 q(frame, atom), float32 coordinates(frame, atom, spatial), int32 ix(frame, atom), int32 iy(frame, atom), int32 iz(frame, atom), float32 velocities(frame, atom, spatial), float32 forces(frame, atom, spatial)
    groups:

For what it's worth, I recently dropped support for scipy.io.netcdf_file altogether in favor of netCDF4.Dataset in my workflow because the former had bizarre behavior when used as an OpenMM trajectory writer, such as randomly writing the first frame of the simulation three times--once at the beginning and twice at the end of the trajectory file--despite correctly flushing/syncing and closing the NetCDF file at the end of the simulation. The SciPy implementation also doesn't report the length of an "unlimited" dimension (like time), unlike netCDF4.

Luthaf commented 7 months ago

This file is a bit strange: it starts with CDF5 instead of the usual and expected CDF2. Reading around, CDF2 seems to indicate 64-bit offsets (which are required by the format: https://ambermd.org/netcdf/nctraj.xhtml), and CDF5 seems to indicate 64-bit data and offsets (which would also moves the header around, explaining the error from scipy).

The code for this seems to have been around for a while in the netcdf-c library, not sure what caused this change in the output of LAMMPS. Maybe something about the specific version of netcdf LAMMPS is using?

orbeckst commented 7 months ago

@bbye98 did you find out if anything changed in the LAMMPS netcdf output?

bbye98 commented 7 months ago

Hi! Apologies for the late response. I'm not sure which version of NetCDF LAMMPS is using, but I can try installing an older version of LAMMPS (say, from 2021) to test.