amepproject / amep

The Active Matter Evaluation Package (AMEP) - a Python library for the analysis of particle-based and continuum simulation data of soft and active matter systems
https://amepproject.de/
GNU General Public License v3.0

BUG: h5amep format improvements - no zero-padding in h5amep file #41

Open: kay-ro opened this issue 3 months ago


Description:

At the moment, the LAMMPS reader reads dump files and stores the data in the h5amep file. Vector data such as coordinates or velocities are stored as 3d data, and missing components (such as z in 2d simulations) are replaced by zeros. This is not always correct: if a component is missing from the simulation data by accident, it is silently replaced with zeros instead of being reported as missing.
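Whether a component is missing does not have to be guessed: a LAMMPS dump frame lists its per-atom columns in its "ITEM: ATOMS ..." header line. The sketch below uses two hypothetical helpers (`dump_columns` and `missing_components`, not part of AMEP) to report absent components instead of padding them. It only checks the plain names x, y and z; real dump files may use scaled or unwrapped names such as xs or xu instead.

```python
# Hypothetical helpers (not AMEP's API): read the column names of a LAMMPS
# dump frame from its "ITEM: ATOMS ..." header and report missing components.
def dump_columns(header_line):
    """Return the column names listed in a LAMMPS 'ITEM: ATOMS' header line."""
    prefix = "ITEM: ATOMS"
    if not header_line.startswith(prefix):
        raise ValueError(f"not an atoms header: {header_line!r}")
    return header_line[len(prefix):].split()


def missing_components(columns, components=("x", "y", "z")):
    """Return the requested components that are absent from the dump columns."""
    return [c for c in components if c not in columns]


cols = dump_columns("ITEM: ATOMS id type x y vx vy")
print(cols)                      # ['id', 'type', 'x', 'y', 'vx', 'vy']
print(missing_components(cols))  # ['z']
```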

The same applies to the current version of the AMEP HDF5 data format. It combines each vector quantity into a single 3d dataset in the h5amep file; e.g., coordinates are stored as an (N,3) dataset named 'coords'. If, for a 2d system, only x and y are given, the z component is automatically set to zero. In addition, the h5amep file contains datasets for all standard vector quantities, initialized with zeros by default. Thus, even if forces, for example, are not given in the dump files, a dataset called "forces" exists that contains only zeros. It would be better if this dataset did not exist at all. If the user tries to access such data, an error should be raised stating that the requested data is not available.
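A minimal sketch of the requested error, assuming the datasets of a frame can be looked up by name. A plain dict stands in for an h5py group here (both support `name in group` and `group[name]`); `get_quantity` and the force dataset name 'fx' are hypothetical, not AMEP's API.

```python
import numpy as np


def get_quantity(frame_group, name):
    """Return dataset `name` as an array; raise if it was never written."""
    if name not in frame_group:
        raise KeyError(f"The requested data '{name}' is not available.")
    # np.asarray also works on h5py datasets, which implement __array__.
    return np.asarray(frame_group[name])


# Hypothetical 2d frame written without zero-initialized force datasets.
frame = {"x": np.array([0.0, 1.0]), "y": np.array([2.0, 3.0])}
print(get_quantity(frame, "x"))  # [0. 1.]
try:
    get_quantity(frame, "fx")
except KeyError as err:
    print("Error:", err)
```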

In conclusion, we should not initialize the HDF5 file with arrays of zeros. Instead, each quantity (i.e., each column of a dump file) should be stored in a separate dataset, as is already done for scalar quantities and any user-defined quantities. Thus, instead of 'coords', the HDF5 file will contain three datasets called 'x', 'y', and 'z' (if all of them are given in the dump file). If, for example, z is not given, there will be no dataset called 'z'.
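A sketch of the per-column layout, assuming each dump column maps one-to-one to a dataset named after it. `write_frame` is hypothetical; a dict again stands in for an h5py group (in h5py, assigning an array to `group[name]` likewise creates a dataset). Because nothing is pre-allocated, a 2d dump simply produces no 'z' dataset.

```python
import numpy as np


def write_frame(group, column_names, data):
    """Store each dump column in its own dataset, without any zero-padding.

    `column_names` are the names from the dump header, `data` holds one row
    per atom. For simplicity every column is stored as float here.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] != len(column_names):
        raise ValueError("data must have one column per column name")
    for i, name in enumerate(column_names):
        group[name] = data[:, i]
    return group


# A 2d frame: the dump file only provides x and y.
names = ["id", "type", "x", "y"]
rows = [[1, 1, 0.0, 2.0],
        [2, 1, 1.0, 3.0]]
frame = write_frame({}, names, rows)
print(sorted(frame))  # ['id', 'type', 'x', 'y']
print("z" in frame)   # False
```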

If the user requests a vector quantity, e.g., the coordinate array of shape (N,3), and one of its components (for example z) does not exist, AMEP should fill the corresponding column with zeros and print a warning. The same applies to other vector quantities such as velocities and forces.
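The proposed read behaviour could look roughly like this. `vector_quantity` is a hypothetical helper, the velocity dataset names ('vx', 'vy', 'vz') are assumed, and raising a KeyError when none of the components exist is an assumption that combines this request with the error asked for when a quantity such as forces is absent.

```python
import warnings

import numpy as np


def vector_quantity(frame_group, components=("x", "y", "z")):
    """Assemble an (N, len(components)) array from per-component datasets.

    Missing components become zero columns and trigger a warning; if none of
    the components exist, a KeyError is raised instead (assumed behaviour).
    """
    present = [c for c in components if c in frame_group]
    if not present:
        raise KeyError(f"None of the components {components} are available.")
    n = len(frame_group[present[0]])
    result = np.zeros((n, len(components)))
    for i, c in enumerate(components):
        if c in frame_group:
            result[:, i] = np.asarray(frame_group[c])
        else:
            warnings.warn(
                f"Component '{c}' is not available; filling it with zeros.",
                stacklevel=2,
            )
    return result


frame = {"x": np.array([0.0, 1.0]), "y": np.array([2.0, 3.0])}
coords = vector_quantity(frame)  # warns that 'z' is missing
print(coords[:, 2])              # [0. 0.]
# Velocities would work the same way, e.g. components=("vx", "vy", "vz").
```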

Backwards compatibility can be ensured by modifying the __read_data method of the BaseFrame class: it needs an additional if branch, so that there are two branches, one for the current format and one for the new format.
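One possible shape for the two branches, sketched as a standalone function rather than the actual private `BaseFrame.__read_data`, whose signature is not shown in this issue. It assumes the current format can be recognized by the presence of the combined 'coords' dataset; `read_coords` is hypothetical, and a dict stands in for the HDF5 group.

```python
import warnings

import numpy as np

COMPONENTS = ("x", "y", "z")


def read_coords(frame_group):
    """Hypothetical reader that supports both h5amep layouts for coordinates."""
    if "coords" in frame_group:
        # Current format: a single (N, 3) dataset, zero-padded when written.
        return np.asarray(frame_group["coords"])
    # New format: separate 'x', 'y', 'z' datasets, any of which may be absent.
    present = [c for c in COMPONENTS if c in frame_group]
    if not present:
        raise KeyError("The requested data 'coords' is not available.")
    coords = np.zeros((len(frame_group[present[0]]), len(COMPONENTS)))
    for i, c in enumerate(COMPONENTS):
        if c in frame_group:
            coords[:, i] = np.asarray(frame_group[c])
        else:
            warnings.warn(f"'{c}' is not available; filling it with zeros.",
                          stacklevel=2)
    return coords


# Both layouts yield the same (N, 3) array for this 2d frame.
old_file = {"coords": np.array([[0.0, 2.0, 0.0], [1.0, 3.0, 0.0]])}
new_file = {"x": np.array([0.0, 1.0]), "y": np.array([2.0, 3.0])}
print(np.array_equal(read_coords(old_file), read_coords(new_file)))  # True
```

The same dispatch could be repeated for velocities and forces; only the new-format branch emits a warning, since the current format cannot tell padded zeros from real ones.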

Code for reproduction:

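```python
import amep

# "2d_data" is the directory containing the LAMMPS dump files of a 2d simulation
traj = amep.load.traj("2d_data", mode="lammps")
coords = traj[-1].coords()
print(coords[:, 2])  # z column of the last frame
```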

Error message:

No error is raised: the printed z column consists only of zeros. The output should not silently be zeros; at the very least, a warning is expected.

Python and AMEP versions:

Any Python version, AMEP 1.0.1

Additional information:

ToDo:

How did you install AMEP?

None