Refactor Volume class to support multiple data types

GEMDAT-repos / GEMDAT

Python toolkit for molecular dynamics analysis

https://gemdat.readthedocs.io

Apache License 2.0

21 stars 3 forks source link

Refactor Volume class to support multiple data types #282

Closed stefsmeets closed 3 months ago

stefsmeets commented 4 months ago

This PR updates the Volume class to store multiple data volumes, mimicking pymatgen's VolumetricData. This makes some things a bit easier, like grouping similar data types and being able to write all data volumes to a vesta file in one go. The downside is that it needs a bit more management of the data keys (e.g. for plotting). Maybe that is something that can be simplified later.

I considered using VolumetricData as a base class, but it requires a structure, which we don't necessarily have.

Closes #281

TODO

[x] Remove .resolution / calculate it on the fly
[x] voxel_size is redundant with resolution
[x] Implement function so save to vesta data

stefsmeets commented 4 months ago

Hi @SCiarella , could you have a look at this PR and let me know what you think?

stefsmeets commented 4 months ago

Actually, I'm not sure if I like this implementation.

I think I'd maybe rather go for something simpler.

@dataclass
class Volume:
    data: dict[str, np.ndarray]
    lattice: Lattice
    label: str
    units: Optional[...]

A = Volume(data=..., lattice=...)
B = Volume(data=..., lattice=...)

A.to_vasp_volume(structure=structure, other={'vol_b':B})

An additional requirement is to then move find_peaks() out of find_best_perc_path() to avoid 2 different data types.

stefsmeets commented 3 months ago

Hi @SCiarella Could you have another look? I ended up revising the Volume class to solve a bunch of outstanding issues. Some of it affects the paths module, but I think the result is cleaner overall.

stefsmeets commented 3 months ago

Hi @SCiarella , did you have a chance to look at this yet?

stefsmeets commented 3 months ago

Thanks! Yeah, that would be useful. The issue I ran into is that this would require passing both the volume and the free energy, and I wanted to not have to provide both here. I think passing peaks makes it more clear what is happening and allows users to pass their own set of peaks.

We could default to finding the peaks (when peaks==None) from the free energy. That requires a modification to the peak finding algorithm to find local minima.