aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

New AiiDA GaussianCubeData class. #4329

Open yakutovicha opened 4 years ago

yakutovicha commented 4 years ago

The Gaussian cube file is probably the most common format for representing 3D mesh data in computational molecular science. Many codes can read and write cube files, so it makes sense to support the format natively in AiiDA.

API

I propose the following code API:

GaussianCubeData(ArrayData)  # The class should be derived from ArrayData
    def __init__():
        self.comment  # The contents extracted from the comment lines.
        self.origin  # Position of the origin of the volumetric data (bohr)
        self.voxel  # 3 vectors defining a voxel (bohr)
        self.atomic_numbers  # Atomic numbers.
        self.atomic_coordinates  # Atomic coordinates.
        self.data  # 3D array containing the data.
        self.data_units  # Can be defined by the parser.
    def read(file)  # Read the cube file.
    def write(file)  # Write the cube file.
    def get_structure_data()  # Returns an unstored AiiDA StructureData object extracted from the cube file.
    def crop_data(threshold_min=None, threshold_max=None)  # Crop the mesh according to the threshold values, adapt the position of the structure.
    def clip_data(absmin=None, absmax=None)  # Set values lower than absmin to zero, set values higher than absmax to absmax
    def import_from_...()  # Import data from other file formats, for instance XSF
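
To make the mapping between the file layout and the proposed attributes concrete, here is a minimal pure-Python sketch of what a reader could extract. It is not part of AiiDA: `CubeContents` and `parse_cube_text` are hypothetical names, and the sketch assumes a plain density-style cube (two comment lines, a positive atom count, positive grid dimensions, whitespace-separated values with the z index running fastest).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CubeContents:
    """Plain container using the attribute names from the proposal (hypothetical)."""
    comment: str
    origin: Tuple[float, ...]
    voxel: List[Tuple[float, ...]]
    atomic_numbers: List[int]
    atomic_coordinates: List[Tuple[float, ...]]
    shape: Tuple[int, ...]
    data: List[float]  # flattened grid, x outermost and z fastest


def parse_cube_text(text: str) -> CubeContents:
    """Parse the text of a simple cube file (assumptions listed in the lead-in)."""
    lines = text.splitlines()
    comment = "\n".join(lines[:2])  # the first two lines are free-form comments

    # Line 3: number of atoms followed by the origin of the grid.
    fields = lines[2].split()
    natoms = int(fields[0])
    if natoms < 0:
        raise ValueError("cube files with a negative atom count are not handled here")
    origin = tuple(float(x) for x in fields[1:4])

    # Lines 4-6: number of grid points along each axis and the voxel vector.
    shape, voxel = [], []
    for line in lines[3:6]:
        fields = line.split()
        shape.append(int(fields[0]))
        voxel.append(tuple(float(x) for x in fields[1:4]))

    # One line per atom: atomic number, charge, x, y, z.
    atomic_numbers, atomic_coordinates = [], []
    for line in lines[6:6 + natoms]:
        fields = line.split()
        atomic_numbers.append(int(fields[0]))
        atomic_coordinates.append(tuple(float(x) for x in fields[2:5]))

    # Everything that remains is volumetric data.
    data = [float(x) for line in lines[6 + natoms:] for x in line.split()]
    if len(data) != shape[0] * shape[1] * shape[2]:
        raise ValueError("number of values does not match the grid dimensions")

    return CubeContents(comment, origin, voxel, atomic_numbers,
                        atomic_coordinates, tuple(shape), data)


EXAMPLE = """Toy cube
written by hand
    1    0.0 0.0 0.0
    2    0.5 0.0 0.0
    1    0.0 0.5 0.0
    2    0.0 0.0 0.5
    8    0.0  0.1 0.2 0.3
 1.0 2.0
 3.0 4.0
"""
cube = parse_cube_text(EXAMPLE)
print(cube.shape, cube.atomic_numbers)
```

With this layout, the value at grid index (ix, iy, iz) sits at `data[(ix * ny + iy) * nz + iz]`. A real implementation would more likely hold the data in a NumPy array, as the ArrayData base class suggests.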

Current situation

At the moment there is no standard way to store mesh data in AiiDA. The possible alternatives are to use either an ArrayData object or a FileData object. The problem with both approaches is that they are not standardized and, therefore, do not always come with a clear way of using them.

Open questions.

csadorf commented 4 years ago

I think it should say get_structure_data(), because the corresponding class is called StructureData.

sphuber commented 4 years ago

A suggestion for the constructor: allow constructing the node not only from a file, but also by manually passing the attributes that define it:

class CubeData(Data):

    @classmethod
    def from_file(cls, filepath):
        # Parse the file content that returns comment, origin, voxel etc. in normal python structure
        kwargs = parse_cube(filepath)
        instance = cls.__new__(cls)
        instance.__init__(**kwargs)
        return instance

    def __init__(self, comment: str, origin, voxel, atomic_numbers, ....):
        """Construct new instance from provided data."""
        # Here you validate the input data making sure all required is defined and of correct type

giovannipizzi commented 4 years ago

@csadorf yes, but we should also check what the corresponding method is currently called in TrajectoryData and try to use the same name. If needed, we can change it in both and deprecate the old name, to be removed in 2.0.

hdsassnick commented 1 year ago

Hello,

for one of my latest projects it would be great to be able to store cube files within the AiiDA infrastructure.

I was wondering whether there are any further plans or ideas on how and where to implement this feature. I would also be happy to help/contribute to it.

Thanks and all the best, Holger

sphuber commented 1 year ago

Hi Holger, if you want, you can already store a cube file. The simplest and probably most logical choice is the SinglefileData class. For example:

from aiida.orm import SinglefileData
filepath = '/some/path/data.cube'
cube_node = SinglefileData(filepath).store()
print(cube_node.get_content())

What we were discussing here is simply a more dedicated Data plugin that would provide additional convenience methods and store some data in the node's attributes so that it becomes queryable. But this is not strictly necessary to store a cube file.
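
As a rough illustration of the kind of data such a plugin could keep in the attributes, the sketch below reads only the header of a cube file and returns a small JSON-friendly dictionary. It is plain Python and does not use AiiDA: `cube_header_summary` and the dictionary keys are hypothetical, positive atom and grid counts are assumed, and how the dictionary would be attached to a node is deliberately left out.

```python
import io
from itertools import islice
from typing import Dict, TextIO


def cube_header_summary(handle: TextIO) -> Dict[str, object]:
    """Read the header of a cube file and stop before the volumetric data."""
    # Two free-form comment lines.
    comment = [next(handle).rstrip("\n") for _ in range(2)]
    # Atom count (first field of line 3); the origin is ignored in this sketch.
    natoms = int(next(handle).split()[0])
    # Grid dimensions (first field of lines 4-6); voxel vectors are ignored.
    shape = [int(next(handle).split()[0]) for _ in range(3)]
    # Atomic number is the first field of each atom line.
    atomic_numbers = [int(line.split()[0]) for line in islice(handle, natoms)]
    return {
        "comment": comment,
        "shape": shape,
        "number_of_atoms": natoms,
        "atomic_numbers": atomic_numbers,
    }


# Hand-written toy file with two hydrogen atoms on a 1x1x1 grid.
handle = io.StringIO(
    "title\nsubtitle\n"
    " 2 0.0 0.0 0.0\n"
    " 1 1.0 0.0 0.0\n"
    " 1 0.0 1.0 0.0\n"
    " 1 0.0 0.0 1.0\n"
    " 1 0.0 0.0 0.0 0.0\n"
    " 1 0.0 0.0 0.0 0.7\n"
    " 0.5\n"
)
summary = cube_header_summary(handle)
remaining = handle.read()  # the data block has not been consumed
print(summary)
```

Because the function stops after the atom lines, the volumetric data is never loaded into memory, which keeps the cost independent of the grid size.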

hdsassnick commented 1 year ago

Hi Sebastiaan,

thank you for the quick response and the straightforward solution.

My main concern would be the file size, which in some cases could be quite large, but (I think) this was not part of this issue/PR. In any case, I will proceed with the SinglefileData approach and customize things if my use case requires it.

All the best, Holger