aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
436 stars 190 forks source link

Remove unnecessary methods from `Data` class #5193

Open chrisjsewell opened 3 years ago

chrisjsewell commented 3 years ago

Literally, none of this should be on the base Data node class:

https://github.com/aiidateam/aiida-core/blob/a80812c30ebb0de12f0429954508f9ceb6120d34/aiida/orm/nodes/data/data.py#L32-L39

https://github.com/aiidateam/aiida-core/blob/a80812c30ebb0de12f0429954508f9ceb6120d34/aiida/orm/nodes/data/data.py#L132-L359

  1. It pollutes every subclass with unnecessary bloat.
  2. The use of _get_object_*, _prepare_*, and _parse_* methods are completely undocumented. Given this, I strongly doubt any user or plugin really uses them.
  3. Really all of these should be external to the Data class, i.e. there should be a separation of concerns between data and logic

It is clear that all of these methods were only really added for the StructureData and some other materials related subclasses; in fact the docstring for convert is "Convert the AiiDA StructureData into another python object".

Taking each method separately, these are where they are used:

aiida/orm/nodes/data/structure.py:
  1106:     def _parse_xyz(self, inputstring):

aiida/orm/nodes/data/array/trajectory.py:
  504:     def _parse_xyz_pos(self, inputstring):
  553:     def _parse_xyz_vel(self, inputstring):

i.e. barely used

aiida/orm/nodes/data/cif.py:
  780:     def _get_object_ase(self):
  788:     def _get_object_pycifrw(self):

aiida/orm/nodes/data/structure.py:
  1787:     def _get_object_phonopyatoms(self):
  1802:     def _get_object_ase(self):
  1819:     def _get_object_pymatgen(self, **kwargs):
  1836:     def _get_object_pymatgen_structure(self, **kwargs):
  1902:     def _get_object_pymatgen_molecule(self, **kwargs):

For StructureData it is pretty clear that the whole design is too complex, given that for every one of these methods there is a duplicate public method, e.g.

def get_ase(self):
   return self._get_object_ase()
aiida/orm/nodes/data/cif.py:
  771:     def _prepare_cif(self, **kwargs):  # pylint: disable=unused-argument

aiida/orm/nodes/data/structure.py:
   979:     def _prepare_xsf(self, main_file_name=''):  # pylint: disable=unused-argument
  1001:     def _prepare_cif(self, main_file_name=''):  # pylint: disable=unused-argument
  1010:     def _prepare_chemdoodle(self, main_file_name=''):  # pylint: disable=unused-argument
  1077:     def _prepare_xyz(self, main_file_name=''):  # pylint: disable=unused-argument

aiida/orm/nodes/data/upf.py:
  430:     def _prepare_upf(self, main_file_name=''):
  480:     def _prepare_json(self, main_file_name=''):

aiida/orm/nodes/data/array/bands.py:
   543:     def _prepare_agr_batch(self, main_file_name='', comments=True, prettify_format=None):
   581:         raw_data, _ = self._prepare_dat_blocks(plot_info)
   625:     def _prepare_dat_multicolumn(self, main_file_name='', comments=True):  # pylint: disable=unused-argument
   649:     def _prepare_dat_blocks(self, main_file_name='', comments=True):  # pylint: disable=unused-argument
   807:     def _prepare_mpl_singlefile(self, *args, **kwargs):
   827:     def _prepare_mpl_withjson(self, main_file_name='', *args, **kwargs):  # pylint: disable=keyword-arg-before-vararg
   856:     def _prepare_mpl_pdf(self, main_file_name='', *args, **kwargs):  # pylint: disable=keyword-arg-before-vararg,unused-argument
   906:     def _prepare_mpl_png(self, main_file_name='', *args, **kwargs):  # pylint: disable=keyword-arg-before-vararg,unused-argument
   976:     def _prepare_gnuplot(
  1026:         raw_data, _ = self._prepare_dat_blocks(plot_info, comments=comments)
  1085:     def _prepare_agr(
  1232:     def _prepare_json(self, main_file_name='', comments=True):  # pylint: disable=unused-argument

aiida/orm/nodes/data/array/trajectory.py:
  417:     def _prepare_xsf(self, index=None, main_file_name=''):  # pylint: disable=unused-argument
  455:     def _prepare_cif(self, trajectory_index=None, main_file_name=''):  # pylint: disable=unused-argument

More use, but again could be much better designed.

Lastly, _export_format_replacements is not actually used in any subclass of aiida-core.

I would be happy to create a PR to remove all these methods, and re-write the code where necessary, but obviously I don't want to bother doing that without some tacit approval. Would they need to go through a deprecation cycle 🤷 (as noted, they are completely undocumented)

cc @sphuber @giovannipizzi @ltalirz

sphuber commented 3 years ago

I personally agree that the way that this was coded is not ideal and the concept of importers and exporters should be more "pluginnable". It is not just the base Data class that has code that is too tightly coupled, but also the StructureData suffers from code that directly builds in compatibility with ASE and pymatgen, for example, instead of making it easy to extend support to whatever library externally.

The reason I never touched this is because it is actually being used. Exactly due to the design of the code you highlight, it is difficult to detect though since the actual methods that are called are built "dynamically".

The use of _getobject, prepare, and parse* methods are completely undocumented. Given this, I strongly doubt any user or plugin really uses them.

At the very least these are exposed through the CLI, e.g., verdi data structure export and similar commands will call through to those. Same goes for importers. So I think this functionality is most likely being used and we would have to deprecate it once we find a better architecture to support adding export/import functionality of data types.

chrisjsewell commented 3 years ago

also the StructureData suffers from code that directly builds in compatibility with ASE and pymatgen

Yeh I know you were discussing this with Louis 👍