aiidateam / aiida-pseudo

MIT License
5 stars 8 forks source link

Hashing made with element and md5 checksum #168

Closed unkcpz closed 7 months ago

unkcpz commented 7 months ago

fixes #167

For pseudo data, all we care about for the node is the hash of the file. So we could even think of overriding the _get_objects_to_hash and just return the md5 and element.

unkcpz commented 7 months ago

Somehow the objcets are the same but the hash values are different? I think I miss some key behaviour of make_hash function.

sphuber commented 7 months ago

It is not a misunderstanding of the make_hash function but rather that Node.get_hash does not call Node._get_objects_to_hash() but Node.base.caching._get_objects_to_hash. You should override the Node._CLS_NODE_CACHING attribute with a custom version of NodeCaching, i.e., something like:

from aiida.orm.nodes.caching import NodeCaching

class PseudoPotentialDataCaching(NodeCaching):
    """Class to define caching behavior of ``PseudoPotentialData`` nodes."""

    def _get_objects_to_hash(self) -> list:
        """Return a list of objects which should be included in the node hash."""
        return [self.element, self.md5]

class PseudoPotentialData(Data):

    _CLS_NODE_CACHING = PseudoPotentialDataCaching

That should make the tests pass. You have to change the original._get_objects_to_hash to original.base.caching._get_objects_to_hash of course.

unkcpz commented 7 months ago

@sphuber It works! thanks!

sphuber commented 7 months ago

Last thing I was wondering @unkcpz . Why do we need to include the element? Should the file content not really just be sufficient? Could it really be possible where you have two identical files but they should be for different elements?

Pinging also @mbercx for his thoughts

unkcpz commented 7 months ago

Could it really be possible where you have two identical files but they should be for different elements?

Because I saw the setter function for the element and then I was thinking that there might be different site labels for the same element using the same UPF file. I think it all depends on whether we assume user will change the element attribute or not. Afterall, the element is parsed from file, the same md5 will guarantee the same element.

sphuber commented 7 months ago

Afterall, the element is parsed from file, the same md5 will guarantee the same element.

For the subclasses it is yes, but for the PseudoPotentialData it has to be set manually. But regardless on how it is set, if the file content is identical for two different pseudos, how could it possibly for two differente elements? Functionally, there would be zero difference, so would it be necessary to even include the element attribute?

unkcpz commented 7 months ago

I don't have any use case involved with subfix for element, so I am leaning to remove element from caching. Is there use case in aiida-quantumespresso that may set for instance Fe to Fe_1 and Fe_2, even so that should have nothing to do with the aiida-pseudo I guess?