NCAS-CMS / cfunits

A Python interface to UNIDATA’s UDUNITS-2 library with CF extensions:
http://ncas-cms.github.io/cfunits
MIT License

Implement a `__dask_tokenize__` method #50

Closed davidhassell closed 1 year ago

davidhassell commented 1 year ago

It would be useful to implement a `__dask_tokenize__` method so that equal units always produce the same token when tokenized by dask (see https://docs.dask.org/en/stable/custom-collections.html#deterministic-hashing). Currently this is not the case:

>>> # 3.3.5 behaviour:
>>> from cfunits import Units
>>> from dask.base import tokenize
>>> tokenize(Units('m'))
'289d524be00c748d8d44054d4a89eb1b'
>>> tokenize(Units('m'))
'79dbd5ecc72a80de70999c9e4d7277e0'

But what we'd like is:

>>> # Desired behaviour:
>>> tokenize(Units('m'))
'723a7542d85f60cdfcb5ef1713b4e2c3'
>>> tokenize(Units('m'))
'723a7542d85f60cdfcb5ef1713b4e2c3'
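
A minimal sketch of how this could work, assuming the token only needs to depend on the units string and calendar. `SimpleUnits` is a hypothetical stand-in used for illustration, not the real `cfunits.Units` class, and the choice of state to include in the token is an assumption rather than the merged implementation. The only dask-specific piece is the `__dask_tokenize__` hook, which dask calls in place of its default hashing:

```python
class SimpleUnits:
    """Hypothetical stand-in for cfunits.Units (illustration only)."""

    def __init__(self, units, calendar=None):
        self.units = units
        self.calendar = calendar

    def __dask_tokenize__(self):
        # Return only deterministic, value-based state, so that two
        # equal instances are given identical tokens by dask.
        # Assumption: the units string and calendar are sufficient.
        return (type(self).__name__, self.units, self.calendar)


a = SimpleUnits("m")
b = SimpleUnits("m")

# Distinct instances with equal state expose equal token inputs
assert a.__dask_tokenize__() == b.__dask_tokenize__()

try:
    from dask.base import tokenize
except ImportError:
    tokenize = None  # dask is optional for this sketch

if tokenize is not None:
    # With dask installed, the tokens themselves agree across instances
    assert tokenize(a) == tokenize(b)
    assert tokenize(a) != tokenize(SimpleUnits("s"))
```

Returning a tuple of plain Python values keeps the token independent of object identity and memory addresses, which is a likely reason the default tokenization differed between calls.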

sadielbartholomew commented 1 year ago

Good idea. I agree this would be nice. Reviewing the PR now.