Type-related issues in pset hashes

joblib.hash may be too specific for our purposes in some cases because it is type-sensitive, as these pset hashes show:

# Python int
>>> ps.pset_hash(dict(a=1))
'64846e128be5c974d6194f77557d0511542835a8'
>>> ps.pset_hash(dict(a=int(1)))
'64846e128be5c974d6194f77557d0511542835a8'

# np.int64
>>> ps.pset_hash(dict(a=np.int64(1)))
'4bbb1de2b27b9cfd2f81aa37df3bb3926b2d584d'
>>> ps.pset_hash(dict(a=np.array([1])[0]))
'4bbb1de2b27b9cfd2f81aa37df3bb3926b2d584d'

In the context of a pset, we don't care about the exact type, as long as the value is some kind of int. The type sensitivity can, however, cause problems when we read params back from a database, e.g. when repeating workloads for failed psets.

If we pass in ints as in

>>> params = ps.plist("a", [1,2,3])

pandas will cast them such that in a DataFrame, df.a.values will be a numpy array

>>> df.a.values
array([1, 2, 3])
>>> df.a.values.dtype
dtype('int64')

with each entry being int64, but to_dict() in

>>> strip_pset = lambda pset: {k: v for k,v in pset.items() if not k.startswith("_")}
>>> params_from_df = [strip_pset(row.to_dict()) for _, row in df.iterrows()]
>>> type(params_from_df[0]["a"])
int

will cast back to Python ints.
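
One possible workaround, as a minimal sketch only and not something psweep is known to do: convert numpy scalars to Python builtins before hashing, so that the value's numpy type no longer influences the hash. The helper name `to_builtin` is hypothetical. It relies on `np.generic` being the base class of all numpy scalar types and on `.item()` returning the matching Python scalar.

```python
import numpy as np


def to_builtin(obj):
    """Recursively replace numpy scalars by the equivalent Python builtins.

    Hypothetical helper for illustration, not part of psweep.
    """
    if isinstance(obj, np.generic):
        # np.int64 -> int, np.float64 -> float, np.bool_ -> bool, ...
        return obj.item()
    if isinstance(obj, dict):
        return {key: to_builtin(val) for key, val in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Keep the container type (list stays list, tuple stays tuple).
        return type(obj)(to_builtin(val) for val in obj)
    # Everything else (Python ints, strings, ...) is returned unchanged.
    return obj


pset_np = dict(a=np.int64(1), b=np.float64(2.5))
pset_py = to_builtin(pset_np)
```

After this step, `pset_py` contains only Python types, so passing it to `ps.pset_hash` should give the same hash as the pset built from Python ints and floats directly. Note that numpy arrays are left as they are in this sketch.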
