HunterMcGushion / hyperparameter_hunter

Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
MIT License

Dataset hashes change between Pandas versions #166

Closed HunterMcGushion closed 5 years ago

HunterMcGushion commented 5 years ago

Problem

Battle-Plan

Options for New DataFrame Clause

return hashlib.sha256(
    pd.util.hash_pandas_object(obj, index=True).values
).hexdigest()

or something like

return obj.to_csv().encode("utf-8")

Both produce consistent values for the same dataset under Pandas 0.24.2 and 0.25.0. The first feels safer. The second is easier to understand, and it returns a representation of the object rather than an actual hash, which matches the intended purpose of to_hashable.
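To make the comparison concrete, here is a minimal sketch of the two options as standalone functions. The function names (`hash_via_pandas`, `repr_via_csv`) and the sample DataFrames are hypothetical, made up for illustration; they are not the names used in the library. It only shows that each option is stable for equal DataFrames and changes when the data changes. It does not by itself demonstrate stability across Pandas versions, which would require running it under each version.

```python
import hashlib

import pandas as pd


def hash_via_pandas(obj):
    """Option 1: SHA-256 over the per-row hashes computed by pandas."""
    # hash_pandas_object returns a uint64 Series with one hash per row
    row_hashes = pd.util.hash_pandas_object(obj, index=True).values
    return hashlib.sha256(row_hashes).hexdigest()


def repr_via_csv(obj):
    """Option 2: a bytes representation of the object, not a hash."""
    return obj.to_csv().encode("utf-8")


# Hypothetical sample data, for illustration only
df_a = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df_b = df_a.copy()               # same contents as df_a
df_c = df_a.assign(x=[1, 2, 4])  # one value changed

print(hash_via_pandas(df_a) == hash_via_pandas(df_b))  # equal data
print(hash_via_pandas(df_a) == hash_via_pandas(df_c))  # changed data
print(repr_via_csv(df_a) == repr_via_csv(df_b))        # equal data
print(repr_via_csv(df_a) == repr_via_csv(df_c))        # changed data
```

Note that the second option returns the full CSV text as bytes, so its size grows with the dataset, while the first always yields a 64-character hex digest.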