acwooding / ReproAllTheThings

Showcase for the Easydata framework (and regular test case) created by reproducing the EmbedAllTheThings notebooks by jc-healy.

Hash mismatch for wine_reviews_130k_varietals_75 #40

Open hamelin opened 3 years ago

hamelin commented 3 years ago

In notebook 03, when loading dataset wine_reviews_130k_varietals_75, I get a data hash mismatch. The intended SHA-1 is 52ea2825926ce21c8641109acdd6f889587d9c36, but the SHA-1 that is computed is 8b234d7595929d589c1a6781730fcb5b75e351e2.

It is easy to monkeypatch around this, but I wonder whether it has the same cause as the other hash mismatch issues encountered previously.
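For readers unfamiliar with this kind of check, the comparison behind a "hash mismatch" can be sketched as follows. This is a minimal illustration using only `hashlib` on raw bytes; it is not Easydata's actual hashing code, and the helper names `sha1_hex` and `check_hash` are hypothetical. Only the expected digest is taken from this report.

```python
import hashlib
from typing import Tuple

# Expected SHA-1 for wine_reviews_130k_varietals_75, as quoted in this issue.
EXPECTED_SHA1 = "52ea2825926ce21c8641109acdd6f889587d9c36"


def sha1_hex(payload: bytes) -> str:
    """Return the SHA-1 hex digest of a byte string."""
    return hashlib.sha1(payload).hexdigest()


def check_hash(payload: bytes, expected: str) -> Tuple[bool, str]:
    """Hypothetical helper: compare the digest of `payload` with `expected`.

    Returns (matches, computed_digest) so the caller can report both values,
    as the error message in this issue does.
    """
    computed = sha1_hex(payload)
    return computed == expected.lower(), computed


if __name__ == "__main__":
    # Placeholder bytes; the real dataset is not reproduced here.
    ok, computed = check_hash(b"placeholder bytes", EXPECTED_SHA1)
    print("match" if ok else f"mismatch: expected {EXPECTED_SHA1}, got {computed}")
```

A check like this only passes if the bytes being hashed are identical on every platform, so any change in how the data is serialized before hashing changes the digest.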

hamelin commented 3 years ago

Platform is Windows 10, 64-bit CPU.

hamelin commented 3 years ago

Debug dump is attached.

acwooding commented 3 years ago

Thanks. This looks similar to the hash mismatch in issue #28. I'm not sure yet whether it's another instance of the pandas+joblib hash issue or whether, in addition, we're missing a "sort".
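To illustrate why a missing "sort" could matter, here is a small sketch showing that the same rows in a different order hash differently unless they are put in a canonical order first. It uses toy rows and `json` + `hashlib` as a stand-in for the real serialization; the function names are hypothetical and nothing here is taken from the Easydata code base.

```python
import hashlib
import json


def rows_sha1(rows):
    """Hash rows in the order given (order-sensitive)."""
    payload = json.dumps(rows, separators=(",", ":")).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def canonical_sha1(rows):
    """Sort rows before hashing so that row order no longer matters."""
    return rows_sha1(sorted(rows))


# Toy rows for illustration only, not values from the wine reviews dataset.
rows_a = [["Pinot Noir", 87], ["Chardonnay", 90], ["Riesling", 88]]
rows_b = [rows_a[2], rows_a[0], rows_a[1]]  # same rows, different order

if __name__ == "__main__":
    print(rows_sha1(rows_a) == rows_sha1(rows_b))            # order-sensitive
    print(canonical_sha1(rows_a) == canonical_sha1(rows_b))  # order-independent
```

If an upstream step (for example a group-by or a file listing) returns rows in a platform-dependent order, sorting before hashing is one way to keep the digest stable.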

hackalog commented 2 years ago

Easydata issue: https://github.com/hackalog/easydata/issues/231

To reproduce, try pinning pandas to version 1.0.5 and to version 1.3.2.
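When comparing the two pinned environments, it may help to record which package versions are actually installed next to each computed hash. A minimal sketch using the standard library's `importlib.metadata` follows; the helper name `installed_versions` is hypothetical, and the choice of packages to report is an assumption based on the pandas+joblib discussion above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run once in each environment and keep the output with the computed hash.
    for name, version in installed_versions(["pandas", "joblib"]).items():
        print(f"{name}: {version or 'not installed'}")
```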