bigscience-workshop / catalogue_data

Scripts to prepare catalogue data
Apache License 2.0

Use MD5 to obtain persistent hash #25

Closed (thomasw21 closed this 2 years ago)

thomasw21 commented 2 years ago

We've observed that the hash is not consistent across processes, so some duplicates end up with different hashes and are not detected. This PR resolves this by using MD5 to obtain a persistent hash.
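For illustration, here is a minimal sketch of the idea. It assumes the inconsistent hash is Python's built-in `hash()`, which salts `str` values per process unless `PYTHONHASHSEED` is fixed. The helper names `persistent_hash` and `deduplicate` are hypothetical and are not taken from this PR's diff.

```python
import hashlib


def persistent_hash(text: str) -> str:
    """Return an MD5 hex digest of `text` (hypothetical helper).

    Unlike the built-in hash(), the digest depends only on the bytes of
    the input, so it is identical in every process and on every run.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def deduplicate(texts):
    """Keep the first occurrence of each text, keyed by its MD5 digest."""
    seen = set()
    unique = []
    for text in texts:
        digest = persistent_hash(text)
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


if __name__ == "__main__":
    print(persistent_hash("hello"))
    print(deduplicate(["a", "b", "a"]))
```

MD5 is used here only as a stable fingerprint for deduplication, not for security. Because the digest is the same in every worker, duplicates hashed in different processes can be matched.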