ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Apache License 2.0

Why does saving the test set not work? #688

Closed zion-chuu closed 7 months ago

zion-chuu commented 7 months ago

Hi author, I'm reading your book, 3rd edition, and this thought came to me in chapter 2:

One solution is to save the test set on the first run and then load it in subsequent runs. Another option is to set the random number generator’s seed (e.g., with np.random.seed(42)) before calling np.random.permutation() so that it always generates the same shuffled indices.
However, both these solutions will break the next time you fetch an updated dataset.
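
For context, here is a rough sketch of the seed-based option from the passage above. The function name `split_indices` and its parameters are made up for illustration and are not taken from the book; only `np.random.seed()` and `np.random.permutation()` come from the quoted text.

```python
import numpy as np

def split_indices(n_rows, test_ratio, seed=42):
    """Return (train_indices, test_indices) for a dataset with n_rows rows.

    Hypothetical helper: seeds NumPy's global generator, then shuffles
    the row indices, as the quoted passage describes.
    """
    np.random.seed(seed)                      # fix the seed before shuffling
    shuffled = np.random.permutation(n_rows)  # same order on every run for a given n_rows
    test_size = int(n_rows * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]

# Two runs over the same number of rows give the same test indices.
train_a, test_a = split_indices(1000, 0.2)
train_b, test_b = split_indices(1000, 0.2)
print(np.array_equal(test_a, test_b))

# After an update the row count changes, so the permutation is drawn over
# a different range and the old split is not reproduced.
train_c, test_c = split_indices(1100, 0.2)
print(len(test_a), len(test_c))
```

The seed only makes the shuffle repeatable for the same number of rows, which is why an updated dataset ends up with a different split.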

You could use a hash of the dataset to detect whether the original dataset has been updated. If it has, you could rerun the train/test split function in the same way and overwrite the previously saved test set.
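
To make the idea concrete, here is a minimal sketch of what I mean, assuming the dataset is a single file on disk. The helper names (`file_sha256`, `load_or_refresh_test_indices`), the cache file names, and the choice of SHA-256 are all my own assumptions, not something from the book.

```python
import hashlib
from pathlib import Path

import numpy as np

def file_sha256(path):
    """Hash the raw bytes of the dataset file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_or_refresh_test_indices(data_path, n_rows, cache_dir,
                                 test_ratio=0.2, seed=42):
    """Return (test_indices, refreshed).

    Hypothetical helper: reuses the saved test indices while the file hash
    is unchanged; otherwise re-splits and overwrites the saved copy.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    hash_file = cache_dir / "dataset.sha256"      # assumed cache layout
    index_file = cache_dir / "test_indices.npy"

    current_hash = file_sha256(data_path)
    unchanged = (hash_file.exists() and index_file.exists()
                 and hash_file.read_text() == current_hash)
    if unchanged:
        return np.load(index_file), False

    # Dataset is new or has changed: split again with the same seed and ratio.
    np.random.seed(seed)
    test_indices = np.random.permutation(n_rows)[: int(n_rows * test_ratio)]
    np.save(index_file, test_indices)
    hash_file.write_text(current_hash)
    return test_indices, True
```

The first call saves the test indices and the hash; later calls reuse them until the file's bytes change. One thing this sketch makes visible: when the hash changes, the test set is drawn again from scratch, so it is not guaranteed to contain the same instances as the previously saved one.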