ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0

[Chapter 3] Pandas datatype #352

Closed atropos112 closed 3 years ago

atropos112 commented 3 years ago

In Chapter 3, on page 86 (also in the MNIST section of the GitHub repo), there is the following code:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist["data"], mnist["target"]

This works well, but the next step, namely some_digit = X[0], throws an error:

>>> some_digit = X[0]
Traceback (most recent call last):
  File "/home/<user>/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/<user>/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/<user>/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 0

With up-to-date Python and the packages involved, I was very confused about why this happens. I realised I can remedy it by instead setting

some_digit = X.values[0,:]

This turns it into an array, and since I have to reshape it right after anyway, that's fine. The reason I am writing is that I then ran the same code that failed for me on Google Colab, and there it just works. Any idea what could be causing this?
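For anyone reproducing this without downloading MNIST, here is a minimal sketch of the behaviour described above. It assumes X came back as a pandas DataFrame. The toy data and the column names (pixel1 ... pixel4) are illustrative assumptions, not the real dataset. On a DataFrame, X[0] is treated as a lookup of a column labelled 0, which is what the traceback shows, while .iloc or .to_numpy() give positional row access:

```python
import numpy as np
import pandas as pd

# Toy stand-in for MNIST: 3 "images" of 4 pixels each.
# The column names are illustrative assumptions.
X = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=["pixel1", "pixel2", "pixel3", "pixel4"],
)

# X[0] asks for a *column* labelled 0, which does not exist here.
try:
    X[0]
    raised = False
except KeyError:
    raised = True

# Positional row access on a DataFrame: .iloc, or convert to NumPy first.
first_row = X.iloc[0].to_numpy()
same_row = X.to_numpy()[0]
```

Both first_row and same_row are plain NumPy arrays, so a later reshape should work the same way as with X.values[0,:].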

Thank you

mbreemhaar commented 3 years ago

I had the same problem. I found out that it depends on the version of scikit-learn. fetch_openml has an optional argument as_frame which used to default to False, making the function return a NumPy array. However, since scikit-learn version 0.24, the default value for as_frame is 'auto', which returns a pandas DataFrame unless the dataset is stored in a sparse format.

(Source)
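Based on the explanation above, one way to get the old behaviour back on any recent scikit-learn version is to pass as_frame=False explicitly. This is a sketch under that assumption. The function names load_mnist_arrays and ensure_arrays are hypothetical helpers introduced here for illustration, and load_mnist_arrays needs network access when it is called:

```python
import numpy as np
from sklearn.datasets import fetch_openml


def load_mnist_arrays():
    """Download MNIST and return (X, y) as NumPy arrays (needs network access)."""
    # as_frame=False asks for arrays explicitly instead of relying on the default.
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    return mnist["data"], mnist["target"]


def ensure_arrays(X, y):
    """Hypothetical helper: convert pandas objects (or array-likes) to NumPy arrays."""
    return np.asarray(X), np.asarray(y)
```

With either approach, X[0] should again select the first row, as the book's code expects. For example, X, y = load_mnist_arrays() followed by some_digit = X[0].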

atropos112 commented 3 years ago

Yes! You got it, thank you!