Open LucianoBatista opened 2 years ago
Do you know if this was a change in scikit learn recently?
I don't know. I'm currently using the newer version (1.0.2) while studying your book: https://pypi.org/project/scikit-learn/
Changed in version 0.24: The default value of as_frame changed from False to 'auto' in 0.24.
Yup, they changed it. I'll fix this soon - thank you for the catch!
Changing the __getitem__ function by adding .loc after the DataFrame seems to solve the issue. I'm still working my way past this point, but at least I now get the same output as the book.
def __getitem__(self, index):
    inputs = torch.tensor(self.X.loc[index, :], dtype=torch.float32)
    targets = torch.tensor(int(self.y.loc[index]), dtype=torch.int64)
    return inputs, targets
I encountered this today and left a comment in the liveBook from Manning. I think this solution is better than what I suggested there.
If I change it to self.X = X.to_numpy(), does that cause problems down the road for benefiting from PyTorch? Great book, I'm really enjoying it.
I used dataset = SimpleDataset(X.to_numpy(), y.to_numpy())
and it solved the problem.
This could go into the __init__ function, as @chuymtz suggested, but if we want to make a dataset from a torch tensor, we would have to use
dataset = SimpleDataset(X.numpy(), y.numpy())
instead (not to_numpy - argh).
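One way around the to_numpy / numpy naming mismatch is a small conversion helper. This is only a sketch: `as_numpy` is a hypothetical name, not something from the book or from any library, and the tensor branch is duck-typed on a `.numpy()` method, which assumes a CPU tensor that does not require gradients.

```python
import numpy as np
import pandas as pd


def as_numpy(data):
    """Return `data` as a NumPy array (hypothetical helper, not from the book)."""
    if isinstance(data, (pd.DataFrame, pd.Series)):
        # pandas objects expose to_numpy()
        return data.to_numpy()
    if hasattr(data, "numpy"):
        # Duck-typed branch: assumed to cover CPU torch tensors, which expose numpy()
        return data.numpy()
    # Already an ndarray (returned as-is) or something list-like
    return np.asarray(data)
```

With something like this, __init__ could store `self.X = as_numpy(X)` and `self.y = as_numpy(y)` regardless of which of the three input types the caller passes.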
XX = torch.tensor(X.to_numpy())
yy = torch.tensor(y.to_numpy(dtype=int), dtype=torch.int64)
dataset2 = SimpleDataset(XX.numpy(), yy.numpy())
murphyk's solution seems a little tedious to me, since it converts X to a tensor and then back to NumPy again.
'X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)'
This solution is very good. Thanks, LucianoBatista.
Here is another solution that seems good, just FYI:
X, y = fetch_openml("mnist_784", version=1, return_X_y=True) # no need to change here
class SimpleDataset(Dataset):
    def __init__(self, X, y):
        super(SimpleDataset, self).__init__()
        self.X = X.values  # get the numpy data via values
        self.y = y.values  # get the numpy data via values
dataset = SimpleDataset(X, y)
example, label = dataset[0]
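For anyone who wants to try this approach without downloading MNIST, here is a minimal self-contained sketch. The tiny DataFrame and the string labels are made up for illustration, and the __getitem__ and __len__ bodies are assumptions about what the rest of the class looks like (based on the snippets in this thread), not the book's exact code. It uses to_numpy() rather than .values, which is what the pandas documentation recommends.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class SimpleDataset(Dataset):
    def __init__(self, X, y):
        super().__init__()
        # Convert the pandas objects to NumPy once, up front
        self.X = X.to_numpy()
        self.y = y.to_numpy()

    def __getitem__(self, index):
        # Positional NumPy indexing works again because self.X is an ndarray
        inputs = torch.tensor(self.X[index, :], dtype=torch.float32)
        targets = torch.tensor(int(self.y[index]), dtype=torch.int64)
        return inputs, targets

    def __len__(self):
        return self.X.shape[0]


# Made-up stand-in data: 3 examples with 4 features each, labels as strings
X = pd.DataFrame(np.zeros((3, 4)), columns=["p1", "p2", "p3", "p4"])
y = pd.Series(["5", "0", "4"])

dataset = SimpleDataset(X, y)
example, label = dataset[0]
```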
@LucianoBatista: Thanks for your solution! I ran into the same problem. I created a pull request.
Hi, I was getting the following error when executing this code:
It was fixed when I changed the fetch_openml call to:
The problem was that without as_frame=False, scikit-learn now loads the data as a DataFrame rather than as a NumPy array.
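To see the difference without fetching MNIST, the following sketch uses a small made-up DataFrame as a stand-in for the features. It shows that NumPy-style `[row, :]` indexing works on an array but fails on a DataFrame, while .loc and .iloc work. The exact exception type raised by the DataFrame appears to vary between pandas versions, so the sketch catches a generic Exception.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the feature matrix: 3 rows, 4 "pixel" columns
X_frame = pd.DataFrame(
    np.arange(12, dtype=np.float64).reshape(3, 4),
    columns=["pixel1", "pixel2", "pixel3", "pixel4"],
)
X_array = X_frame.to_numpy()

# NumPy array: [row, :] selects the first row
row_from_array = X_array[0, :]

# DataFrame: plain [] looks up columns, so [0, :] is not a valid key
try:
    X_frame[0, :]
    frame_indexing_failed = False
except Exception:  # exception type differs across pandas versions
    frame_indexing_failed = True

# Label-based (.loc) and position-based (.iloc) row access both work
row_from_loc = X_frame.loc[0, :].to_numpy()
row_from_iloc = X_frame.iloc[0, :].to_numpy()
```

Note that .loc selects by index label, so it matches the row position only when the DataFrame has the default 0..n-1 index, as it does here.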