ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0

Chapter 2 - Download the Data #178

Open mzeman1 opened 4 years ago

mzeman1 commented 4 years ago

This code doesn't work for me:

import os
import tarfile
import urllib
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

Cpauls35 commented 4 years ago

Same issue here, running on my Jetson Nano. When I run this code I get a urllib request error. Importing urllib.request fixed that; however, even after calling the function, no directory is created. I'm currently investigating the path, since that doesn't work either.
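
For context on the import fix mentioned above: urllib is a package, and a bare `import urllib` is not guaranteed to load its `request` submodule, so `urllib.request.urlretrieve` can raise an AttributeError unless some other module happened to import the submodule first. A minimal sketch of the explicit import (the variable names are only illustrative):

```python
import urllib.request  # loads the submodule and binds it as urllib.request

# After the explicit import, both attribute lookups succeed.
has_request = hasattr(urllib, "request")
has_urlretrieve = callable(urllib.request.urlretrieve)
print(has_request, has_urlretrieve)
```

This only addresses the import error; it does not explain a missing directory or an HTTP error.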

dgmorrow19 commented 4 years ago

Did you call the function? The call, fetch_housing_data(), is in the next cell of the notebook.

Cpauls35 commented 4 years ago

I called fetch_housing_data() and got this error output:

HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

NVivek commented 4 years ago

from __future__ import division, print_function, unicode_literals

import numpy as np
import os
import pandas as pd
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

mzeman1 commented 4 years ago

I didn't. But now, it gave me this error:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)>

2807754 commented 4 years ago

Hello community, I just started with this interesting book, but I ran into a problem with the following code:

%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()

When I run it, it shows the following error:


AttributeError                            Traceback (most recent call last)
in
----> 1 get_ipython().run_line_magic('matplotlib', 'inline')
      2 import matplotlib.pyplot as plt
      3 housing.hist(bins=50, figsize=(20,15))
      4 save_fig("attribute_histogram_plots")
      5 plt.show()

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2305 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2306 with self.builtin_trap:
-> 2307 result = fn(*args, **kwargs)
   2308 return result
   2309

in matplotlib(self, line)

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/magic.py in (f, *a, **k)
    185 # but it's overkill for just that one bit of state.
    186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
    188
    189 if callable(arg):

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/magics/pylab.py in matplotlib(self, line)
     97 print("Available matplotlib backends: %s" % backends_list)
     98 else:
---> 99 gui, backend = self.shell.enable_matplotlib(args.gui.lower() if isinstance(args.gui, str) else args.gui)
    100 self._show_matplotlib_backend(args.gui, backend)
    101

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py in enable_matplotlib(self, gui)
   3405 gui, backend = pt.find_gui_and_backend(self.pylab_gui_select)
   3406
-> 3407 pt.activate_matplotlib(backend)
   3408 pt.configure_inline_support(self, backend)
   3409

/opt/anaconda3/lib/python3.7/site-packages/IPython/core/pylabtools.py in activate_matplotlib(backend)
    304
    305 import matplotlib
--> 306 matplotlib.interactive(True)
    307
    308 # Matplotlib had a bug where even switch_backend could not force

AttributeError: module 'matplotlib' has no attribute 'interactive'

I need help to solve this exercise. Many thanks.

zkDreamer commented 4 years ago

I had the same error

ageron commented 3 years ago

Hi there,

@mzeman1, you're running into a very common problem linked to the way Python is installed on macOS: you need to install the SSL certificates. I explain how in the FAQ.

@Cpauls35 , getting an HTTP 404 error is weird. This means that the URL is invalid. The only explanation I can see is there's a typo in your code. Please make sure you're running exactly the same code as in the notebook. If it still doesn't work, please check your network settings, perhaps a firewall or proxy is messing things up. In any case, if you run the notebook in Colab, you will see that everything works fine.
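
To make a URL typo easier to spot, here is a minimal sketch of a variant that reports the exact URL when the server answers with an HTTP error. The name `fetch_housing_data_verbose` is hypothetical and not part of the notebook; the constants are copied from the code at the top of this thread.

```python
import os
import tarfile
import urllib.error
import urllib.request

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data_verbose(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    """Hypothetical variant of fetch_housing_data() that names the failing URL."""
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    try:
        urllib.request.urlretrieve(housing_url, tgz_path)
    except urllib.error.HTTPError as err:
        # A 404 means the server has no file at this exact URL.
        raise RuntimeError(f"HTTP {err.code} while fetching {housing_url!r}") from err
    # The context manager closes the archive even if extraction fails.
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
    return tgz_path
```

Calling `fetch_housing_data_verbose()` with no arguments should behave like the notebook's function when the URL is correct; if it raises, compare the URL in the message character by character with the one in the notebook.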

@2807754 and @zkDreamer , this StackOverflow question seems to have an accepted answer that may fix your problem: in short, uninstall matplotlib and reinstall it.
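
Before reinstalling, it may help to check where Python would load matplotlib from. One possible cause of an error like "module 'matplotlib' has no attribute 'interactive'" is an incomplete install, or a local file or folder named matplotlib shadowing the real package; that is an assumption, not something confirmed in this thread. A minimal sketch, with `locate_module` as a hypothetical helper:

```python
import importlib.util


def locate_module(name):
    """Hypothetical helper: return the file a top-level module would load from.

    Returns None if the module cannot be found or has no single file origin.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current environment
    return spec.origin


print(locate_module("matplotlib"))
```

If the printed path points into your project folder rather than site-packages, renaming the local file is probably worth trying before a reinstall.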

Hope this helps.

AlejandorLazaro commented 2 years ago

@mzeman1, I just had this same error (on macOS Monterey 12.2.1 (21D62), on an M1 MacBook Air), and the following GitHub answer solved the problem for me.

https://github.com/Cadene/pretrained-models.pytorch/issues/193#issuecomment-635730515

I reworked the data fetching logic for Chapter 2 into the following, which worked on my machine:

def fetch_data(url, path, archive_name):
    # Workaround for https://github.com/Cadene/pretrained-models.pytorch/issues/193#issuecomment-635730515
    import ssl
    ssl._create_default_https_context = ssl._create_unverified_context

    os.makedirs(path, exist_ok=True)
    archive_path = os.path.join(path, archive_name)
    urllib.request.urlretrieve(url, archive_path)
    archive = tarfile.open(archive_path)
    archive.extractall(path)
    archive.close()

ageron commented 2 years ago

@AlejandorLazaro, please don't do this! It deactivates all SSL verification, basically destroying all SSL security. It's not the right solution. Instead, please install the root certificates by opening a terminal and running the following command (change 3.10 to whatever Python version you are using):

/Applications/Python\ 3.10/Install\ Certificates.command

This will install the certifi bundle of root certificates and solve the problem without destroying all security.

If you installed Python using MacPorts, then run sudo port install curl-ca-bundle instead.
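
To check the result without disabling verification, the following sketch inspects what the default SSL context trusts, using only the standard library. `describe_default_trust_store` is a hypothetical helper name, and the interpretation of its output is an assumption: certificates in a `capath` directory are loaded lazily, so a zero count alone is not conclusive.

```python
import ssl


def describe_default_trust_store():
    """Hypothetical helper: summarize what the default SSL context trusts."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()  # certificate verification stays enabled
    return {
        "cafile": paths.cafile,  # CA bundle file, or None if it does not exist
        "capath": paths.capath,  # CA directory, or None if it does not exist
        "verify_mode": context.verify_mode,
        "loaded": context.cert_store_stats(),  # counts of certificates loaded so far
    }


print(describe_default_trust_store())
```

If both `cafile` and `capath` are None and the `x509_ca` count is 0, missing root certificates are a likely explanation for CERTIFICATE_VERIFY_FAILED.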

AlejandorLazaro commented 2 years ago

Whoops! Thanks for the response and correction there!