A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0
[BUG] Chapter 2: Section "Download the data", buggy implementation for load_housing_data() function #156
The implementation of load_housing_data() is as follows:
def load_housing_data():
    tarball_path = Path("datasets/housing.tgz")
    if not tarball_path.is_file():
        Path("datasets").mkdir(parents=True, exist_ok=True)
        url = "https://github.com/ageron/data/raw/main/housing.tgz"
        urllib.request.urlretrieve(url, tarball_path)
        with tarfile.open(tarball_path) as housing_tarball:
            housing_tarball.extractall(path="datasets")
    return pd.read_csv(Path("datasets/housing/housing.csv"))
With this implementation, if the file datasets/housing.tgz already exists, the function skips both the download and the extraction and goes straight to reading datasets/housing/housing.csv. It is possible for datasets/housing.tgz to exist while datasets/housing/housing.csv doesn't (for example, if the extracted folder was deleted). In that case the code fails with a FileNotFoundError. The correct implementation should look like this:
def load_housing_data():
    tarfile_path = Path('datasets/housing.tgz')
    if not tarfile_path.is_file():
        # Download the tarball only if it is not already present.
        Path('datasets').mkdir(parents=True, exist_ok=True)
        response = requests.get('https://github.com/ageron/data/raw/main/housing.tgz')
        with open(tarfile_path, 'wb') as f:
            f.write(response.content)
    # Always extract, so housing.csv exists even if the tarball was already there.
    with tarfile.open(tarfile_path) as housing_tarball:
        housing_tarball.extractall(path="datasets")
    return pd.read_csv(Path("datasets/housing/housing.csv"))
If datasets/housing.tgz exists, extract it and then read the CSV. If it doesn't, download it, extract it, and then read the CSV.
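A possible variation, offered only as a sketch and not as the notebook's actual code: check for the CSV itself instead of the tarball, so the archive is extracted only when the CSV is missing and downloaded only when the archive is missing too. The helper name ensure_housing_csv, the HOUSING_URL constant, and the data_dir, url and download parameters are hypothetical names introduced for this illustration. The helper uses only the standard library and returns the path to the CSV.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the original function; the constant name is an assumption.
HOUSING_URL = "https://github.com/ageron/data/raw/main/housing.tgz"


def ensure_housing_csv(data_dir=Path("datasets"), url=HOUSING_URL,
                       download=urllib.request.urlretrieve):
    """Return the path to housing.csv, downloading and extracting only if needed.

    `download` is a hypothetical hook (called as download(url, destination))
    so the network step can be replaced, for example in tests.
    """
    data_dir = Path(data_dir)
    tarball_path = data_dir / "housing.tgz"
    csv_path = data_dir / "housing" / "housing.csv"

    if not csv_path.is_file():
        if not tarball_path.is_file():
            # Neither the CSV nor the tarball exists: fetch the tarball first.
            data_dir.mkdir(parents=True, exist_ok=True)
            download(url, tarball_path)
        # The tarball exists at this point, so (re)create the CSV from it.
        with tarfile.open(tarball_path) as housing_tarball:
            housing_tarball.extractall(path=data_dir)
    return csv_path
```

With pandas available, the data can then be loaded with pd.read_csv(ensure_housing_csv()). Checking for the CSV first covers the case described above (tarball present, CSV missing) and avoids re-extracting the archive on every call.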