Open richlysakowski opened 1 year ago
@richlysakowski -- I had the same problem. I think this Yelp dataset on Kaggle is identical, so that's what I'm going to use: https://www.kaggle.com/datasets/ilhamfp31/yelp-review-dataset
@richlysakowski
Here's what worked for me in a Jupyter notebook (Google Colab, June 2023).
First, have ~/.kaggle/kaggle.json with 600 permissions.
from pathlib import Path
creds = 'your JSON credentials from Kaggle.com'
cred_path = Path('~/.kaggle/kaggle.json').expanduser()
if not cred_path.exists():
    # Create ~/.kaggle if needed, write the credentials, and restrict access to the owner
    cred_path.parent.mkdir(exist_ok=True)
    cred_path.write_text(creds)
    cred_path.chmod(0o600)
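If authentication fails later, it may help to check the file before calling the API. This is a minimal sketch, assuming kaggle.json holds the usual "username" and "key" fields; check_kaggle_credentials is a hypothetical helper name for illustration, not part of the kaggle package:

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(cred_path):
    """Return a list of problems found with a kaggle.json file (empty if none).

    Hypothetical helper for illustration; not part of the kaggle package.
    """
    cred_path = Path(cred_path).expanduser()
    if not cred_path.exists():
        return [f"{cred_path} does not exist"]

    try:
        data = json.loads(cred_path.read_text())
    except json.JSONDecodeError:
        return [f"{cred_path} is not valid JSON"]
    if not isinstance(data, dict):
        return [f"{cred_path} does not contain a JSON object"]

    problems = []
    # Assumption: the token file carries "username" and "key" entries.
    for field in ("username", "key"):
        if field not in data:
            problems.append(f"missing field: {field}")

    # Flag any permission bits granted to group or others (anything beyond 600).
    mode = stat.S_IMODE(cred_path.stat().st_mode)
    if mode & 0o077:
        problems.append(f"permissions are {oct(mode)}, expected 0o600")
    return problems
```

Calling check_kaggle_credentials('~/.kaggle/kaggle.json') returns an empty list when nothing looks wrong.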
Then, download directly from Kaggle API:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
dataset_slug = 'ilhamfp31/yelp-review-dataset'
api.dataset_download_files(dataset_slug, unzip=True)
You may have to rename a few files and folders:
mkdir data
mkdir data/yelp
mv yelp_review_polarity_csv/* data/yelp/
mv data/yelp/test.csv data/yelp/raw_test.csv
mv data/yelp/train.csv data/yelp/raw_train.csv
rm -r yelp_review_polarity_csv/
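If you'd rather stay in Python than shell out, the same steps can be sketched with pathlib and shutil. This is only a sketch of the commands above; arrange_yelp_files is a hypothetical helper name, and the default folder names are the ones from this comment:

```python
import shutil
from pathlib import Path


def arrange_yelp_files(src="yelp_review_polarity_csv", dest="data/yelp"):
    """Move the extracted files into dest and add a raw_ prefix to the CSV splits.

    Hypothetical helper mirroring the shell commands above.
    """
    src, dest = Path(src), Path(dest)

    # mkdir data; mkdir data/yelp (does not fail if they already exist)
    dest.mkdir(parents=True, exist_ok=True)

    # mv yelp_review_polarity_csv/* data/yelp/
    # (list() first so we do not modify the folder while iterating over it)
    for item in list(src.iterdir()):
        shutil.move(str(item), str(dest / item.name))

    # mv data/yelp/test.csv data/yelp/raw_test.csv, and the same for train.csv
    for name in ("test.csv", "train.csv"):
        (dest / name).rename(dest / f"raw_{name}")

    # rm -r yelp_review_polarity_csv/
    shutil.rmtree(src)
```

Run arrange_yelp_files() from the directory the dataset was unzipped into.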
You should then be able to run the rest of the Yelp notebooks as normal.
raw_train.csv
https://drive.google.com/open?id=1xeUnqkhuzGGzZKThzPeXe2Vf6Uu_g_xM gives a 404 error
Please provide an updated link to the exact dataset used in the book, or to an entirely new set of Yelp CSV-formatted datasets (train, test, and reviews_with_splits_lite).