delip / PyTorchNLPBook

Code and data accompanying Natural Language Processing with PyTorch published by O'Reilly Media https://amzn.to/3JUgR2L
Apache License 2.0
1.98k stars 807 forks

Chapter 03 Yelp Dataset has a Typo #30

Open amancioandre opened 4 years ago

amancioandre commented 4 years ago

Hi everyone,

Chapter 3 fails to load the Yelp data because of a typo on the last line of the dataset, which is cut off and missing its closing quote:

Line Review 73357: "1","Capital City Transfer han

Passing the nrows argument with the number of rows minus 1 fixed it for me:

train_reviews = pd.read_csv(args.raw_train_dataset_csv, header=None, names = ['rating', 'review'], nrows=73356)

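Or, to skip the malformed line (note that pandas 1.3 deprecated error_bad_lines in favor of on_bad_lines='skip', and pandas 2.0 removed it):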

train_reviews = pd.read_csv(args.raw_train_dataset_csv, header=None, names = ['rating', 'review'], error_bad_lines=False)

Or just append a closing " to the end of that line.
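For anyone who prefers to script that last option rather than edit the file by hand, here is a minimal sketch using only the standard library. The helper name close_unterminated_quote and the demo file are made up for illustration, and it assumes the only damage is a single unclosed quoted field at the very end of the file, as in the line quoted above.

```python
import csv
import os
import tempfile


def close_unterminated_quote(path, encoding="utf-8"):
    """Append a closing double quote if the file contains an odd number of them.

    Hypothetical helper: returns True if the file was modified, False otherwise.
    """
    quote_count = 0
    with open(path, "r", encoding=encoding, newline="") as f:
        # Count in 1 MiB chunks so a large CSV is never held in memory at once.
        for chunk in iter(lambda: f.read(1 << 20), ""):
            quote_count += chunk.count('"')

    # In well-formed CSV, quotes come in pairs: one opens and one closes each
    # quoted field, and a literal quote inside a field is written as "".
    # An odd total therefore means some quoted field was never closed.
    if quote_count % 2 == 0:
        return False

    with open(path, "a", encoding=encoding, newline="") as f:
        f.write('"\n')
    return True


# Small demo on a throwaway file whose last row is cut off like the Yelp one.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", encoding="utf-8", newline="") as f:
        f.write('"2","Great food"\n"1","Capital City Transfer han')

    was_repaired = close_unterminated_quote(demo_path)

    with open(demo_path, "r", encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f))

print(was_repaired)
print(rows)
```

Running the helper once on a copy of the raw training CSV should let the original pd.read_csv call go through without nrows. Note that it keeps the truncated review as a row instead of dropping it.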

Still, it would be nice to fix this typo in the dataset.