adamlwgriffiths / amazon_scraper

Provides content not accessible through the standard Amazon API

how to properly iterate over reviews #15

Closed WestleyArgentum closed 8 years ago

WestleyArgentum commented 9 years ago

I'm trying to run the example block of code that looks like this:

p = amzn.lookup(ItemId='B0051QVF7A')
rs = amzn.reviews(URL=p.reviews_url)

for r in rs:
    print(r)

but at first I get an error like this:

Traceback (most recent call last):
  File "review-scraper/review-scraper.py", line 19, in <module>
    for r in rs:
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper/reviews.py", line 178, in __iter__
    for id in page.ids:
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper/reviews.py", line 202, in ids
    for anchor in self.soup.find_all('div', class_="a-section review")
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper/__init__.py", line 113, in decorator
    raise e
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?

After I install the html5lib package, things work a little better (I'm able to print out the first page of reviews), but then I hit another error:

R1V8OBW4HRDV5W
R38AV3D6I8CHS6
R1R19OOAWIN48U
RL37IWIVVB5B4
R3S9D4LLRP7AQN
R1CAZXTXQ6F5A
R36R23EPPWW6UQ
RA751EK4W8EV4
RGZ3A10EDUYQ1
RP149JO3VJ31O
Traceback (most recent call last):
  File "review-scraper/review-scraper.py", line 19, in <module>
    for r in rs:
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper/reviews.py", line 180, in __iter__
    page = Reviews(URL=page.next_page_url) if page.next_page_url else None
TypeError: __init__() takes at least 2 arguments (2 given)

Is there a different package I should be using?
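
For anyone hitting the first error above: the `bs4.FeatureNotFound` message means BeautifulSoup was asked for the html5lib parser but could not import it. Below is a minimal sketch of a pre-flight check, assuming only that the parser is importable under the module name `html5lib`. The helper name `module_available` is made up for illustration and is not part of amazon_scraper or bs4.

```python
from importlib import import_module


def module_available(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # 'html5lib' is the parser named in the FeatureNotFound message.
    if not module_available("html5lib"):
        print("html5lib is missing; install it with: pip install html5lib")
```

Running this before iterating over the reviews turns the deep traceback into a one-line hint.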

adamlwgriffiths commented 9 years ago

Sorry, not sure why this wasn't picked up by tests. I've committed a fix in 78c4329d3427fce4e8113ca02ed59ec626241550
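
For context, the second traceback is what Python reports when a constructor with a required positional argument is called with keyword arguments only. Here is a rough sketch of that failure mode. The `Reviews` class below is a hypothetical stand-in, and its signature (a required `api` argument plus an optional `URL`) is an assumption made for illustration, not the library's actual code.

```python
class Reviews(object):
    """Hypothetical stand-in for a paginated reviews object."""

    def __init__(self, api, URL=None):
        # 'api' is required; 'URL' is optional (assumed signature).
        self.api = api
        self.URL = URL


def construct_error(**kwargs):
    """Try to build Reviews from keyword arguments only.

    Returns the TypeError message, or None if construction succeeded.
    """
    try:
        Reviews(**kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# Omitting the required argument fails, as in the traceback above.
error = construct_error(URL="http://example.invalid/reviews?page=2")

# Passing the required argument through when building the next page works.
next_page = Reviews("api-object", URL="http://example.invalid/reviews?page=2")
```

On Python 2.7 the message counts `self` and the keyword argument together, which is why it reads "takes at least 2 arguments (2 given)". Python 3 names the missing argument instead.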

adamlwgriffiths commented 9 years ago

Bugs should be fixed in 6df73f6. The only remaining issues I can see are the changes Amazon have made to ratings (#17).