adamlwgriffiths / amazon_scraper

Provides content not accessible through the standard Amazon API
Other
234 stars 60 forks source link

Review ids are not being parsed correctly #4

Closed hahnicity closed 9 years ago

hahnicity commented 9 years ago

with url: http://www.amazon.com/product-reviews/1449355730/ref=cm_cr_pr_top_sort_recent?&sortBy=bySubmissionDateDescending

I did:

from amazon_scraper import AmazonScraper
amzn = AmazonScraper(stuff...)
revs = amz.reviews(URL="http://www.amazon.com/product-reviews/1449355730/ref=cm_cr_pr_top_sort_recent?&sortBy=bySubmissionDateDescending")
revs.ids

I get an empty list. The cause might be that amazon changed their html? I'd like to make the change

     @property
     def ids(self):
         return [
-            extract_review_id(anchor['href'])
-            for anchor in self.soup.find_all('a', text=re.compile(ur'permalink', flags=re.I))
+            anchor["id"]
+            for anchor in self.soup.find_all('div', class_="a-section review")
         ]

This matches up a bit more closely with amazon html which looks like

<div id="R2UBSL6L1T8MIF" class="a-section review"><div class="a-row helpful-votes-count"></div>
...
adamlwgriffiths commented 9 years ago

You should put yourself in the README authors list =)