abjer / sds2019

Social Data Science 2019 - a summer school course
https://abjer.github.io/sds2019

8.2.3 - Scrape links from category pages on Trustpilot #28

Open Choptdei opened 5 years ago

Choptdei commented 5 years ago

Hi, I have written the following function, but I can't get it working. The lists 'reviews' and 'firmaer' are empty. The tags and classes should be right.

Can you help me?

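`import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

firmaer = []  # company names
reviews = []  # absolute URLs of the review pages

def scraper(url):
    trin1 = requests.get(url)
    trin2 = BeautifulSoup(trin1.text, 'html.parser')

    # store the text of each header rather than appending the whole result list as one element
    firmaer.extend(h.get_text(strip=True) for h in trin2.find_all('h3', {'class': 'category-business-card__header'}))
    temp_url = trin2.find_all('a', {'class': 'category-business-card card'})
    for review in temp_url:
        # urljoin handles both relative and absolute hrefs; plain concatenation does not
        reviews.append(urljoin(url, review['href']))

scraper('https://www.trustpilot.com/categories/social_club')`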

kristianolesenlarsen commented 5 years ago

Maybe (probably) the HTML is generated dynamically in your browser, so requests won't receive HTML containing the elements you are looking for. Have you looked for a JSON file containing the data you need?
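
One rough way to check both points is sketched below: test whether the class name appears at all in the HTML the server sent, and collect any JSON embedded in `<script>` tags. It uses only the standard library so it runs without a network call. The names `ScriptJSONParser` and `diagnose`, the `sample_html` string, and the JSON keys in it are all made up for illustration; whether Trustpilot actually embeds its data this way is an assumption, not something confirmed here.

```python
import json
from html.parser import HTMLParser


class ScriptJSONParser(HTMLParser):
    """Collect the contents of <script> tags whose type is a JSON MIME type."""

    JSON_TYPES = {"application/json", "application/ld+json"}

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []  # parsed JSON objects found in the page

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in self.JSON_TYPES:
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip script blocks that are not valid JSON


def diagnose(html, class_name):
    """Return (does class_name occur in the raw HTML?, embedded JSON payloads)."""
    parser = ScriptJSONParser()
    parser.feed(html)
    parser.close()
    return class_name in html, parser.payloads


# Hypothetical page source standing in for requests.get(url).text
sample_html = """
<html><body>
<div id="root"></div>
<script type="application/json">{"businesses": [{"name": "Example Club", "href": "/review/example.com"}]}</script>
</body></html>
"""

found, payloads = diagnose(sample_html, "category-business-card__header")
print(found)     # is the class present in the HTML the server sent?
print(payloads)  # any JSON data embedded in the page
```

If `found` is false for the real page source, the cards are most likely rendered by JavaScript, and the embedded JSON (or a request visible in the browser's network tab) is the better place to look.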