damian0604 / bdaca

Course Materials Big Data and Automated Content Analysis

alternative for /text() in XPATH #5

Closed damian0604 closed 7 years ago

damian0604 commented 8 years ago

When there is a line or paragraph break within the results of an XPATH, /text() might not work as expected, because it treats each part as a separate element. Fix: leave out the /text() in the XPATH itself and use the .text_content() method later on:

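# tree is assumed to be the lxml.html tree that was parsed earlier
reviews = tree.xpath('//div/div/div[2]/div[*]/div[2]/p[1]')
print(len(reviews), "reviews scraped. Showing the first 60 characters of each:")
for i, review in enumerate(reviews):
    # text_content() returns all text inside the element, including text after a break
    print("Review", i, ":", review.text_content()[:60])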

Add this as an alternative solution to the XPATH chapter.
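
To illustrate the difference, here is a minimal sketch, assuming lxml is installed. The HTML snippet and the simplified `//p` path are made up for this example and are not taken from the course materials. It compares what `/text()` returns with what `.text_content()` returns for a paragraph that contains a `<br>`:

```python
from lxml import html

# Hypothetical review paragraph with a line break inside it
snippet = '<div><p>Great movie.<br>Would watch again.</p></div>'
tree = html.fromstring(snippet)

# With /text(), every text node comes back as its own list item
parts = tree.xpath('//p/text()')
print(parts)

# Without /text(), we get the <p> element itself and ask it for all of its text
paragraphs = tree.xpath('//p')
full_text = paragraphs[0].text_content()
print(full_text)
```

Note that `.text_content()` simply concatenates the text parts and does not insert a space where the `<br>` was, so some cleanup may still be needed afterwards.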

damian0604 commented 7 years ago

added