UUDigitalHumanitieslab / Reader-responses-to-translated-literature

Scripts for the DIOPTRA-L project (Digital Opinions on Translated Literature)
MIT License

Actual review text repeats #11

Closed alexhebing closed 4 years ago

alexhebing commented 4 years ago

As noted by @BeritJanssen in #1, in some cases the review text repeats. This happens with long reviews: at first only a snippet is visible, so the same text can end up being extracted more than once. Make sure to extract the text only once.

Also, when dealing with a long review, make sure to leave out the '...more' text (which is in an <a> tag).
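
A minimal sketch of what this could look like, using only Python's standard-library `html.parser`. The markup is an assumption for illustration: it supposes that a long review appears as a snippet `<span>` followed by a full-text `<span>`, with the '...more' link in an `<a>` tag. The names `ReviewTextParser` and `extract_review_text` are hypothetical and not taken from the project's scripts.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collects the text of each top-level <span>, skipping anything inside <a>."""

    def __init__(self):
        super().__init__()
        self.spans = []        # text of each top-level <span>, in document order
        self._span_depth = 0   # > 0 while inside a <span>
        self._a_depth = 0      # > 0 while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            if self._span_depth == 0:
                self.spans.append("")
            self._span_depth += 1
        elif tag == "a":
            self._a_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._span_depth > 0:
            self._span_depth -= 1
        elif tag == "a" and self._a_depth > 0:
            self._a_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside a <span> and not inside an <a>.
        if self._span_depth > 0 and self._a_depth == 0:
            self.spans[-1] += data


def extract_review_text(html):
    """Return the review text once, without the '...more' link text."""
    parser = ReviewTextParser()
    parser.feed(html)
    parser.close()
    # Normalise whitespace and drop empty spans.
    texts = [" ".join(s.split()) for s in parser.spans]
    texts = [t for t in texts if t]
    # Assumption: when both a snippet and the full text are present,
    # the full text is the last span; short reviews have a single span.
    return texts[-1] if texts else ""
```

Note that this sketch drops the text of every `<a>` tag, so a genuine link inside a review would lose its link text as well; a stricter version could skip only links whose text is '...more'.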

alexhebing commented 4 years ago

@JosedeKruif: if you have already run git pull origin master (i.e. for further testing of #10), please do so once more, so that whatever you scrape no longer contains the doubled texts.