Closed HikkaV closed 1 year ago
Check whether Google Translate can be used on the fly while parsing.
Need to add the following: 1) Parse Russian-language reviews. 2) Click the button that shows the full text of a review. 3) Fix the issue with the pop-up window. 4) Fix the issue with whole-page translation. 5) Check if we can translate by click.
Point 4 needs checking: when parsing with Google Translate, the page has to be in focus for the translation to happen. Also adding other points:
New to do: 1) Add resumable parsing of pages (if we have already parsed some, that information shouldn't be dropped; we should continue from where we stopped). 2) Implement a strategy for getting around "access denied": use a proxy rotator along with a change of user. If the situation doesn't change after 10 user changes -> change IP. 3) Rotate working IPs in a separate thread, with a dedicated queue for them. 4) Possibly add sleep time and parse in batches, maybe with different sleep times between parses. 5) Use a different IP for each parallel parser.
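Point 2 above could be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: it assumes "change of user" means switching the User-Agent, and `fetch`, `fetch_with_rotation`, `user_agents`, `proxies` and `max_proxy_changes` are hypothetical names. The only number taken from the note is the limit of 10 user changes per IP.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional

MAX_USER_CHANGES = 10  # from the note: 10 user changes before switching IP


def fetch_with_rotation(
    url: str,
    fetch: Callable[[str, str, str], Optional[str]],
    user_agents: Iterable[str],
    proxies: Iterable[str],
    max_proxy_changes: int = 5,  # assumption: give up after this many IPs
) -> Optional[str]:
    """Return the page text, or None if every tried proxy stays blocked.

    `fetch(url, user_agent, proxy)` is a hypothetical callable that returns
    the page text, or None when the site answers "access denied".
    Both pools are assumed to be non-empty.
    """
    agent_pool = cycle(user_agents)
    proxy_pool = cycle(proxies)
    for _ in range(max_proxy_changes):
        proxy = next(proxy_pool)  # change IP
        for _ in range(MAX_USER_CHANGES):
            # Change user (here: User-Agent) while keeping the same IP.
            page = fetch(url, next(agent_pool), proxy)
            if page is not None:
                return page
    return None
```

Passing `fetch` in as a parameter keeps the rotation logic independent of the HTTP or browser layer, so it can be checked with a fake fetcher before being wired to the real parser.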
Parsed all the reviews for hotels. Next to do: 1) Parse all the reviews for restaurants. 2) Parse all the reviews for attractions.
TripAdvisor has no Ukrainian comments available. Nevertheless, there is a lot of information there: 1) https://www.tripadvisor.ru/Hotels-g294473-Ukraine-Hotels.html - hotels 2) https://www.tripadvisor.ru/Attractions-g294473-Activities-a_allAttractions.true-Ukraine.html - attractions 3) https://www.tripadvisor.ru/Restaurants-g294473-Ukraine.html - restaurants. We will then need to translate the data into Ukrainian.