HikkaV / Ukrainian-Reviews-Estimation

Cross-domain automatic review score estimation and key phrase retrieval for Ukrainian.

[SA] Parsing tripadvisor #8

Closed HikkaV closed 1 year ago

HikkaV commented 2 years ago

TripAdvisor has no Ukrainian-language comments. Nevertheless, there is a lot of information there:
1) https://www.tripadvisor.ru/Hotels-g294473-Ukraine-Hotels.html - hotels
2) https://www.tripadvisor.ru/Attractions-g294473-Activities-a_allAttractions.true-Ukraine.html - attractions
3) https://www.tripadvisor.ru/Restaurants-g294473-Ukraine.html - restaurants

We will then need to translate the data into Ukrainian.

HikkaV commented 2 years ago

Check whether Google Translate can be used on the fly while parsing.

HikkaV commented 2 years ago

Need to add the following:
1) Parse Russian-language reviews.
2) Click the button that shows the full text of a review.
3) Fix the issue with the pop-up window.
4) Fix the issue with whole-page translation.
5) Check whether we can translate by click.

HikkaV commented 2 years ago

  1. Done
  2. Done
  3. Done
  4. Done

Point 4 should be rechecked: when parsing with Google Translate, the page has to be in focus for the translation to happen. Additional points:

  1. Get proxies.
  2. Parallel parsing.
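
A minimal sketch of how the two points above could fit together, assuming one worker thread per proxy. The names `parse_in_parallel` and `fetch` are hypothetical and not taken from the repository; `fetch(url, proxy)` stands in for whatever actually downloads and parses a page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def parse_in_parallel(
    urls: List[str],
    proxies: List[str],
    fetch: Callable[[str, str], str],
) -> Dict[str, str]:
    """Parse urls in parallel, giving each worker its own proxy.

    `fetch` is a hypothetical callable (url, proxy) -> parsed result.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")

    # Worker i takes urls[i], urls[i + n], ... so no two workers share a proxy.
    n = len(proxies)
    batches = [urls[i::n] for i in range(n)]

    def worker(proxy: str, batch: List[str]) -> Dict[str, str]:
        return {url: fetch(url, proxy) for url in batch}

    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=n) as pool:
        # map() pairs each proxy with its own batch of URLs.
        for partial in pool.map(worker, proxies, batches):
            results.update(partial)
    return results
```

Threads are used here because page fetching is I/O-bound; a process pool would work the same way if the parsing itself turned out to be the bottleneck.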

HikkaV commented 2 years ago

New to-do:
1) Add resumable parsing of pages: if some pages have already been parsed, that information should not be dropped, and parsing should continue from where it stopped.
2) Implement a strategy for avoiding "access denied" responses: use a proxy rotator together with a change of user. If 10 user changes in a row do not change the situation, change the IP.
3) Rotate working IPs in a separate thread, with a dedicated queue for them.
4) Possibly add sleep time and parse in batches, maybe with different sleep times between parses.
5) Use a different IP for each parallel parser.
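
A rough sketch of point 2, to make the retry order concrete. It assumes that "change of user" means switching the User-Agent header, which is a guess. `AccessDenied`, `fetch_with_rotation` and `fetch` are hypothetical names; `fetch(url, proxy, user_agent)` stands in for the real request and is expected to raise `AccessDenied` when the site blocks it.

```python
import itertools
from typing import Callable, List


class AccessDenied(Exception):
    """Hypothetical error raised by `fetch` when the site blocks the request."""


def fetch_with_rotation(
    url: str,
    proxies: List[str],
    user_agents: List[str],
    fetch: Callable[[str, str, str], str],
    max_user_changes: int = 10,
) -> str:
    """Try up to `max_user_changes` user agents per proxy, then switch proxy."""
    if not proxies or not user_agents:
        raise ValueError("need at least one proxy and one user agent")

    ua_cycle = itertools.cycle(user_agents)
    for proxy in proxies:
        # Keep the same IP and only change the user agent.
        for _ in range(max_user_changes):
            user_agent = next(ua_cycle)
            try:
                return fetch(url, proxy, user_agent)
            except AccessDenied:
                continue
        # Still denied after max_user_changes attempts: move on to the next IP.
    raise AccessDenied(f"all proxies were denied for {url}")
```

Sleep times between attempts (point 4) are left out to keep the sketch short; they would go just before the `continue`.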

HikkaV commented 2 years ago

Parsed all the reviews for hotels. Next to do: 1) Parse all the reviews for restaurants. 2) Parse all the reviews for attractions.