dataXdevelopment / SEELE

Data Extraction and Processing
Apache License 2.0

Testing web Scrapers #33

Open dca123 opened 2 years ago

dca123 commented 2 years ago

We need to test our web scrapers to make sure they have not been broken by changes to the design of the sites they scrape. The following uses metacritic as an example.

If metacritic changes the caret sign used for the next page, our code will break. We need to be notified when this happens.

Proposed idea: parse a few games from metacritic and store the output as the desired result. An automated test then runs the current code to generate a new CSV and does a file diff between the two. If the two files are the same, the code is running as expected.
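A rough sketch of the comparison step, in case it helps frame the proof of concept. It assumes the stored result and the fresh scrape are both plain CSV files; the function names `diff_csv` and `scraper_is_healthy` and the file paths in the usage note are hypothetical, not existing code in this repo.

```python
import difflib
from pathlib import Path


def diff_csv(expected_path, actual_path):
    """Return unified-diff lines between the stored CSV and the fresh CSV.

    An empty list means the two files have identical lines.
    """
    expected = Path(expected_path).read_text(encoding="utf-8").splitlines()
    actual = Path(actual_path).read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            expected,
            actual,
            fromfile=str(expected_path),
            tofile=str(actual_path),
            lineterm="",
        )
    )


def scraper_is_healthy(expected_path, actual_path):
    """True when the fresh scrape matches the stored desired result."""
    return not diff_csv(expected_path, actual_path)
```

Something like `scraper_is_healthy("expected/metacritic.csv", "out/metacritic.csv")` could then be asserted in an automated test, with the diff lines printed on failure. One caveat to keep in mind: values that legitimately change on the site over time would also show up as a diff, so the stored file may need to be limited to fields that are expected to stay stable.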

This is an expensive, slow process, but it is important to know the health of our scrapers. The result could also be exposed as an endpoint that notifies developers and consumers when a scraper is broken.
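For the endpoint idea, a minimal sketch of what the response body might look like. The function name `build_health_report` and the payload fields are assumptions for illustration only; how the payload is served or how notifications are sent is left open.

```python
import json
from datetime import datetime, timezone


def build_health_report(results):
    """Summarise scraper checks as a JSON-serialisable dict.

    `results` maps a scraper name to True (output matched the stored
    result) or False (output differed, so the scraper is likely broken).
    """
    broken = sorted(name for name, ok in results.items() if not ok)
    return {
        "status": "broken" if broken else "ok",
        "broken_scrapers": broken,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Hypothetical input: the metacritic check failed.
    print(json.dumps(build_health_report({"metacritic": False}), indent=2))
```

Any web framework could return this dict as the endpoint response, and the same payload could feed whatever notification channel we settle on.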

dca123 commented 2 years ago

@dayanadithyan could you create 3 files for the metacritic scraper as a proof of concept?