Datenschule / jedeschule-scraper

MIT License

Add diff script for PRs #51

Closed by k-nut 3 years ago

k-nut commented 4 years ago

When a new PR comes in, we should check the effect that it has on the results. I've started sketching out some initial ideas on the diff-script branch.

I thought about it some more and propose that we do the following:

1) When a new PR comes in, we use git to find the scrapers that were changed compared to master.
2) We do a small test run of each changed spider (using the -s "CLOSESPIDER_ITEMCOUNT=5" flag) and collect the ids of the scraped items.
3) We call the (still to be created) API to fetch the old items for those ids and then run a diff.
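To make the proposal more concrete, here is a rough Python sketch of steps 1 and 3. It is only an illustration, not what is on the diff-script branch. It assumes that spiders live in `jedeschule/spiders/` and that a spider's module name matches its Scrapy name; the helper names (`changed_files`, `spiders_from_paths`, `diff_items`) are made up for this example. Since the API does not exist yet, the old items are simply passed in as a dict keyed by id.

```python
import subprocess
from pathlib import PurePosixPath

# Assumption: all spiders live in this directory of the repository.
SPIDER_DIR = "jedeschule/spiders"


def changed_files(base="master"):
    """Step 1: list the paths changed on the current branch relative to `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def spiders_from_paths(paths, spider_dir=SPIDER_DIR):
    """Pick the spider module names out of a list of changed paths."""
    names = []
    for path in map(PurePosixPath, paths):
        is_spider = (
            path.suffix == ".py"
            and path.parent == PurePosixPath(spider_dir)
            and path.stem != "__init__"
        )
        if is_spider:
            names.append(path.stem)
    return sorted(names)


def diff_items(old, new):
    """Step 3: compare two {id: item_dict} mappings field by field."""
    result = {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {},
    }
    for item_id in sorted(old.keys() & new.keys()):
        fields = {}
        for key in sorted(old[item_id].keys() | new[item_id].keys()):
            before, after = old[item_id].get(key), new[item_id].get(key)
            if before != after:
                fields[key] = (before, after)
        if fields:
            result["changed"][item_id] = fields
    return result
```

For step 2, each name returned by `spiders_from_paths(changed_files())` could be run with something like `scrapy crawl <name> -s "CLOSESPIDER_ITEMCOUNT=5" -o new_items.json`, and the resulting items loaded into the `new` mapping before calling `diff_items`.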

If we are happy with the changes, we merge; if not, we make the required changes to the code and run the check again.

All of this could probably be implemented using the new GitHub Actions feature.