When a new PR comes in, we should check the effect that it has on the results.
I've started sketching out some initial ideas on the diff-script branch.
I thought about it some more and propose that we do the following:
1) When a new PR comes in, we use git to find the scrapers that changed relative to master.
2) We do a small test run of each changed spider (using the `-s "CLOSESPIDER_ITEMCOUNT=5"` flag) and collect the IDs of the scraped items.
3) We call the (still to be created) API to query the old items with those IDs and run a diff against the new ones.
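Steps 1 and 2 could look roughly like the minimal Python sketch below. It assumes a hypothetical layout where each spider lives in its own file under `scrapers/spiders/`, with its `name` matching the file stem, and that scraped items carry an `id` field. `SPIDERS_DIR`, `git_changed_files`, `changed_spiders`, `crawl_command` and `scraped_ids` are illustrative names, not existing code on the diff-script branch.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical layout: one spider per file in this directory,
# with the spider's `name` equal to the file stem.
SPIDERS_DIR = "scrapers/spiders"


def git_changed_files(base: str = "master") -> str:
    """Return `git diff --name-only` output for this branch vs `base`."""
    # The three-dot range diffs against the merge base, so commits that
    # landed on master after the branch was cut are not reported.
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def changed_spiders(diff_output: str, spiders_dir: str = SPIDERS_DIR) -> list[str]:
    """Map changed file paths to spider names (assumes name == file stem)."""
    names = set()
    for line in diff_output.splitlines():
        path = PurePosixPath(line.strip())
        if (path.suffix == ".py"
                and path.parent == PurePosixPath(spiders_dir)
                and path.name != "__init__.py"):
            names.add(path.stem)
    return sorted(names)


def crawl_command(spider: str, item_count: int = 5,
                  output: Optional[str] = None) -> list[str]:
    """Build a short `scrapy crawl` run that closes after ~item_count items."""
    output = output or f"{spider}.jsonl"
    return [
        "scrapy", "crawl", spider,
        "-s", f"CLOSESPIDER_ITEMCOUNT={item_count}",
        "-O", output,  # overwrite the feed file instead of appending
    ]


def scraped_ids(jsonl_text: str, id_field: str = "id") -> list:
    """Collect item IDs from JSON-lines feed output (assumes an `id` field)."""
    return [json.loads(line)[id_field]
            for line in jsonl_text.splitlines() if line.strip()]
```

Wired together, a run would be something like `for spider in changed_spiders(git_changed_files()): subprocess.run(crawl_command(spider), check=True)`, then reading each `<spider>.jsonl` through `scraped_ids`. Older Scrapy versions only offer `-o`, which appends to the file rather than overwriting it. `CLOSESPIDER_ITEMCOUNT` may also let a few extra items through, since requests already in flight still complete.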
If we are happy with the changes, we merge; if not, we make the required changes to the code.
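For the diff itself, since the API doesn't exist yet, this sketch simply assumes the old items have already been fetched into a dict keyed by ID; `diff_items` is a hypothetical helper. It lists IDs with no stored counterpart and, for each matched item, the fields whose values changed, which is the information we'd look at when deciding whether to merge.

```python
from __future__ import annotations

from typing import Any


def diff_items(old: dict[Any, dict[str, Any]],
               new: list[dict[str, Any]],
               id_field: str = "id") -> dict:
    """Compare freshly scraped items with stored ones, matched by ID.

    Returns {"missing": [ids not in `old`],
             "changed": {id: {field: (old_value, new_value)}}}.
    A field absent on one side is reported with None for that side.
    """
    report: dict = {"missing": [], "changed": {}}
    for item in new:
        key = item[id_field]
        previous = old.get(key)
        if previous is None:
            # No stored version to compare against.
            report["missing"].append(key)
            continue
        fields = {}
        for field in sorted(set(previous) | set(item)):
            before, after = previous.get(field), item.get(field)
            if before != after:
                fields[field] = (before, after)
        if fields:
            report["changed"][key] = fields
    return report
```

An empty `changed` dict and an empty `missing` list would mean the PR doesn't alter the sampled items at all.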
All of this could probably be implemented using the new GitHub Actions feature.