djay / covidthailand

Thailand Covid testing and case data gathered and combined from various sources for others to download or view
126 stars 15 forks source link

add in the remaining regression tests #48

Closed djay closed 3 years ago

djay commented 3 years ago

A test framework has been put in place in https://github.com/djay/covidthailand/blob/main/tests/test_scraping.py#L85

Todo [ ] - identify the parsers which break most often e.g. briefings [ ] - rework the get_links code to into seperate function and actual download as enclosure to enable test to only the files it needs [ ] - identify from comments or commits which dates required changes. generate a json file for each

reduxionist commented 3 years ago

Hi,

Do you want help on this issue? Any starting advice if so?

Thanks, Jonathan

djay commented 3 years ago

that would be great. I've slowly been adding them in as needed. Maybe start with https://github.com/djay/covidthailand/blob/main/tests/test_scraping.py#L213 if you look at the history of that routine you can see the likely key dates that led to changes. those are the ones that need tests. https://github.com/djay/covidthailand/blame/45ab729d5cdba862de2c5940264f790a5504907a/covid_data.py#L1515 There are details on how to add a test here - https://github.com/djay/covidthailand/blob/main/README.md#running-this-code

if those works out then maybe you could have a look at https://github.com/djay/covidthailand/pull/62 or https://github.com/djay/covidthailand/blob/main/tests/test_scraping.py#L102 (which seems to be missing data from some past dates)

reduxionist commented 3 years ago

Did I correctly identify 5d054122 "parse vac in briefing" from 2021-06-25 as a commit that introduced a (small) change to briefing_case_types that should be tested for?

If so, was adding https://github.com/intelligent-bytes/covidthailand/blob/jbarratt/issue_48-add_remaining_regression_tests/tests/briefing_case_types/2021-06-25.json all I needed to do?

No PR yet because I will follow your commit message style once I've got more of the needed changes covered and thus more json to submit...

Thanks!

reduxionist commented 3 years ago

I'm finding starting with the coverage report then going to git blame to get the date I need to add test data for to be the most efficient workflow for me personally.

One thing I don't understand from the README section is "If the results are correct there is commented out code in the test to export the data to the test json file" so, for test_briefing_case_types at least, I've just been adjusting a template JSON file to match the results I've seen captured in the pytest output (and verified against the source PDF). Hope that sounds like the right thing to you! ;)

About to issue PR for your review to see if I'm approaching this as intended or not...

uhhh... the PR I linked is only a partial fix for this issue of course... :wink:

reduxionist commented 3 years ago

Some sample dates of commits that changed code 4/13: 70b6c54 and 5/18: 8998d90 (huh, just noticed my git short refs are first 8 of the sha, but GH uses first 7 of the sha, 7 is what I remember as the short ref; have to comb through my 274 line global git config file once again :laughing:). While I think that on those dates you must have noticed something about the scrape going wrong and fixed it, because the code paths in question are now executed by tests run by test data for other dates I don't want to create a ton of (effectively) duplicate tests given duration of existing test suite run is already lengthy vs. code size. (And if I do need to do so, would you suggest I gather test data for day of commit or the day before? With first one I added I tried both, but both days failed and then passed in the same manner (as per "adding ((regression)) tests" doc instructions" so I stuck with data of commit.)

Apologies for all the questions; I have to admit the pandas project I did for work was a one-off that I couldn't justify budgeting data tests for, so this is completely new to me... :rofl:

djay commented 3 years ago

One thing I don't understand from the README section is "If the results are correct there is commented out code in the test to export the data to the test json file" so, for test_briefing_case_types at least, I've just been adjusting a template JSON file to match the results I've seen captured in the pytest output (and verified against the source PDF). Hope that sounds like the right thing to you! ;)

An example is https://github.com/djay/covidthailand/blob/2a37ca45797db28c3a7548d5201ca13fb5370ea5/tests/test_scraping.py#L190-L190

I should really put comments in there to make it clear what that code is for.