CorrelAid / pystatis

MIT License

71 build proper end to end tests #99

Closed pmayd closed 2 weeks ago

pmayd commented 3 weeks ago

I implemented a new end-to-end test suite using the vcr plugin. The plugin records the responses to requests made while the test cases run. If a cassette (a recorded response) exists, the test loads the data from that file; otherwise the real request is executed.

To avoid executing any real requests in CI/CD runs, I added the parameter --vcr-record=none to all GitHub workflows.
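The record/replay behaviour described above can be illustrated with a small standard-library sketch. This is not how the vcr plugin is implemented; `fetch_with_cassette`, `CassetteMissingError`, and the JSON cassette layout are hypothetical names chosen for illustration. It only shows the idea: replay when a cassette exists, record otherwise, and refuse to make a request when the record mode is `none`.

```python
import json
import tempfile
from pathlib import Path


class CassetteMissingError(RuntimeError):
    """Raised when replay-only mode finds no recorded response."""


def fetch_with_cassette(cassette: Path, do_request, record_mode: str = "once"):
    """Replay a recorded response if present, otherwise record one.

    `do_request` is a hypothetical zero-argument callable that would
    perform the real request.
    """
    if cassette.exists():
        # Replay: the real request is never executed.
        return json.loads(cassette.read_text(encoding="utf-8"))
    if record_mode == "none":
        # Same intent as --vcr-record=none: fail instead of hitting the network.
        raise CassetteMissingError(f"no cassette at {cassette}")
    response = do_request()
    cassette.write_text(json.dumps(response), encoding="utf-8")
    return response


def demo():
    calls = []

    def fake_request():
        # Stand-in for a real API call; counts how often it is executed.
        calls.append(1)
        return {"status": 200, "body": "ok"}

    with tempfile.TemporaryDirectory() as tmp:
        cassette = Path(tmp) / "table.json"
        first = fetch_with_cassette(cassette, fake_request)  # records
        second = fetch_with_cassette(cassette, fake_request, record_mode="none")  # replays
        try:
            fetch_with_cassette(Path(tmp) / "missing.json", fake_request, record_mode="none")
            blocked = False
        except CassetteMissingError:
            blocked = True
    return first, second, len(calls), blocked
```

With replay-only mode in CI, a test whose cassette was never committed fails loudly instead of silently calling the live API.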

I updated the regex for auto-detecting the database because the table 86000U-Z-01 was not handled properly, even though it is a Regionalstatistik table.

I added filter logic to the Table class that drops any line that does not start with a 4- or 5-digit number. This solves the problem with some Regionalstatistik tables that begin with a list of counties instead of the data.
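A minimal sketch of such a line filter, assuming semicolon-separated raw text and reading "4- or 5-digit number" as exactly four or five leading digits. The names `DATA_LINE` and `keep_data_lines` and the sample lines are made up for illustration and are not taken from the Table class.

```python
import re

# A data line starts with exactly 4 or 5 digits (assumption: a longer
# run of digits does not count as a data line).
DATA_LINE = re.compile(r"^\d{4,5}(?!\d)")


def keep_data_lines(raw: str) -> list[str]:
    """Return only the lines that start with a 4- or 5-digit number."""
    return [line for line in raw.splitlines() if DATA_LINE.match(line)]


# Made-up sample: a leading list of counties followed by data lines.
sample = "\n".join(
    [
        "County A",
        "County B",
        "2020;01001;10",
        "01002;B;20",
        "123;too short",
        "123456;too long",
    ]
)

filtered = keep_data_lines(sample)
```

Here `filtered` contains only the two lines starting with `2020` and `01002`.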

I completely refactored the parse zensus table function because Zensus now uses a different format than the other databases: there is only a single value column instead of one column per value. This requires some preprocessing to make the data tidy again, essentially pivoting the data from long to wide format.
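The long-to-wide step can be sketched in plain Python as follows. The column names (`date`, `region`, `variable`, `value`) and the helper `pivot_long_to_wide` are hypothetical and do not reflect the actual Zensus schema or the refactored function; with pandas, `DataFrame.pivot` would be the natural tool for the same reshaping.

```python
def pivot_long_to_wide(rows, key_fields, name_field, value_field):
    """Turn one row per (key, variable) into one row per key.

    Each distinct entry of `name_field` becomes its own column that
    holds the matching `value_field` entry.
    """
    wide = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # Create the wide record on first sight of this key.
        record = wide.setdefault(key, {field: row[field] for field in key_fields})
        record[row[name_field]] = row[value_field]
    return list(wide.values())


# Made-up long-format data: a single value column for all variables.
long_rows = [
    {"date": "d1", "region": "A", "variable": "population", "value": 10},
    {"date": "d1", "region": "A", "variable": "households", "value": 4},
    {"date": "d1", "region": "B", "variable": "population", "value": 20},
    {"date": "d1", "region": "B", "variable": "households", "value": 9},
]

wide_rows = pivot_long_to_wide(long_rows, ["date", "region"], "variable", "value")
```

After the pivot, each region has a single row with one column per variable, which matches the wide layout the other databases already deliver.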

I also removed the Codes column from tables coming from Regionalstatistik. Keeping these codes is not done for any other database, so it should be a parameter the user can specify rather than the default. However, Jonas's notebook and example use these codes, so the example will not work after this PR unless we implement such a parameter.