OpenDataScotland / the_od_bods

Collating open data from across Scotland
MIT License

Improve generate_new_mock_data.py #159

Open KarenJewell opened 2 years ago

KarenJewell commented 2 years ago

See conversation in context: #151

from @gavbarnett

generate_new_mock_data.py could certainly be improved. Likely you'd like it to be more selective in what it does, rather than doing a blanket update of everything. But that's a separate issue/feature.

and from me (granted, may or may not be related):

But because we are using exact copies of data files (including content), tests seem to fail wherever the expected output (static) and the test output (generated) differ significantly. In the cases I've seen, the difference is in the order of the content. That makes the tests really easy to fail, since we don't consistently order the content post-retrieval. I also have much bigger questions about whether we should be using exact copies of data, rather than making a dummy set of data for test purposes only. Exact copies of data will age really quickly.
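To illustrate the ordering problem, here is a minimal sketch of a comparison that ignores row order. It is not the project's current test code. The function names and the `Title`/`Owner` columns in the usage note are hypothetical, and it assumes both files share the same header.

```python
import csv
import io
from collections import Counter


def parse_unordered(csv_text):
    """Return (header, multiset of data rows) for a CSV document."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader, [])
    # A Counter keeps duplicate rows distinct while discarding their order.
    return header, Counter(tuple(row) for row in reader)


def same_content(expected_text, actual_text):
    """True when both CSVs have the same header and the same rows in any order."""
    return parse_unordered(expected_text) == parse_unordered(actual_text)
```

With this, `same_content("Title,Owner\nA,X\nB,Y\n", "Title,Owner\nB,Y\nA,X\n")` is true even though the rows are swapped. Sorting the rows before writing the CSV would be another way to get a similar effect.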

gavbarnett commented 2 years ago

To expand further:

Currently this script gets the JSON output from the URLs listed in sources.csv and stores these as mock API data for future tests.

It does this so pytest never needs to call the real URL to get data, as that data changes all the time and we need static tests. (Some changes were made to the API scrapers to accommodate this shim/mock redirection for testing.)
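As a rough sketch of what such a redirection could look like (this is not the repo's actual shim): `MOCK_DIR`, `mock_path_for`, `fetch_json` and the `use_mock` flag are all hypothetical names, and the file-naming scheme is an assumption.

```python
import json
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

MOCK_DIR = Path("tests/mock_data")  # hypothetical location for stored JSON


def mock_path_for(url, mock_dir=MOCK_DIR):
    """Map a source URL to a stable file name inside the mock directory."""
    # Percent-encode every reserved character so the URL is a valid file name.
    return mock_dir / (quote(url, safe="") + ".json")


def fetch_json(url, use_mock=False, mock_dir=MOCK_DIR):
    """Return parsed JSON from the stored mock file or from the live URL."""
    if use_mock:
        # Tests take this branch, so no network call is made.
        return json.loads(mock_path_for(url, mock_dir).read_text(encoding="utf-8"))
    with urlopen(url) as response:
        return json.load(response)
```

Under these assumptions, a test would call `fetch_json(url, use_mock=True)` and only the generation script would ever hit the live URL.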

The script then also generates the API scrapers' CSV files from the mocked JSON output above. This is done to create an expected result for future tests.

When the script is run it deletes all existing mock data (JSON & CSV output) and regenerates them.

It is the intention that this script is run infrequently when either:

Suggested Improvements

Make the following possible with use of terminal flags etc. when calling the script.
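One possible shape for such flags, sketched with `argparse`. The `--source` and `--dry-run` options and the helper names are assumptions made for illustration, picking up the "more selective" idea from earlier in the thread. They are not options the script has today.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical flags for selective regeneration."""
    parser = argparse.ArgumentParser(
        description="Regenerate mock data for tests (illustrative sketch)."
    )
    parser.add_argument(
        "--source",
        action="append",
        default=None,
        metavar="NAME",
        help="regenerate only this source; may be given more than once",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="list what would be regenerated without deleting anything",
    )
    return parser


def select_sources(all_sources, requested):
    """Return the sources to regenerate, keeping the order of all_sources."""
    if not requested:
        # No --source given: fall back to the current blanket behaviour.
        return list(all_sources)
    unknown = sorted(set(requested) - set(all_sources))
    if unknown:
        raise ValueError(f"unknown source(s): {', '.join(unknown)}")
    return [name for name in all_sources if name in requested]
```

Running the script with no flags would keep the existing behaviour, so this kind of change would not break anyone's current workflow.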

gavbarnett commented 2 years ago

Seems like there is still a newline character issue that needs resolved here too.🪲

It's got something to do with how git automatically switches line endings behind the scenes. But it's messing up the mock CSV files on Windows now.
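One way to take the operating system out of the picture when writing the mock CSVs is to pin the line terminator explicitly. This is a minimal sketch using the standard `csv` module. `write_csv_lf` is a hypothetical helper, and the project's scrapers may write their files differently.

```python
import csv


def write_csv_lf(path, header, rows):
    """Write a CSV file that uses "\n" line endings on every platform."""
    # newline="" stops Python translating "\n" to "\r\n" on Windows.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # The csv module defaults to "\r\n", so set the terminator explicitly.
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

This only controls what the script writes. Git may still convert line endings on checkout depending on its configuration, so a `.gitattributes` rule for the mock files (for example one using the `eol` attribute) may also be needed.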