CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License
8 stars 18 forks source link

Unit Tests for agg_clean_data? #131

Closed maxachis closed 3 years ago

maxachis commented 3 years ago

This is an issue where I want the input of @cgmoreno and @conorotompkins on this, since they helped put this script together. Also @hellonewman, since she's the co-captain who normally gives guidance on where we on the backend should direct our efforts.

The actual code for agg_clean_data.R is pretty straightforward:

image

As is, there's not a lot to make a unit test for, because the code itself doesn't have a lot of places it could go wrong. So unit testing might not even be required.

But, maybe we want this code to throw an error if it receives data in an odd format (i.e. missing key information), or if the data is not clean. It might be useful to catch such issues then before they cause larger issues down the line. In that case, we could add some functionality to check for proper formatting, and I could put together unit tests to make sure that works and catches obvious issues.

Now that could be a nice-to-have, or it could just add additional baggage that someone else might have to maintain. Even if some issues slip through, it might be fine if we keep it simple and don't add additional checking, at least not until an actual issue pops up which we could then make tests in response to. And if we do get data in an odd format without having checking code in here, some might break code further down the line in a way that's likely pretty obvious. If our address cleaner breaks because some addresses were listed as "Null" or "NA", well, that'd not be too difficult to identify.

This issue could be nitpicky, but since we're in the final laps, I want to be conscious of potential issues that could show up.

maxachis commented 3 years ago

I was recently in the airport, so I find myself thinking in TSA metaphors. The more things we have in our security checkpoint, the more likely we are to catch something. But it also creates more points where, if something breaks down, things can slow down a bit. So even though this is relatively minor, where we want to fall on that spectrum as a general rule is probably useful to define before I get too deep into my unit testing.

maxachis commented 3 years ago

Team decided not to add unit testing for this, because it's pretty straightforward, and to instead have data integrity checking be its own separate script in run.sh