CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License

Unit Testing: auto_text_process_name.R #134

Closed: maxachis closed this issue 3 years ago

maxachis commented 3 years ago

This issue tracks unit testing of auto_text_process_name.R. It will be closed once basic unit testing is complete, or if unit testing is determined not to be needed. Unit tests can always be added later if need be.

The purpose of auto_text_process_name.R, as described in the README, is: "Assigns types (like Chain Grocery Store, Farmer's Market, etc) to different addresses". Unit testing would likely involve making sure that types are assigned correctly and deciding on a policy for edge cases, if any exist.
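To make "ensuring that types are correctly assigned" concrete, here is a minimal sketch of an exact-name lookup and two tests for it. It is written in Python only for illustration, since the actual script is R and its internals aren't shown here. `CATEGORY_LOOKUP`, `assign_type`, and the sample entries are all hypothetical and are not taken from the repo.

```python
from typing import Dict, Optional

# Hypothetical name -> type table; the R script's real list is not reproduced here.
CATEGORY_LOOKUP: Dict[str, str] = {
    "ALDI": "Chain Grocery Store",
    "GIANT EAGLE": "Chain Grocery Store",
    "EXAMPLE FARMERS MARKET": "Farmer's Market",  # made-up sample entry
}


def assign_type(name: str) -> Optional[str]:
    """Return the type for an exact (case- and whitespace-insensitive) name match, else None."""
    return CATEGORY_LOOKUP.get(name.strip().upper())


def test_known_names_are_typed() -> None:
    # Every name already in the table should map back to its own type.
    for name, expected in CATEGORY_LOOKUP.items():
        assert assign_type(name) == expected


def test_unknown_name_is_untyped() -> None:
    # Names outside the table stay uncategorized rather than being guessed.
    assert assign_type("HAIR LAND INC") is None
```

Saved in a file named like `test_*.py`, pytest would collect both `test_` functions automatically. As the thread notes, the first test mostly re-checks the table against itself, which is exactly the redundancy concern.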

Possible edge cases include names that are ambiguous or that contain the names of two categories, and cases where punctuation throws off the matching. "Farmer's Market" and "Farmers Market" are obviously the same thing, but if the code doesn't account for punctuation, one of them could go unmatched.
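One way to pin down both edge cases is to normalize names before matching and to flag names that hit more than one category. The sketch below assumes a keyword-based approach and hypothetical helpers (`normalize_name`, `matching_categories`, `is_ambiguous`) plus a made-up keyword table. The real R script may work differently.

```python
import re
import unicodedata
from typing import Dict, List

# Hypothetical category -> keyword phrases (already in normalized form).
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "Farmer's Market": ["FARMERS MARKET"],
    "Grocery Store": ["GROCERY"],
}


def normalize_name(name: str) -> str:
    """Uppercase, drop apostrophes, turn other punctuation into spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", name).upper()
    text = re.sub(r"['\u2019]", "", text)   # FARMER'S / FARMER’S -> FARMERS
    text = re.sub(r"[^\w\s]", " ", text)     # remaining punctuation -> space
    return re.sub(r"\s+", " ", text).strip()


def matching_categories(name: str) -> List[str]:
    """Sorted categories whose keyword appears as whole words in the normalized name."""
    padded = f" {normalize_name(name)} "
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(f" {kw} " in padded for kw in keywords)
    )


def is_ambiguous(name: str) -> bool:
    """True when a name matches more than one category and needs a policy decision."""
    return len(matching_categories(name)) > 1
```

Padding with spaces keeps matches on whole words, so a keyword like "MARKET" would not fire inside "SUPERMARKET". Whether an ambiguous name should get the first match, no type, or a manual review is the policy question raised above.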

The code itself is not terribly complex, however, and designing unit tests could be more trouble than it's worth. Because the code essentially checks whether names appear in a pre-existing list of categories, testing might simply amount to checking that the entries already categorized in the code are categorized properly, which would be a little redundant.

A more prominent issue is that many entries have no category, even though several seem like they could easily be assigned one. A location named "CARNEGIE FARMERS MARKET" is not categorized at all, even though it seems clearly to be a farmers market.
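A possible fallback for entries like this is to suggest a type from keywords only when the row has none, and to collect whatever still lacks a type for manual review. This is a hedged sketch. `KEYWORD_PATTERNS`, `suggest_type`, `fill_missing_types`, and the row shape (`name`/`type` keys) are all assumptions, not part of the existing script.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical patterns; the chain names are illustrative, not the script's real list.
KEYWORD_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "Farmer's Market": re.compile(r"\bFARMER['\u2019]?S\s+MARKET\b"),
    "Chain Grocery Store": re.compile(r"\b(ALDI|GIANT EAGLE)\b"),
}

Row = Dict[str, Optional[str]]


def suggest_type(name: str) -> Optional[str]:
    """Return a type only if exactly one pattern matches; otherwise None."""
    hits = [t for t, pattern in KEYWORD_PATTERNS.items() if pattern.search(name.upper())]
    return hits[0] if len(hits) == 1 else None


def fill_missing_types(rows: List[Row]) -> Tuple[List[Row], List[str]]:
    """Fill empty types from keywords without overwriting existing ones.

    Returns the updated copies of the rows and the names that still need manual review.
    """
    updated: List[Row] = []
    needs_review: List[str] = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        if not new_row.get("type"):
            new_row["type"] = suggest_type(new_row["name"] or "")
            if new_row["type"] is None:
                needs_review.append(new_row["name"] or "")
        updated.append(new_row)
    return updated, needs_review
```

Under these assumptions, "CARNEGIE FARMERS MARKET" would pick up "Farmer's Market", while names like "LIZZIES DAIRY ON WHITAKER" would land in the review list instead of being guessed at.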

There are others where it makes sense not to assign a category, because it's not clear what they are and they appear to be local organizations that would be a pain to investigate individually. "LIZZIES DAIRY ON WHITAKER", at 106 Whitaker Street, probably does belong on our list, but we'd likely have to call to find out what exactly it is. And "HAIR LAND INC", at 723 Penn Avenue, which somehow ended up in the merged dataset, mainly just raises more questions.

Again, it's unclear whether unit testing or simply some manual exploratory testing of the results would be more appropriate.

maxachis commented 3 years ago

Decided that unit tests for these are required, so I'm closing this issue!