Story discovery engine for the Counterdata Network. Grabs relevant stories from various APIs, runs them against bespoke classifier models, and posts results to a central server.
Debugging full pipeline runs is too difficult and error-prone right now. The pathway we need to exercise in a testing environment goes from fetcher -> queue -> classification worker -> above-threshold results. A few thoughts about how we could do this:
A test outline I'm imagining would look something like the following (sketched in code after the list):

1. create an empty SQLite database and use that as `DATABASE_URL`
2. create an empty queue and use that as `BROKER_URL`
3. call the appropriate `queue_servicename_stories.py` main method (with stubbed models, project file, and search-results method)
4. verify the expected number of story entries are created in the DB and that they look right
5. verify the expected number of entries are in the queue
6. start a queue worker pointing at no-op (or throwaway) models
7. verify the story entries in the DB are updated to indicate which stories passed the models
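
Here is a minimal pytest sketch of steps 1-4, assuming pytest-mock and SQLAlchemy are available. The module paths (`processor.projects`, `queue_servicename_stories`), the `fetch_results` helper, and the `stories` table name are placeholders, not the real package layout:

```python
# A minimal pytest sketch of steps 1-4 above. The module paths
# (processor.projects, queue_servicename_stories), the fetch_results helper,
# and the "stories" table name are placeholders, not the real package layout.
import pytest
import sqlalchemy

import queue_servicename_stories  # hypothetical fetcher module under test

FAKE_PROJECT = {"id": 1, "language_model_id": 1, "min_confidence": 0.75}
FAKE_STORIES = [
    {"url": "https://example.com/a", "title": "Story A"},
    {"url": "https://example.com/b", "title": "Story B"},
]


@pytest.fixture
def pipeline_env(tmp_path, monkeypatch):
    """Point the pipeline at a throwaway SQLite DB and an in-memory broker."""
    db_url = f"sqlite:///{tmp_path / 'test.db'}"
    monkeypatch.setenv("DATABASE_URL", db_url)
    monkeypatch.setenv("BROKER_URL", "memory://")  # Celery's in-memory transport
    return db_url


def test_fetch_creates_stories(pipeline_env, mocker):
    # stub out the project list and the service's search-results call
    mocker.patch("processor.projects.load_project_list", return_value=[FAKE_PROJECT])
    mocker.patch("queue_servicename_stories.fetch_results", return_value=FAKE_STORIES)

    queue_servicename_stories.main()

    # verify the expected number of story rows were written
    engine = sqlalchemy.create_engine(pipeline_env)
    with engine.connect() as conn:
        count = conn.execute(sqlalchemy.text("SELECT COUNT(*) FROM stories")).scalar()
    assert count == len(FAKE_STORIES)
```

For steps 6-7, Celery's `task_always_eager` setting runs queued tasks inline in the same process, so the test wouldn't need to start a separate worker; the no-op models could simply return scores above the project threshold.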
This could be used to test the fetching logic in each `queue_servicename_stories.py`, cover cases like empty story lists, and give us confidence that we aren't breaking the integrated flow of the pipeline. How can we do this?
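
As one concrete example of those edge cases, here is a sketch of the empty-story-list test. It extends the same test module and placeholder names as the sketch above (so it is not self-contained on its own), and a real version could be parametrized over each of the `queue_servicename_stories.py` fetchers:

```python
# Edge case: an empty search result should write no story rows and queue nothing.
# Reuses the pipeline_env fixture and placeholder names from the sketch above.
def test_empty_story_list(pipeline_env, mocker):
    mocker.patch("processor.projects.load_project_list", return_value=[FAKE_PROJECT])
    mocker.patch("queue_servicename_stories.fetch_results", return_value=[])

    queue_servicename_stories.main()

    engine = sqlalchemy.create_engine(pipeline_env)
    with engine.connect() as conn:
        row_count = conn.execute(sqlalchemy.text("SELECT COUNT(*) FROM stories")).scalar()
    assert row_count == 0
```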