analyst-collective / analytics

Open source data models and analysis.
Apache License 2.0
46 stars 11 forks source link

Trello model and tests #7

Closed cmerrick closed 8 years ago

cmerrick commented 8 years ago

Introduces a trello model for card_location and tests for that model.

I introduced a test view with the structure (name, description, result) where result is a boolean value saying whether or not the test passed. We should standardize the structure for test views/tables, so feedback on this is appreciated.

I also modified the runner to consume any .sql file in the model directory.

jthandy commented 8 years ago

I'm really into the core idea that tests are just SQL that returns a test name, condition description, and boolean. Like...a lot. I have some concerns about the specific implementation details. Here's my thinking.

  1. I assume that the testing process for a schema should be to run all tests periodically to make sure that everything is passing. I don't really want tests localized to just trello...I just want to systematically run tests once a day (or something) and get notified if something fails. This to me means that rather than "trello_tests", it should just be "tests".
  2. But executing all tests within a single view obviously doesn't scale well at all. Also, I don't think we want a ton of non-analytical views cluttering up the schema...I think that this will be confusing for analysts.
  3. We need some way of registering an arbitrary number of tests, all of which execute in SQL and return the same set of fields, so that code can operate on the results in some standardized way. My proposed solution to that would be to have a tests directory, with SQL files for tests. Each file could contain one to many SELECT statements. Each test would be run via a runner, and that runner could just output pass/fail to the terminal, or could later be written to notify on certain conditions. This could even support levels (warn/error/notify).

What do you think? I believe that this is just a natural/logical extension of the concept you just executed on that would allow it to become a more complete version of itself.

jthandy commented 8 years ago

Also: I was able to successfully run and validate all of the output of the current version of the runner. So looks good on my end there. My only comments on this PR are about the structure of a test.

drewbanin commented 8 years ago

love the test implementation but agree that we should be mindful of how many views we're cramming into one schema. Not particularly miffed about it yet though... approved on my end

cmerrick commented 8 years ago

re: tests -

The advantage of the VIEW-based solution in this branch is that it can be run from anywhere - a runner that works as you described, a sql command line, or from a user's BI or viz tool. In my mind this is a key feature - a compelling use case for tests is to package the results alongside the delivery of analysis, and this makes that simple to do from any tool.

Since tests might be expensive to run, I disagree with your statement about always running all of them. That's one use case, but it's not the only one. As above, I would want to package test results with an analysis, and in that scenario it would be nice to only run the relevant tests. If they're divided, we could always create a single view that unions all of them together.

We might find that we want to structure the test files as just SELECTs and have the runner UNION and wrap them into a VIEW like this. In fact, that makes a lot of sense - what do you think?

jthandy commented 8 years ago

Oh. I actually hadn't considered that we could just have it both ways.

I get where you're coming from in terms of materializing it. I continue to be worried that we are putting to much random stuff into a single schema and that analysts will find it challenging, but I'm confident that the solution to that will be fairly simple if and when we run into that problem.

I'm very much on board with your final paragraph: I do think that the best way to create these is going to be separate from the analysis so that they can be manipulated in ways specific to testing. If we incorporate a part of the runner that pulls them together into a single view, that makes a ton of sense to me. Would you be open to doing that as a part of this PR or do you want to leave for later?

cmerrick commented 8 years ago

Would you be open to doing that as a part of this PR or do you want to leave for later?

Let's merge this as-is, but I can build that into the runner in short order

jthandy commented 8 years ago

Ok.

:+1:

cmerrick commented 8 years ago

Opening new PR for merge into master