evidence-dev / evidence

Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
https://evidence.dev
MIT License

Automatic testing #82

Closed yhoiseth closed 3 months ago

yhoiseth commented 3 years ago

I’m curious about whether you are planning to add automatic testing capabilities to Evidence. That seems like one of the biggest wins of using code and source control.

For example, if I could do something like load an example database and check that my queries return the expected results, that would make me much more confident in my queries.

mcrascal commented 3 years ago

Yngve, yes automated testing is a big priority for us.

Could you expand on what you would be testing in the case where you are loading an example database?

There are two tiers of testing that we're thinking about right now.

1. Do the queries run, and does everything work? The idea here would be to never serve an error message to your end users that was the result of a bad SQL query or otherwise.

When you go to generate your site to serve it in a production environment, we can block it from deploying if it's throwing errors. Then, instead of serving a broken report, we'd just serve the last successful build of the report plus a 'data is delayed' warning.

We'll get a lot of this 'for free' just by deploying an Evidence project on Vercel or Netlify.

2. Is the source data still reliable? You may still be able to successfully execute a query against your DB, and make a chart, but if there is a problem in your ETL process, that query might return erroneous data. In our experience, this is a much more common failure case in reporting applications, and it is much more damaging to user trust.

Right now, we don't have plans to build our own testing suite for this type of thing. Instead, we're going to build tooling to plug into other data quality & testing tools. dbt 'exposures' is our first priority here.
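
To illustrate the kind of check those tools run: many of them boil down to plain SQL queries where any returned row counts as a failure. A rough sketch (the table and column names here are made up):

-- Hypothetical data-quality checks; any returned row means a failure.
-- 1. The ETL should have loaded data within the last day.
select 'stale load' as problem
from orders
having max(loaded_at) < current_date - 1

union all

-- 2. Every order should have an id.
select 'missing order_id' as problem
from orders
where order_id is null
having count(*) > 0;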

Again, the end result will be that updates to your reports are blocked while you have failing tests upstream.

Stale data > wrong data.

yhoiseth commented 3 years ago

Great points.

I would add a third question which I don’t think is covered by your tiers: Do the queries return the expected results?

The rationale is that there is usually some sort of business logic in queries, and I would sleep better at night knowing that this logic is tested. I don’t want to be in a situation where the system “somehow creates some charts and who knows if they are correct”.

For example, let’s say that I have the following users table.

| id | email_confirmed_at         |
|----|----------------------------|
| 1  | NULL                       |
| 2  | 2021-08-06 10:56:44.244319 |

If I have a query called number_of_users_with_confirmed_email, then I would like to assert that the query returns 1 for this example table.
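
Sketched in plain SQL (not a proposal for Evidence syntax, just to illustrate the idea), the fixture and the assertion could be a single query that returns a row only when the check fails:

-- Hardcoded fixture mirroring the example table above.
with users as (
    select 1 as id, cast(null as timestamp) as email_confirmed_at
    union all
    select 2 as id, timestamp '2021-08-06 10:56:44.244319' as email_confirmed_at
),
-- The query under test.
number_of_users_with_confirmed_email as (
    select count(*) as n
    from users
    where email_confirmed_at is not null
)
-- Assertion: any returned row means the test failed.
select *
from number_of_users_with_confirmed_email
where n <> 1;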

mcrascal commented 3 years ago

@yhoiseth I am struggling a bit with this example.

To stay with your table, if tomorrow user id 1 confirms their email, wouldn't you want your query's result to change to 2? If so, wouldn't your test fail?

Put another way, if you know the result of the query in advance, what is the purpose of writing the query at all?

That said, in tools like dbt you can do assertions. Usually people do these things to assert that a column is unique, or not null, but they can also be used to ensure that some historical known fact hasn't changed. E.g. 2019 revenue should always sum to $30M, and if it doesn't the test should fail.
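
To make that concrete, such an assertion in dbt would typically be a 'singular test': a SQL file whose query must return zero rows, with any returned row treated as a failure. A rough sketch of the 2019-revenue example (the model and column names are made up):

-- tests/assert_2019_revenue_unchanged.sql
-- Fails if historical 2019 revenue no longer sums to the known $30M.
select sum(amount) as revenue_2019
from {{ ref('orders') }}
where extract(year from order_date) = 2019
having sum(amount) <> 30000000;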

Is there a data testing tool or library that you are thinking about that I should look into?

yhoiseth commented 3 years ago

> I am struggling a bit with this example.

No problem :)

> To stay with your table, if tomorrow user id 1 confirms their email, wouldn't you want your query's result to change to 2? If so, wouldn't your test fail?

The table would be a fixture. In other words, it would be hardcoded and would not change when production data changes.

Let me try to illustrate with an example. If I did something similar in Django, number_of_users_with_confirmed_email might be a function:

from myapp.models import User

def number_of_users_with_confirmed_email() -> int:
    return User.objects.filter(email_confirmed_at__isnull=False).count()

A test for this function might look something like this:

from django.test import TestCase
from myapp.models import User
# Assuming the function above lives in a module like myapp.queries:
from myapp.queries import number_of_users_with_confirmed_email

class UserTestCase(TestCase):
    def setUp(self):
        User.objects.create(email_confirmed_at=None)
        User.objects.create(email_confirmed_at="2021-08-06 10:56:44.244319")

    def test_confirmed_email_count(self):
        self.assertEqual(number_of_users_with_confirmed_email(), 1)

> Put another way, if you know the result of the query in advance, what is the purpose of writing the query at all?

I know the result of the query on a small, simple dataset, as I can manually check it. But I don’t know the result of the query on my production dataset, as that is too big to manually check. That is why I need the query.

> That said, in tools like dbt you can do assertions. Usually people do these things to assert that a column is unique, or not null, but they can also be used to ensure that some historical known fact hasn't changed. E.g. 2019 revenue should always sum to $30M, and if it doesn't the test should fail.

That’s good (I don’t know dbt), but it doesn’t protect my team from making mistakes in the queries we are writing in Evidence.

> Is there a data testing tool or library that you are thinking about that I should look into?

My thinking is that when I am using Evidence, I am making an application. If I want to do it properly, I should test it like I would any other application. That way, development is more enjoyable and I can be reasonably sure that things work, even if many people who don’t know the codebase are working on it at the same time, we don’t have QA people, etc.

I’m not a testing expert, so I don’t have any strong opinions on what would be a good inspiration for testing in Evidence. I can provide some pointers, though:

First, most web frameworks have something similar to Django’s testing tools.

Second, it is possible to write tests in SQL, so this might be a good approach for Evidence. See pgTAP; a rough sketch follows after these pointers. (I have never written tests in SQL myself.)

Third, a slick BDD implementation might be the best approach. I could, for example, write the above test in the following way.

Feature: Number of users with confirmed email

Scenario: One has confirmed, another has not
  Given the following users exist:
    | id  | email_confirmed_at         |
    | 1   | NULL                       |
    | 2   | 2021-08-06 10:56:44.244319 |
  When I check the number of users with confirmed email
  Then the result is 1

See, e.g., https://cucumber.io.
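
For the pgTAP pointer above, a rough sketch of the confirmed-email test might look like this (I have not run it; it follows the pgTAP documentation, using plan/is/finish):

-- Run inside PostgreSQL with the pgTAP extension installed.
begin;
select plan(1);

-- Hardcoded fixture, as in the Django example.
create temporary table users (id int, email_confirmed_at timestamp);
insert into users values
    (1, null),
    (2, '2021-08-06 10:56:44.244319');

-- Assert that the query under test returns 1 for this fixture.
select is(
    (select count(*)::int from users where email_confirmed_at is not null),
    1,
    'number_of_users_with_confirmed_email returns 1'
);

select * from finish();
rollback;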

mcrascal commented 3 years ago

@yhoiseth This is fantastic; thank you for taking the time to explain it to me.

> The table would be a fixture. In other words, it would be hardcoded and would not change when production data changes.

That is the piece I was missing. Of course, this makes total sense.

I suspect the right answer here is going to be building thoughtful support for a variety of existing test frameworks. The interesting challenge here is the mix of:

I don't have a clear answer on this yet, but this definitely needs to form the third (or third and fourth) track of our thinking around testing.

archiewood commented 1 year ago

Best current strategy:

npm run build:strict

https://docs.evidence.dev/cli/#commands

More to come here!