cqframework / cql-language-server

A CQL language server compatible with the Language Server Protocol
Apache License 2.0

CQL Unit Testing #40

Open JPercival opened 2 years ago

JPercival commented 2 years ago

EDIT: Updated based on discussions below

The language server should support CQL unit tests. The first step is to define what a unit test looks like in CQL. We propose adding support for a set of @tags that specify unit tests and their input requirements in CQL:

| Tag | Value | Description |
| --- | --- | --- |
| @test | N/A | If specified on a Library, marks it as a test suite. If specified on a definition, marks it as a test. |
| @parameter | Name CQLValue | Set input parameters for Libraries. |
| @asof | CQLDatetime | Evaluate as of a specific date. |
| @context | Context=Value | The value of the context. |
| @data | Path | Source directory for test data. |
| @terminology | Path | Source directory for test terminology. |
| @mock | Definition=Value | (future work) Specifies a mock value for a CQL definition. |
| @parameterized | <tag> | (future work) Specifies that a test should be repeated with a different set of inputs. |
| @dataprovider | ExpressionReference | (future work) Supplies a set of data used to run tests. |
| @ignore | N/A | (future work) Report the results of this test, but don't fail the overall suite if this test fails. |

An example of a test Library using these tags:

// @test
// @parameter: "Measurement Interval" [@2019,@2020]
// @asof: @2020-10-01
// @context: Patient=654
// @terminology: tests/vocab
library DQMTest

include DQM as targetLibrary
include org.opencds.cqf.cql.Asserts version '1.0.0'
include TestHelpers

// @test
// @data: tests/data
// @context: Patient=123
define "In Initial Population":
  assert(targetLibrary."Initial Population").isTrue()

// @test
// @data: tests/data
// @context: Patient=123
define "Has Required Attributes":
   TestHelpers.HasRequiredAttributes(Patient)

If evaluating a definition marked with @test produces a Message with a severity level of Error, the test fails.

http://cql.hl7.org/09-b-cqlreference.html#message

@test is required in order to use the other tags. If any are present without @test, it's an error condition. In other words, these tags are only allowed on unit test libraries.

Any @tag defined at the Library level sets the default @tag value for the set of tests within the Library. @tag values on a definition override the default Library @tag.

Each @test definition is evaluated independently and with the appropriate input parameters as defined by the merged definition and library @tag values. It's up to the test runtime environment to optimize data or context caching to speed up test execution.

If during the evaluation of a test, the CQL definition references any other CQL definitions that are also marked as @test (or with any other of the proposed tags), those @tags are ignored. Only the @tags of the entry point apply. That is to say, during the evaluation of a test the context, input parameters, terminology, or data may not change mid-evaluation.
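
For illustration, here is how the merged tag values would resolve for the "In Initial Population" test in the example library above (a sketch of the intended merge semantics, not normative):

// Effective @tags for "In Initial Population" after merging:
// @parameter: "Measurement Interval" [@2019,@2020]   -- inherited from the Library
// @asof: @2020-10-01                                 -- inherited from the Library
// @terminology: tests/vocab                          -- inherited from the Library
// @context: Patient=123                              -- the definition's value overrides the Library default (Patient=654)
// @data: tests/data                                  -- specified only on the definition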

CQL Test libraries SHOULD NOT be shipped as part of an executable package, such as a FHIR npm package.

The example org.opencds.cqf.cql.Asserts library does not exist at the time of this writing. The contents would be a set of helper functions to assist in specifying expected results for tests.
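
As a rough sketch only (the library name, function names, and signatures below are illustrative assumptions, not an existing API), such helpers could be built on the standard Message operator with an 'Error' severity:

library Asserts version '1.0.0'

// Hypothetical helper: produces an Error Message (halting evaluation) when the condition does not hold
define function IsTrue(condition Boolean):
  Message(condition, condition is not true, 'AssertionFailed', 'Error', 'Expected expression to be true')

// Hypothetical helper: fails unless actual is greater than expected
define function GreaterThan(actual Integer, expected Integer):
  Message(actual, not (actual > expected), 'AssertionFailed', 'Error',
    'Expected a value greater than ' + ToString(expected) + ' but got ' + ToString(actual))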

JPercival commented 2 years ago

Relevant API for VS Code: https://code.visualstudio.com/api/extension-guides/testing

JPercival commented 2 years ago

Had some discussions with Bryn. We think that these test tags should be disallowed outside the context of a test Library to avoid confusion with runtime behavior of CQL code.

JPercival commented 2 years ago

Test functions should throw an Exception if the test is a failure. The way to do that currently is to use the "Message" operator with an "Error" status in CQL: http://cql.hl7.org/09-b-cqlreference.html#message

JPercival commented 2 years ago

Proposed tags:

- @test = marks a Library as a group of tests; marks a specific definition as a test within the Library
- @parameter Name CQLValue = set input parameters for Libraries
- @asof CQLDatetime = evaluate as of a specific date
- @context Context Value = the value of the context
- @source Path = source directory for test data

@test is required to use the other tags. If any are present without @test, it's an error condition.

JPercival commented 2 years ago

Open questions:

- Multiple contexts supported?
- Use of tags outside tests? (we think no, since authors could come to depend on it for runtime behavior of, say, a Measure)
- Use of a remote server for data?
- Terminology tag needed for a terminology source?
- Shared data across tests? (workaround: load everything and set context)
- Should this convention map to the XML test definitions for FHIRPath / CQL Engine compliance? (we think no, since those include compiler compliance as well, which is a different requirement than unit testing)
- Should the @source tag be inferred for an IG?

c-schuler commented 2 years ago

Very cool!

I would advocate for being able to use a remote server for data and/or terminology. I think that would promote shareability and a terminology source allows for more robust testing.

JPercival commented 2 years ago

Note that this specifically excludes testing of Knowledge Artifacts that may leverage CQL, such as FHIR Measures and PlanDefinitions. Measures and PlanDefinitions may not use CQL at all, or may be defined in terms of a different language altogether, such as FHIRPath. In other words, full authoring support for those types of artifacts is beyond the scope of this current discussion, and debugging/testing them needs to occur at a higher, FHIR-based layer within the context of a CQL/FHIR authoring environment.

JPercival commented 2 years ago

I would advocate for being able to use a remote server for data and/or terminology. I think that would promote shareability and a terminology source allows for more robust testing.

The main issue/consideration there is that "unit tests" in other languages are generally expected to run quickly and not require any external data or input. In other words, you wouldn't expect a unit test to fail because a remote server was down.

Bryn and I also discussed some alternate test conventions. We did a quick survey of various tools and languages out there, such as C#, Java, Python, Haskell, F#, and SQL, and found that all of them had a widely available xUnit-inspired framework. So that seems to be the most common and familiar paradigm for a wide variety of developers. In principle, as a functional language, CQL could adopt some other convention such as "all test definitions return true." We think that the "assert expected result" paradigm is the least surprising, and it allows authors to inspect test output for the sake of a quick "edit, run, debug" development loop.

vitorpamplona commented 2 years ago

It would be nice if the tool took a snapshot of useful data within a database/FHIR server and saved that as input to the test. In my experience, CQL changes are always a reaction to new data in a server rather than being specification-driven. Most CQL constructs that I have seen never treat edge cases. They are only created to match the specific way an institution is utilizing FHIR.

cmoesel commented 2 years ago

Writing unit tests for CQL in CQL... neat! We developed a test framework for CQL here, but it's quite different in its approach. I like the idea of adding an xUnit-flavored approach in CQL. Very cool.

If it is built into the language server, what does that mean for being able to run automated tests from the command line and/or in a headless CI environment? Ideally, I'd like a command I can put on my path so I could just run cql-test from a CQL project folder and have it just work (perhaps with a little required configuration). Does integrating it into the language server still allow for that use case?

Have you considered building it on top of an existing framework so you can take advantage of runners, reporters, etc? That might get you some extra bang for your buck -- but perhaps it also limits some of the features you can provide. Just throwing out some ideas...

vitorpamplona commented 2 years ago

I would make a distinction between making a tool for development, debugging, and troubleshooting new CQLs (developer experience) and making a tool for automated, interoperability testing, and production auditing (DevOps experience). Both can use the same underlying principles and languages but are vastly different in their use.

cmoesel commented 2 years ago

I'm on a team that has developed CQL-based CDS in the past, and used a git repository for collaboration. In that case, having tests automatically run for PRs is quite helpful in ensuring that code meets a certain standard of quality before being merged to main. That's what I was advocating for (not full-on interoperability testing).

JPercival commented 2 years ago

@cmoesel

We developed a test framework for CQL here,

The approach you've created there is aligned with what I had in mind for higher-level PD/Measure testing. My thinking was that for CQL it'd be nice to try to keep the developer in the CQL as much as possible.

If it is built into the language server, what does that mean for being able to run automated tests from the command line

This is a good point. The cql-evaluator can already run CQL from the command line and accepts the equivalent of these tags as arguments. The new piece is test case discovery from the tags. My thinking had been to build the discovery bits into the language-server as a "trial use" thing and go from there, but putting the discovery bits in the evaluator would allow command line execution too.

Have you considered building it on top of an existing framework

The de facto build tool for most CQL projects is the IG Publisher. It's not particularly extensible, so it's a challenge to integrate with it cleanly. It does already have a cql-translator integration, so maybe it's not too difficult to extend that a bit. Reporting in that scenario would look like the typical QA errors.

I'll take a look at some other options too and see what I come up with. Thanks for the suggestion.

JPercival commented 2 years ago

@vitorpamplona

It would be nice if the tool took a snapshot of useful data within a database/FHIR Server

This would be a useful feature. I'm not sure how we'd implement it, though. The trick is that if you have a test server with 1000s of Patients, how do you select only the data relevant for a particular test case? That may span ValueSets, non-Patient-compartment data like Medications, etc. Maybe we could start with the $everything operation. Or generate data-requirements for some CQL and run the queries. Another option would be some type of "picker" UI.

vitorpamplona commented 2 years ago

Maybe using CQL to select subsets of data and exporting it in a test-ready file is another tool?

vitorpamplona commented 2 years ago

It feels like a tool to compare an older snapshot with a current snapshot of the required subset could be a very important troubleshooting aid. When a CQL library in production goes wrong, the first step is to check whether your test data is still relevant.

seanmcilvenna commented 2 years ago

Where is the input XML/JSON specified for the unit test? Assuming that you can specify what data is being unit tested somewhere...???

vitorpamplona commented 2 years ago

Where is the input XML/JSON specified for the unit test? Assuming that you can specify what data is being unit tested somewhere...???

// @test
// @source tests/data       <-- here
// @context Patient=123
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)
vitorpamplona commented 2 years ago

The more I think about this idea of setting up the test execution environment inside CQL, the more I dislike it.

It feels like we are finding ways to work around the fact that CQL is just a query language. To account for practical settings, these @tags can get very complicated and yet still not be flexible enough for practical use. After all, you only need one tricky test case to make devs go back to other languages.

JPercival commented 2 years ago

@seanmcilvenna - The data source is specified with the @source tag. Should we call that @data instead? Maybe we need one for terminology as well.

seanmcilvenna commented 2 years ago

Ahh! I understand, now. That satisfies my concern. Thanks!

JPercival commented 2 years ago

@vitorpamplona

After all, you only need one tricky test case to make devs go back to other languages.

The @tags are already part of the CQL spec and are used for things like fluent functions. So the tags themselves are not new.

I think the fact that CQL is only a query language actually makes it a much more natural fit for assertion-based testing. There's no mutable state, there's no asynchronicity, there's no threading, it's defined to be idempotent (given @asof), etc. Given that test data is locally available, and that all you can do is query the data and verify the results of the query are as expected, what's the complicated use case you have in mind?

vitorpamplona commented 2 years ago

I hear the benefits of the language and I agree. My issue is with the complexity of the tags and how easy it is to read what's happening.

For instance, let's say you have a suite with 3000 tests. For each, you build slightly different versions of your data to account for everything that can happen in production.

// @test
// @data tests/data1
// @terminology tests/terminology1
// @context Patient=123
// @parameter "Measurement Interval" [@2019,@2020]
// @asof @2020-10-01
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)
  assert...
  assert...
  assert...
  assert...

// @test
// @data tests/data2
// @terminology tests/terminology1
// @context Patient=123
// @parameter "Measurement Interval" [@2019,@2020]
// @asof @2020-10-02
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)
  assert...
  assert...
  assert...
  assert...

// @test
// @data tests/data3
// @terminology tests/terminology1
// @context Patient=123
// @parameter "Measurement Interval" [@2019,@2020]
// @asof @2020-10-01
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)  
  assert...
  assert...
  assert...
  assert...

Integration tests like these are testing not only the CQL but also the data and the environment. But those things would be in different files, etc... Keeping things organized would be hard.

vitorpamplona commented 2 years ago

With a suite of 3000 tests in mind, you quickly get into performance questions. Is it possible to load things just once and run everything as one huge test method? Can we reuse only the terminology loads? Can we load multiple files in the @data tag to avoid duplicating test information? Do we have caching instructions on the tags? Any other test lifecycle possibilities? Will it offer Test Suites for similar execution environments? Can we test the efficiency (time to run) of CQL instructions?

How do we specify these @Tags to support most of what we see in other testing frameworks out there?

vitorpamplona commented 2 years ago

Here's how I would write a realistic test with multiple data inputs in Ruby. Keep in mind that devs have a fully-featured language at every step along the way to specify how to build, how to destroy, how many times to run, how to modify from one run to another, etc. Those "hows" would need to be mapped to tags and fixed files in the CQL version.

# app/models/article.rb
class Article < ApplicationRecord
  enum status: [:unpublished, :published]

  def self.published_in_the_past
    # we expect this method to fail first
    where(nil)
  end
end
# spec/factories/articles.rb
FactoryBot.define do
  factory :article do
    status :unpublished

    trait :published do
      status :published
    end

    trait :in_the_past do
      published_at { 2.days.ago }
    end

    trait :in_the_future do
      published_at { 2.days.from_now }
    end
  end
end
# spec/models/articles_spec.rb
require 'rails_helper'

RSpec.describe Article do
  describe ".published_in_the_past" do
    let!(:unpublished_article)     { create :article }
    let!(:published_in_the_past)   { create :article, :published, :in_the_past }
    let!(:published_in_the_future) { create :article, :published, :in_the_future }

    it { expect(Article.published_in_the_past).to include published_in_the_past }
    it { expect(Article.published_in_the_past).not_to include unpublished_article }
    it { expect(Article.published_in_the_past).not_to include published_in_the_future }
  end
end
JPercival commented 2 years ago

With a suite of 3000 tests in mind,

So, integration tests are a separate concern in my mind and this proposal is not intended to cover that. I don't think it's the case that we're trying to test CQL and data and the environment. The environment is fixed in this proposal and the data is hand-crafted by the author of the CQL. You wouldn't include 1000s of patients in a source-code repository for any language.

// @test
// @data tests/data1
// @terminology tests/terminology1
// @context Patient=123
// @parameter "Measurement Interval" [@2019,@2020]
// @asof @2020-10-01
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)
  assert...
  assert...
  assert...
  assert...

// @test
// @data tests/data2
// @terminology tests/terminology1
// @context Patient=123
// @parameter "Measurement Interval" [@2019,@2020]
// @asof @2020-10-02
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)
  assert...
  assert...
  assert...
  assert...

The intent is to allow a given @tag to be defaulted at the Library level. Secondarily, you couldn't have multiple asserts the way you've written it there; CQL isn't imperative in that fashion. You could "and" a bunch of those together. A more readable version of your example would actually be:

// @test
// @terminology tests/terminology1
// @context Patient=123
// @parameter "Measurement Interval" [@2019,@2020]
library Test

// @test
// @data tests/data1
// @asof @2020-10-01
define "TestAtLeast2ObsOctober1":
  assert.GreaterThan(Count(DQM.PatientObs), 2)

// @test
// @data tests/data2
// @asof @2020-10-02
define "AtLeast2ObsOctober2":
  assert.GreaterThan(Count(DQM.PatientObs), 2)

Still not the prettiest, but not all that bad compared to Java annotations.
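
For completeness, "and"-ing several checks into a single test would look something like this (a sketch using the hypothetical assert helper and the DQM definitions from the examples above):

// @test
// @data tests/data1
// @asof @2020-10-01
define "TestAtLeast2ObsAndInInitialPopulation":
  // Multiple conditions combined with 'and' into one assertion
  assert.IsTrue(
    Count(DQM.PatientObs) > 2
      and DQM."Initial Population"
  )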

Can we load multiple files in the @data tag to avoid duplicating test information?

Yes, and that's how IGs are frequently structured.

Do we have caching instructions on the tags?

This doesn't strike me as necessary for the small amount of data a unit test entails. That said, it would be possible to implement pretty easily I think.

How do we specify these @tags to support most of what we see in other testing frameworks out there?

Also not necessary, IMO, given the limitations of CQL. For example, test order doesn't matter.

vitorpamplona commented 2 years ago

Secondarily, you couldn't have multiple asserts the way you've written it there.

oh, wow. Yes, I wouldn't want to copy-paste all the tags and make the framework reload all the data just for 1 assertion.

JPercival commented 2 years ago

make the framework reload all the data just for 1 assertion.

Allowing the tags to be defaulted at the Library level gives the author some level of tunability. They could create a data set that's reused for a single library. And the test framework could be smart enough to notice duplicate data tags across expressions, libraries, or the entire test set and keep that data cached. Given that there's no chance of the data changing, caching it for the entire test run is a safe thing to do.

JPercival commented 2 years ago

Keep in mind that devs have a fully-featured language at every step along the way to specify how to build,

It's also important to recognize the target audience of CQL, which is clinicians and knowledge authors who may have zero experience with any programming language. The target users of CQL wouldn't know how to spin up a Ruby script.

FHIR operations and cds-hooks are a whole other level of complexity and authoring concerns. You can specify entire clinical workflows (not just expressions) with a PlanDefinition, so authoring and testing that is much more complicated. Do you use a visual workflow builder GUI to write a PlanDefinition? That's also, generally, where integration tests would occur. Given this population of 10,000 patients, how many passed a Measure? PlanDefinitions and Measures may also be specified without CQL at all, using some other expression language, so I'm trying to separate those concerns (which are valid and need solutions in that space) from this one, which in my mind is quite a bit narrower.

The "Arrange, Act, Assert" is a pattern that's commonly used with xUnit test frameworks. "Act" in CQL is evaluating an expression, "Assert" (according to this proposal) is sending an "Error" message if some condition is not met, and as a read-only lanauge CQL provides no functionality to "Arrange." So the "Arrange" has to be handled somewhat out of band. You can either jump out to some other language or construct, or you can add some meta-data to the CQL definition that allows a test framework to do that for you.

If you jump out to something else, what do you use? JavaScript, because the FHIR IGs are packaged as npm packages? Ruby, because the IG Publisher requires it to be installed to run? Java, because the CQL engine is written in Java? There's no obvious answer, IMO. Anything I can think of is actually much more complex than the tags I've proposed here. So relative to other possible solutions, the tags seem simple.

The closest thing to a standard build tool in this space is the IG Publisher. Would it be preferable to specify a test suite as an Ant task the Publisher can use? For integration tests, I think that's actually a reasonable solution. For a small set of unit test data, it seems preferable to keep it all in CQL as much as possible.

vitorpamplona commented 2 years ago

I agree. The problem is the idea's dependency on the "Arrange" tooling. If authors need other languages to arrange things together for the test, then: 1) they are not just clinicians anymore; it's more likely a developer's job. 2) Once you go there, why not just stay there and use that language's testing frameworks? Why would you use a more restrictive language if you know and use something else?

That's why I suggested solving the "Arrange" step with a CQL tool over FHIR servers. In that way, you don't need to know JSON or any other language. You can just generate 100s of similar unmaintainable files with small variants based on a single CQL code. Then just generate 100s of test CQL functions, one for each file, with the same or similar assertion condition. It feels dumb but doesn't touch any other language.

JPercival commented 2 years ago

Hmm... If I'm understanding you correctly you're suggesting code-genning test cases based on the CQL that's written?

vitorpamplona commented 2 years ago

The hope was to build a snapshot-making process to use real data, with real variances among objects of interest for a given test, to select the variances, remove personal information (I am not sure if this is possible), and save it locally to be an input for the tests. CQL authors and test authors never see or change the data files.

JPercival commented 2 years ago

CQL authors and test authors never see or change the data files.

In practice, because CQL is model-agnostic (meaning you can use QDM, FHIR, etc.), CQL authors do have to have an understanding of the model they are using. In fact, you can author CQL that's unique to a specific profile for FHIR, for example using QICore. You can even specify a profile within an IG, and the expectation is that the tooling will eventually auto-generate the necessary ModelInfo to allow you to author CQL against it.

To correct edge cases authors have to inspect the data. It's not enough to just know you have an incorrect result. You have to know why that happened. "Oh, I see, this Encounter is missing an endDate and my logic doesn't account for that." That's generally true of any programming language. You can't fix the logic if you don't understand why it failed and the source data is an important part of that.

There are tools for generating "realistic" data, such as Synthea. I've used that to generate millions of sample patients for testing a Spark cluster. Executing a Measure (or even CQL) against a cluster like that is beyond the scope of what I'm hoping to achieve here.

Then just generate 100s of test CQL functions, one for each file, with the same or similar assertion condition

Even if you're able to generate 100s of test functions, you still need a way to specify "what is a test, and how do I know if the test passed?" You could do that all in an outside tool, which is what is currently done. The Bonnie tool, for example, provides a way to select test data and set expected results. However, to do it in CQL you need to, at a minimum, accept the convention that an "Error" Message indicates a test failure and report that as appropriate for a given toolchain or authoring environment. The @test tag provides a way to indicate which definitions are tests and which are not.

vitorpamplona commented 2 years ago

Yes, people know the model they are working with. They don't know JSON/XML. At least not enough to understand the way the data is assembled by these formats into the model they know well.

I agree that they need to see what's in the data they are testing. They just shouldn't touch the low-level files. Maybe we just need another tool to help them visually create these files.

And yes, they still need to code the tests. But tooling can help them apply the same test in 100s of data objects.

However, to do it in CQL you need to, at a minimum, accept the convention that an "Error" Message indicates a test failure and report that as appropriate for a given toolchain or authoring environment.

Yet another limitation. I just changed the CQL Tests on the engine and I had to make the tooling look for "TEST PASSED" strings :)

JPercival commented 2 years ago

They just shouldn't touch the low-level files.

Can we agree that this isn't relevant to the proposal here? Whether the authors craft the data manually or there's a visual tool for doing so, is the @data tag sufficient to indicate the test set? It works in the context of an IG which is, currently, the most common use case for CQL.

But tooling can help them apply the same test in 100s of data objects.

This is something like the @Parameterized annotation in JUnit. That's a nice to have feature. Does anything here exclude that as future work?

JPercival commented 2 years ago

Yet another limitation.

This part I don't understand. Every test framework I'm familiar with has some convention for indicating test failure, usually an Exception. Even if we were to write the tests in Java capturing the "TEST PASSED" string results in an Exception at the JUnit level. Do you have another idea in mind?

vitorpamplona commented 2 years ago

They just shouldn't touch the low-level files.

Can we agree that this isn't relevant to the proposal here?

I think it is relevant. Your point is that the "Arrange" part is someone else's problem. This is fine as an argument, but somebody will need to solve it. :)

This is something like the @Parameterized annotation in JUnit. That's a nice to have feature. Does anything here exclude that as future work?

No, the 100s of data objects that I am citing refer to a pre-defined version of the FactoryBot model in Ruby, from here: https://github.com/DBCG/cql-language-server/issues/40#issuecomment-1164624862

@ParameterizedTest is interesting as well, but not a replacement.

@ParameterizedTest
@EnumSource(
  value = Month.class,
  names = {"APRIL", "JUNE", "SEPTEMBER", "NOVEMBER", "FEBRUARY"},
  mode = EnumSource.Mode.EXCLUDE)
void exceptFourMonths_OthersAre31DaysLong(Month month) {
    final boolean isALeapYear = false;
    assertEquals(31, month.length(isALeapYear));
}
vitorpamplona commented 2 years ago

Yet another limitation.

This part I don't understand. Every test framework I'm familiar with has some convention for indicating test failure, usually an Exception. Even if we were to write the tests in Java capturing the "TEST PASSED" string results in an Exception at the JUnit level. Do you have another idea in mind?

Throwing exceptions would be nice. That's what I refer to as "limitation": You have to make things work with less. This could be ok if you don't have other options, but people do have options.

I am not against CQL for tests. That's fine. We can deal with it. I am less inclined to the @Tag structure and static data inputs alone to structure all the potential variability we usually see in test cases.

vitorpamplona commented 2 years ago

My perfect use of a CQL-based test suite would extend CQL as a language to add object creation directly inside it in a way that authors don't need to worry about learning anything else.

Something like:

library VitorTest

include DQM
include org.opencds.cqf.cql.Asserts version '1.0.0' as assert 

create DQM.Patient(1, "Vitor", "1911-01-01", "Male")   
create DQM.Observation(1, 1, ...)
create DQM.Observation(2, 1, ...)

// @test
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)   
JPercival commented 2 years ago

I think it is relevant. Your point is that the "Arrange" part is someone else's problem.

There are already many tools for generating FHIR resources, such as GUIs (the ClinFHIR tool, for example), exporting from a FHIR server via the $everything operation or bulk data export, or synthetic test data generation tooling such as Synthea. It's a real problem, but it's not the problem of specifying tests, IMO.

object creation directly inside it in a way

CQL already supports this. You can create objects. You just can't write them. Here's a link and an example:

http://cql.hl7.org/02-authorsguide.html#structured-values-tuples

define "PatientExpression": Patient { Name: 'Patrick', DOB: @2014-01-01 }

Throwing exceptions would be nice.

The Error Message is the equivalent of an exception in CQL.

http://cql.hl7.org/09-b-cqlreference.html#message

Error – The operation produces a run-time error and returns the message to the calling environment. This is the only severity that stops evaluation. All other severities continue evaluation of the expression.

vitorpamplona commented 2 years ago
define "PatientExpression": Patient { Name: 'Patrick', DOB: @2014-01-01 }

Nice! Why don't we base the CQL Test Suite on this, then? The @Data tag should only be used when the data requirements for a given test are massive.

JPercival commented 2 years ago

@ParameterizedTest is interesting as well, but not a replacement.

I'm suggesting we use it to parameterize the other tags, such as the @data or @context annotations. Consider this:

// @data src/giant/test/set
// @parameterized @context Patient=1, Patient=2, Patient=3, Patient=4

That does replicate the behavior you're suggesting.

If there is more than one tag parameterized, you'd do the cross product of all parameterized tags.
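
For example, parameterizing two tags would expand a single definition into the cross product of their values (a sketch of the future-work @parameterized tag, not implemented behavior):

// @data src/giant/test/set
// @parameterized @context Patient=1, Patient=2
// @parameterized @asof @2020-10-01, @2020-10-02
// Expands into 2 x 2 = 4 test runs:
//   (Patient=1, @2020-10-01), (Patient=1, @2020-10-02), (Patient=2, @2020-10-01), (Patient=2, @2020-10-02)
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)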

JPercival commented 2 years ago

Nice! Why don't we base the CQL Test Suite on this, then?

This doesn't cover the most common usage of CQL, which is authoring inside of an IG. IGs require each example resource to be written as JSON or XML. See this for examples:

https://github.com/cqframework/ecqm-content-r4-2021/tree/master/input/tests/BreastCancerScreeningsFHIR

vitorpamplona commented 2 years ago

But for that they can just use the JSON in the @Data tag. Clinician authors should focus on the objects they can create inside CQL.

JPercival commented 2 years ago

Are you arguing against the @data tag completely, or are you saying that the best practice for unit tests would be to create the data inline? I agree with the latter. If possible, tests should be self-contained, and we should support that scenario.

cmoesel commented 2 years ago

Maybe I'm testing different things than others, but sometimes I need a good chunk of data for a test (Patient plus multiple Conditions/Observations/Procedures/etc). It's much easier for me to produce that data using existing tools for such, rather than having to build it all in CQL.

Thinking on this, however, I'm realizing that in some cases, testing small units of CQL can be quite difficult -- because CQL definitions so often call out to other CQL definitions. For example, imagine a statement like define Foo: if Def1 then 1 else if Def2 then 2 else 3. It's impossible to test that logic alone without also testing all the underlying logic in Def1 and Def2. This is how you sometimes end up needing tons of data just to test one statement.

At the risk of complicating things further, I wonder if we eventually might want a mock framework that would allow us to say @mock Def1 = false to avoid needing to build up all the data needed for referenced expressions.
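
A sketch of how that might read with the proposed (future work) @mock tag, using the Foo example above (the assert helper is hypothetical):

// @test
// @mock Def1=false
// @mock Def2=true
define "FooIsTwoWhenOnlyDef2Holds":
  // With Def1 mocked to false and Def2 mocked to true, Foo should evaluate to 2
  assert.IsTrue(Foo = 2)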

vitorpamplona commented 2 years ago

Are you arguing against the @data tag completely or are you saying that best-practices for unit tests would be that the data is created inline? I agree with that latter. If possible, tests should be self-contained and we should support that scenario.

The latter, but from a user's point of view: if the idea is to target clinician authors, don't make them learn JSON/XML or other IG-based rules.

vitorpamplona commented 2 years ago

Maybe I'm testing different things than others, but sometimes I need a good chunk of data for a test (Patient plus multiple Conditions/Observations/Procedures/etc). It's much easier for me to produce that data using existing tools for such, rather than having to build it all in CQL.

I am in the opposite camp. I usually see very large data input test files whose tests are just assessing if people have two observations or if an operation doesn't crash. Data authors could have manually removed 95% of those data fields and still run the same test. It's terrible for maintenance (not knowing why there is so much data in a given test).

But if those test cases exist because the authors were not devs, I give them a pass :)

JPercival commented 2 years ago

It's impossible to test that logic alone without also testing all the underlying logic in Def1 and Def2. This is how you sometimes end up needing tons of data just to test one statement.

I wonder if we eventually might want a mock framework that would allow us to say @mock Def1 = false to avoid needing to build up all the data needed for referenced expressions.

I really like this idea. Maybe not as a first pass, but the utility of this is obvious to me. This could go a long way toward minimizing the data requirements as well, to Vitor's point.

vitorpamplona commented 2 years ago

Can we do this? @Data evaluates the expression and makes the resulting list of objects the data source for the test.

library VitorTest

include DQM
include org.opencds.cqf.cql.Asserts version '1.0.0' as assert 

define "Patient1": Patient(Identifier: 1, Name: 'Patrick', DOB: @2014-01-01)   
define "Observation1": Observation(Subject: Patient1, ...)
define "Observation2": Observation(Subject: Patient1, ...)

define "setup": 
  { Patient1, Observation1, Observation2 } 

// @test
// @data setup
define "AtLeast2Obs":
  assert.GreaterThan(Count(DQM.PatientObs), 2)