Project tracking for GA4GH Computable Cohort Representation Hackathons
The instructions to deploy the above fork of data-connect-trino are identical to those described in Try a Reference Implementation, with one exception: the container should mount a volume at /models containing JSON schema files for the data tables. The following example shows how to run the container via Docker, with the schema files in a local folder called /yourpath/data-connect-models.
docker run --rm --name dnastack-data-connect \
-p 8089:8089 \
-v /yourpath/data-connect-models:/models \
-e TRINO_DATASOURCE_URL=https://trino-public.prod.dnastack.com \
-e SPRING_DATASOURCE_URL=jdbc:postgresql://host.docker.internal:15432/dataconnecttrino \
-e SPRING_PROFILES_ACTIVE=no-auth \
dnastack/data-connect-trino
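Once the container is running, one quick check is to list the tables it exposes. The sketch below is an illustration, not part of this repository: it assumes the standard GA4GH Data Connect `GET /tables` endpoint, a response with a `tables` array and an optional `pagination.next_page_url`, and the port 8089 and `no-auth` profile from the command above. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the container started above listens on localhost:8089 and uses
# the no-auth profile, so no Authorization header is sent.
DATA_CONNECT_URL = "http://localhost:8089"


def table_names(tables_page: dict) -> list[str]:
    """Extract table names from one page of a Data Connect GET /tables response."""
    return [table["name"] for table in (tables_page.get("tables") or [])]


def next_page_url(tables_page: dict) -> Optional[str]:
    """Return the next-page URL reported under "pagination", or None on the last page."""
    return (tables_page.get("pagination") or {}).get("next_page_url")


def list_all_tables(base_url: str = DATA_CONNECT_URL) -> list[str]:
    """Follow pagination links from GET /tables and collect every table name."""
    names: list[str] = []
    url: Optional[str] = f"{base_url}/tables"
    while url:
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        names.extend(table_names(page))
        nxt = next_page_url(page)
        # urljoin handles both absolute and relative next-page links.
        url = urllib.parse.urljoin(url, nxt) if nxt else None
    return names


if __name__ == "__main__":
    for name in list_all_tables():
        print(name)
```

If a table defined under /models does not appear in the output, the volume mount is the first thing to check.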
Using the HAPI FHIR server definition in docker-compose.yml, we can spin up the server with:
docker-compose up -d
Wait a few minutes for the server setup to complete. Once it is done, confirm that the service is available by visiting the following URLs in a web browser:
Note: This requires Docker and docker-compose to be installed on your local machine.
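Instead of refreshing a browser, readiness can also be polled from a script. This is a minimal sketch assuming the local base URL http://localhost:8080/fhir used elsewhere in this README; it relies only on the standard FHIR capability statement served at `[base]/metadata`. The helper names `fhir_is_up` and `wait_until` are hypothetical, and the defaults (30 attempts, 10 s apart) are arbitrary.

```python
import time
import urllib.request
from typing import Callable

# Base URL of the local HAPI FHIR server used throughout this README.
FHIR_BASE = "http://localhost:8080/fhir"


def fhir_is_up(base_url: str = FHIR_BASE, timeout: float = 5.0) -> bool:
    """True if GET {base}/metadata (the FHIR capability statement) returns HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/metadata", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError/HTTPError, refused connections and timeouts
        return False


def wait_until(check: Callable[[], bool], attempts: int = 30, delay_s: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() up to `attempts` times, sleeping between tries; True on first success."""
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:
            sleep(delay_s)
    return False


if __name__ == "__main__":
    ready = wait_until(fhir_is_up)
    print("FHIR server ready" if ready else "FHIR server did not come up in time")
```

Passing `sleep` as a parameter keeps the retry loop testable without actually waiting.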
For the hackathon, the GA4GH tech team has spun up a web-accessible HAPI FHIR instance via Amazon Web Services (AWS). This service is available at https://cohort.ga4gh-demo.org/ . For example:
The cloud-based instance has already been populated with a synthetic asthma dataset for the purpose of the hackathon. See the next sections for how the synthetic data was generated and uploaded.
Requires Java on your local machine (tested with Java 11.0.6)
First, download the v3.0.0 Synthea JAR file with a command-line download tool such as wget:
wget https://github.com/synthetichealth/synthea/releases/download/v3.0.0/synthea-with-dependencies.jar
In this example, we generate a small synthetic dataset containing patients with and without asthma. We then filter it so that only a mini dataset enriched for asthma is uploaded to the FHIR server.
First, create the raw output directory that Synthea will write to:
mkdir -p output/synthea/raw
Next, run Synthea using the config file in this repo to generate the random patient dataset, exporting it as both bulk FHIR and CSV data:
java -jar synthea-with-dependencies.jar \
-p 1000 \
-c synthea/config/synthea.properties \
--exporter.baseDirectory "./output/synthea/raw/"
The output data will be in the ./output/synthea/raw directory.
A custom Python script, filter.py, is included in the repo. It produces a mini dataset of 100+ patients enriched for asthma: all patients with asthma from the original synthetic dataset, plus 100 patients without asthma.
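The enrichment rule (every asthma patient plus 100 non-asthma patients) can be sketched as follows. This is not the repository's filter.py. It assumes Synthea's CSV export with a `conditions.csv` file carrying `PATIENT` and `CODE` columns and a `patients.csv` file with an `Id` column, and it matches only the SNOMED CT code 195967001 from the CQL example later in this README. The function names and the fixed random seed are hypothetical choices.

```python
import csv
import random
from typing import Iterable

ASTHMA_CODE = "195967001"  # SNOMED CT "Asthma (disorder)", the code used in the CQL example


def asthma_patient_ids(conditions_lines: Iterable[str], code: str = ASTHMA_CODE) -> set[str]:
    """Collect PATIENT ids from Synthea conditions.csv rows whose CODE equals `code`.

    Assumes the Synthea CSV columns PATIENT and CODE.
    """
    return {row["PATIENT"] for row in csv.DictReader(conditions_lines) if row["CODE"] == code}


def select_enriched(all_ids: Iterable[str], asthma_ids: set[str],
                    n_controls: int = 100, seed: int = 0) -> set[str]:
    """Keep every asthma patient plus up to `n_controls` randomly chosen non-asthma patients."""
    controls = sorted(set(all_ids) - asthma_ids)  # sorted so the seed gives reproducible picks
    rng = random.Random(seed)
    picked = rng.sample(controls, min(n_controls, len(controls)))
    return set(asthma_ids) | set(picked)


if __name__ == "__main__":
    # Assumed layout: Synthea's CSV exporter writes to <baseDirectory>/csv/.
    raw = "output/synthea/raw/csv"
    with open(f"{raw}/patients.csv", newline="") as f:
        all_ids = [row["Id"] for row in csv.DictReader(f)]
    with open(f"{raw}/conditions.csv", newline="") as f:
        asthma = asthma_patient_ids(f)
    keep = select_enriched(all_ids, asthma)
    print(f"{len(asthma)} asthma patients, {len(keep)} patients kept")
```

The resulting set of patient IDs would then be used to select the matching records from the bulk FHIR files.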
The enriched mini dataset uploaded to the AWS instance is included in the repository at output/synthea/filtered/fhir, so you do not need to run the filter step. However, it can be run with:
python filter.py
assuming that Python 3 is installed on your system.
Now, we will load the asthma-enriched dataset into the local FHIR server we started in step 1. A custom Python script, upload.py, handles the upload of the relevant records. It uses the FHIR REST API documented at http://localhost:8080/fhir/swagger-ui/index.html; specifically, it makes an HTTP request to the PUT /fhir/{model}/{id} REST endpoint for each model and record in the dataset.
The upload step can then be run by executing:
python upload.py
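The PUT-per-record approach described above can be sketched as follows. This is not the repository's upload.py. It assumes the filtered data is stored as one bulk FHIR `<ResourceType>.ndjson` file per model, and the upload order (referenced resources such as Organization, Practitioner and Patient first) is an assumption intended to avoid dangling references on servers that check them. All names here are hypothetical.

```python
import json
import urllib.request
from pathlib import Path
from typing import Iterable, Iterator

FHIR_BASE = "http://localhost:8080/fhir"

# Assumed order: models that other records reference are uploaded first.
UPLOAD_ORDER = ["Organization", "Location", "Practitioner", "PractitionerRole",
                "Patient", "Encounter"]


def in_upload_order(paths: Iterable[Path]) -> list[Path]:
    """Sort <ResourceType>.ndjson paths by UPLOAD_ORDER; other models follow alphabetically."""
    def key(p: Path):
        model = p.stem
        rank = UPLOAD_ORDER.index(model) if model in UPLOAD_ORDER else len(UPLOAD_ORDER)
        return (rank, model)
    return sorted(paths, key=key)


def iter_ndjson(path: Path) -> Iterator[dict]:
    """Yield one FHIR resource per non-empty line of a bulk FHIR (.ndjson) file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def put_request(base_url: str, resource: dict) -> urllib.request.Request:
    """Build PUT {base}/{resourceType}/{id} carrying the resource as FHIR JSON."""
    url = f"{base_url}/{resource['resourceType']}/{resource['id']}"
    body = json.dumps(resource).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": "application/fhir+json"})


def upload_file(path: Path, base_url: str = FHIR_BASE) -> int:
    """PUT every resource in one ndjson file and return how many were sent."""
    count = 0
    for resource in iter_ndjson(path):
        with urllib.request.urlopen(put_request(base_url, resource)) as resp:
            resp.read()  # 200/201 on success; error statuses raise HTTPError
        count += 1
    return count


if __name__ == "__main__":
    for path in in_upload_order(Path("output/synthea/filtered/fhir").glob("*.ndjson")):
        print(path.name, upload_file(path))
```

PUT with the record's own id (rather than POST) means re-running the upload updates existing records instead of creating duplicates.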
Now that the data has been ingested, we can explore it by making HTTP requests to the FHIR REST API. Using your preferred HTTP client (such as Postman), you can browse instances of the FHIR models you uploaded, including:
Patient
Condition
Encounter
Observation
DiagnosticReport
DocumentReference
Immunization
Procedure
Practitioner
PractitionerRole
Organization
Location
The URL template for viewing these instances is:
GET http://{serverBaseUrl}/fhir/{model}/{id}
For example, to view a Patient with id 02dade42-9887-12c3-979e-5df8f35319f7, you would make a request to
GET http://localhost:8080/fhir/Patient/02dade42-9887-12c3-979e-5df8f35319f7
for the local FHIR server, or
GET https://cohort.ga4gh-demo.org/fhir/Patient/02dade42-9887-12c3-979e-5df8f35319f7
for the web-accessible server.
A full list of patient IDs is available here
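The read URL template above, and one way to collect patient IDs yourself, can be sketched as follows. The listing uses standard FHIR search paging (`_count`, `_elements`, and the Bundle `link` with relation `next`); whether a given server honours `_elements` is an assumption, and the helper names are hypothetical. Either base URL from this README can be used.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

FHIR_BASE = "http://localhost:8080/fhir"  # or "https://cohort.ga4gh-demo.org/fhir"


def read_url(base_url: str, model: str, resource_id: str) -> str:
    """Fill in the template GET {base}/{model}/{id}."""
    return f"{base_url}/{model}/{urllib.parse.quote(resource_id)}"


def ids_in_bundle(bundle: dict) -> list[str]:
    """Return the ids of the resources in one page of a FHIR search Bundle."""
    return [e["resource"]["id"] for e in bundle.get("entry", []) if "resource" in e]


def next_link(bundle: dict) -> Optional[str]:
    """Return the Bundle.link URL whose relation is 'next', if any."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def get_json(url: str) -> dict:
    req = urllib.request.Request(url, headers={"Accept": "application/fhir+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def all_patient_ids(base_url: str = FHIR_BASE, page_size: int = 100) -> list[str]:
    """Page through GET {base}/Patient?_elements=id&_count=N, following 'next' links."""
    query = urllib.parse.urlencode({"_elements": "id", "_count": page_size})
    url: Optional[str] = f"{base_url}/Patient?{query}"
    ids: list[str] = []
    while url:
        page = get_json(url)
        ids.extend(ids_in_bundle(page))
        url = next_link(page)
    return ids


if __name__ == "__main__":
    patient = get_json(read_url(FHIR_BASE, "Patient", "02dade42-9887-12c3-979e-5df8f35319f7"))
    print(patient["resourceType"], patient["id"])
    print(len(all_patient_ids()), "patients")
```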
This is a simple example of how to use CQL to determine whether a patient has a certain phenotype.
Let's say we want to know if a patient was diagnosed with asthma as defined by the SNOMED CT code 195967001 (Asthma (disorder)). In CQL, that would look something like this:
library "AsthmaPhenotype" version '1.0.0'
using FHIR version '4.0.1'
include FHIRHelpers version '4.0.1'
codesystem "SNOMED": 'http://snomed.info/sct'
code "Asthma": '195967001' from "SNOMED"
context Patient
define "Asthma Diagnosis":
[Condition: "Asthma"]
define "Has Asthma Diagnosis":
exists("Asthma Diagnosis")
On the first line we name our CQL library; in CQL, a library is the high-level container in which the logic is stored. We then declare FHIR as our data model and specify its version (R4, 4.0.1). FHIRHelpers is another CQL library, included as a dependency; it converts between FHIR data types and CQL primitives and enables the use of FHIRPath (like XPath, but for FHIR). Next, we define a code system (SNOMED CT in this case) and a code (the SNOMED CT asthma code mentioned above). The line 'context Patient' specifies that every statement after it is evaluated in the context of a single FHIR Patient. The "Asthma Diagnosis" statement returns every Condition resource for that Patient that contains the specified SNOMED CT code. Finally, "Has Asthma Diagnosis" returns true or false, depending on whether any Conditions with that code were found.
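To make the semantics of the two definitions concrete, here is a rough Python analogue. It is only an illustration, not how a CQL engine works: it assumes `conditions` already holds one patient's Condition resources (the role played by 'context Patient'), and it approximates CQL's code equivalence by requiring an exact match on both `system` and `code` in `Condition.code.coding`.

```python
SNOMED = "http://snomed.info/sct"
ASTHMA = "195967001"  # SNOMED CT: Asthma (disorder)


def has_code(condition: dict, system: str, code: str) -> bool:
    """True if any coding in Condition.code matches both the system and the code."""
    codings = condition.get("code", {}).get("coding", [])
    return any(c.get("system") == system and c.get("code") == code for c in codings)


def asthma_diagnosis(conditions: list[dict]) -> list[dict]:
    """Analogue of: define "Asthma Diagnosis": [Condition: "Asthma"]"""
    return [c for c in conditions if has_code(c, SNOMED, ASTHMA)]


def has_asthma_diagnosis(conditions: list[dict]) -> bool:
    """Analogue of: define "Has Asthma Diagnosis": exists("Asthma Diagnosis")"""
    return len(asthma_diagnosis(conditions)) > 0
```

Like the CQL, this sketch ignores clinicalStatus and verificationStatus, so a resolved or refuted asthma Condition would still count.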
This step is optional, but it allows a CQL library to be embedded in a FHIR Library resource, stored on a FHIR server, and evaluated using FHIR's Clinical Reasoning module. The CQL can be packaged using the CQF Tooling.
The Library with the embedded 'AsthmaPhenotype' CQL is included in the /cql folder of this repository.
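For readers curious about what "embedded" means, the sketch below hand-builds a minimal FHIR R4 Library that carries the CQL source as base64-encoded `text/cql` content. It is not what the CQF Tooling produces and is not the file in /cql; servers may additionally expect compiled ELM content and a resolvable FHIRHelpers dependency, which this sketch does not provide. The file name `AsthmaPhenotype.cql` and the function name are hypothetical.

```python
import base64
import json

LIBRARY_TYPE_SYSTEM = "http://terminology.hl7.org/CodeSystem/library-type"


def library_resource(cql_text: str, library_id: str, version: str) -> dict:
    """Wrap CQL source in a minimal FHIR R4 Library resource as base64 text/cql content."""
    return {
        "resourceType": "Library",
        "id": library_id,  # becomes the {id} in /fhir/Library/{id}/$evaluate
        "name": library_id,
        "version": version,
        "status": "active",  # required in R4
        "type": {"coding": [{"system": LIBRARY_TYPE_SYSTEM, "code": "logic-library"}]},
        "content": [{
            "contentType": "text/cql",
            "data": base64.b64encode(cql_text.encode("utf-8")).decode("ascii"),
        }],
    }


if __name__ == "__main__":
    # Hypothetical file: the CQL shown above saved as AsthmaPhenotype.cql.
    with open("AsthmaPhenotype.cql", encoding="utf-8") as f:
        library = library_resource(f.read(), "AsthmaPhenotype", "1.0.0")
    print(json.dumps(library, indent=2))
```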
This is where we actually run our CQL statements for a specific Patient to get the result (asthma/no asthma). To evaluate a FHIR Library with embedded CQL (like this example), you need a FHIR server that supports FHIR's Clinical Reasoning module. Here we can use the CQF Ruler, which is a HAPI FHIR server with some additional plugins. It can easily be run from a Docker container, and the instructions above can be used to load the synthetic Synthea data onto the server.
Once the data and the Library are loaded onto the server, you can run the $evaluate operation:
GET http://localhost:8080/fhir/Library/AsthmaPhenotype/$evaluate?subject=Patient/54a0dc8e-5c3c-e294-6619-486afc4f9444
specifying the Patient with ID 54a0dc8e-5c3c-e294-6619-486afc4f9444 as the subject (an asthma patient according to our definition).
The server will then return the output of the CQL evaluation as a Parameters resource. For this example, the last part of the output should look like:
{
"name": "Has Asthma Diagnosis",
"valueBoolean": true
}
This confirms that Patient 54a0dc8e-5c3c-e294-6619-486afc4f9444 is indeed an asthma patient (it has a Condition resource with SNOMED CT code 195967001).
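To run the same check from a script, the sketch below builds the $evaluate URL shown above and pulls the "Has Asthma Diagnosis" value out of the returned Parameters resource. It assumes the CQF Ruler instance is at http://localhost:8080/fhir and that the result appears as a parameter with `name` and `valueBoolean`, as in the output fragment above; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

FHIR_BASE = "http://localhost:8080/fhir"


def evaluate_url(base_url: str, library_id: str, patient_id: str) -> str:
    """Build GET {base}/Library/{id}/$evaluate?subject=Patient/{patient_id}."""
    query = urllib.parse.urlencode({"subject": f"Patient/{patient_id}"}, safe="/")
    return f"{base_url}/Library/{library_id}/$evaluate?{query}"


def boolean_result(parameters: dict, name: str) -> Optional[bool]:
    """Return valueBoolean of the parameter called `name`, or None if it is absent."""
    for p in parameters.get("parameter", []):
        if p.get("name") == name:
            return p.get("valueBoolean")
    return None


def has_asthma(patient_id: str, base_url: str = FHIR_BASE) -> Optional[bool]:
    url = evaluate_url(base_url, "AsthmaPhenotype", patient_id)
    req = urllib.request.Request(url, headers={"Accept": "application/fhir+json"})
    with urllib.request.urlopen(req) as resp:
        return boolean_result(json.load(resp), "Has Asthma Diagnosis")


if __name__ == "__main__":
    print(has_asthma("54a0dc8e-5c3c-e294-6619-486afc4f9444"))
```

Returning None rather than False for a missing parameter keeps "no asthma" distinct from "the server did not report a result".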