CDCgov / prime-simplereport

SimpleReport is a fast, free, and easy way for COVID-19 testing facilities to report results to public health departments.
https://simplereport.gov
Creative Commons Zero v1.0 Universal

Export test events to .csv #159

Closed timbest-cdc closed 3 years ago

timbest-cdc commented 3 years ago

TODO [x] add a mechanism to export test events

Example https://gist.github.com/timothybest-usds/11fba7410952eafab50d35bc7bcfb4bf

Open Questions/ TODO

aliciabeckett-gov commented 3 years ago

Should be a full dump of MVP required fields before they are translated by the data hub.

expected headers.xlsx

jimduff-usds commented 3 years ago

OK, I owe you a few more answers, but here's a first pass.

Formats:

Whew!!

jlusds commented 3 years ago

Here's my takeaway but would want a second opinion.

Background: Before POC testing, an Ordering Facility (a school, nursing facility) would often order testing but send it out to a different testing lab (ABC Laboratories) to do the actual testing. The ordering facility might not have a CLIA number.

With POC testing, Ordering Facilities are now doing the testing themselves, in house. To do this, they apply for a CLIA Certificate of Waiver. They now become eligible to be both the Ordering Facility and Testing Facility. When they do POC testing, they are both.

Proposal: In a Point of Care Antigen world, which is our MVP, Ordering Facility Name = Testing Facility Name. We should list the same name in both fields. However, if Nursing Home A wants to send a PCR test out to a different testing facility, then that test would have a different Ordering Facility Name and Testing Facility Name. I think Testing_lab_ID can be the CLIA number. In case sensitivity is a concern: the CLIA number is public.

aliciabeckett-gov commented 3 years ago

I'll check in with our pilot site and see what kind of nasal swab they are using. We can hard code it until we build out the ability to add it (hopefully that comes soon!!)

aliciabeckett-gov commented 3 years ago

@jimduff-usds - what are your thoughts on having the Data hub do the work to translate the format from SimpleReport into the expected format by AZDHS?

jimduff-usds commented 3 years ago

> @jimduff-usds - what are your thoughts on having the Data hub do the work to translate the format from SimpleReport into the expected format by AZDHS?

I think the answer will vary by field. For example, if the change is trivial for @TimBest, then it would be good to do it in SimpleReport. But I realize we're all running short of time for the initial release, so we can take shortcuts now and try to patch later. Generally, it's best to generate the data "correctly" as early as possible, but we might need to do otherwise to be expedient.

jimduff-usds commented 3 years ago

@timothybest-usds @RickHawesUSDS @jlusds @aliciabeckett-gov

A couple items for our ongoing conversation on the "contract" between SimpleReport and the Hub.

Two changes to the SimpleReport Schema:

  1. I changed the column named 'Testing_lab_ID' to 'Testing_lab_CLIA' to make it more accurate. The Testing_lab_CLIA value for Via Elegante, Tucson Mountains is 03D2159846.

  2. I am adding a Patient_lookup_ID column, immediately after Patient_ID, per discussion with Tim.
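The two schema changes above can be sketched as a small header transformation. This is only an illustration in Python, not the hub's actual code; the function name `update_header` and the abbreviated header are assumptions made for the example.

```python
def update_header(columns):
    """Apply the two schema changes to a list of column names."""
    updated = []
    for name in columns:
        # Change 1: Testing_lab_ID is renamed to Testing_lab_CLIA.
        updated.append("Testing_lab_CLIA" if name == "Testing_lab_ID" else name)
        # Change 2: Patient_lookup_ID goes immediately after Patient_ID.
        if name == "Patient_ID":
            updated.append("Patient_lookup_ID")
    return updated


# Abbreviated header for illustration only; the real file has more columns.
old_header = ["Patient_ID", "Patient_DOB", "Testing_lab_ID"]
new_header = update_header(old_header)
print(new_header)
```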

RickHawesUSDS commented 3 years ago

@jimduff-usds Just on the issue of where to copy lab fields to facility fields: the mapper function in the hub schema is intended to handle this. I was about to hook this up when the Azure stuff hit. Will have it finished this weekend.
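As a rough sketch of what copying lab fields to facility fields could look like for the point-of-care case (ordering facility and testing lab are the same site), here is a Python illustration. It is not the hub's mapper; the column names `Ordering_facility_name` and `Testing_lab_name` and the function `fill_ordering_facility` are hypothetical.

```python
# Hypothetical (facility column, lab column) pairs; the real schema may differ.
FIELD_PAIRS = [("Ordering_facility_name", "Testing_lab_name")]


def fill_ordering_facility(row, field_pairs=FIELD_PAIRS):
    """Return a copy of row with empty facility fields filled from lab fields."""
    filled = dict(row)
    for facility_field, lab_field in field_pairs:
        # Only fill when the facility value is missing or blank, so a
        # send-out test keeps its distinct ordering facility.
        if not filled.get(facility_field):
            filled[facility_field] = filled.get(lab_field, "")
    return filled


poc_row = {"Testing_lab_name": "Nursing Home A", "Ordering_facility_name": ""}
print(fill_ordering_facility(poc_row))
```

A send-out PCR test, where the two names differ, passes through unchanged because the ordering facility field is already populated.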

timbest-cdc commented 3 years ago

@jimduff-usds and @RickHawesUSDS I have updated the csv example with the latest format. Let me know if anything needs to be changed.

Note the columns are currently sorted alphabetically. @jimduff-usds your comment seems to imply that properties are inferred from column order. Is this true? If so, is example-simplereport-file-11-20-2020.csv in the expected order?

jimduff-usds commented 3 years ago

Thanks @TimBest , I'll take a look at your example, and I'll adjust what we're expecting as much as possible to try to match that. So, ball's in my court, and I may change our expected column ordering to match yours.

(Yes, at the moment, we expect both the exact column ordering and the exact column header names to accept a file as validated.)

benwarfield-usds commented 3 years ago

> at the moment, we expect both the exact column ordering

What's the benefit of doing that? Seems like a green-M&M situation at best, and unnecessarily brittle at worst.

jimduff-usds commented 3 years ago

> > at the moment, we expect both the exact column ordering
>
> What's the benefit of doing that? Seems like a green-M&M situation at best, and unnecessarily brittle at worst.

It's just how it's working now. I'd like us to allow columns in arbitrary order, but it's not built yet.
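For reference, the difference between the strict check and an order-independent one can be sketched in Python with the standard `csv` module. This is an illustration only, not the hub's implementation; `EXPECTED_COLUMNS` is abbreviated and the function names are made up for the example.

```python
import csv
import io

# Abbreviated for illustration; the real schema has many more columns.
EXPECTED_COLUMNS = ["Patient_ID", "Patient_DOB", "Patient_race"]


def header_matches_strict(header, expected):
    """Current behaviour: same names in exactly the same order."""
    return list(header) == list(expected)


def header_matches_any_order(header, expected):
    """Relaxed check: same names, no duplicates, any order."""
    return len(header) == len(set(header)) and set(header) == set(expected)


def read_rows(text, expected=EXPECTED_COLUMNS):
    """Read CSV text into dicts keyed by column name, ignoring column order."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if not header_matches_any_order(header, expected):
        raise ValueError(f"unexpected header: {header}")
    return list(reader)


sample = "Patient_race,Patient_ID,Patient_DOB\n2028-9,abc,19850514\n"
rows = read_rows(sample)
print(rows[0]["Patient_ID"])
```

Because each row is looked up by header name, an alphabetically sorted file and a schema-ordered file produce the same values.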

jimduff-usds commented 3 years ago

Tim, We're close. I have comments on these fields as the data appears in your gist example:

[ ] Ordered_test_code. MISSING LOINC. See "covid-19/order" above, for codes.

[ ] Patient_DOB. In your example, you have '11850514'. Expecting YYYYmmDD. Maybe this is just a typo?

[ ] Patient_race. Needs to be one of the following codes, eg, 2028-9 instead of 'asian', in your example. Sorry, missed this one earlier.

    - display: N      # Native
      code: 1002-5
    - display: A      # Asian
      code: 2028-9
    - display: B      # Black
      code: 2054-5
    - display: P      # Pacific Islander
      code: 2076-8
    - display: W      # White
      code: 2106-3
    - display: O      # Other
      code: 2131-1
    - display: O      # Unknown
      code: UNK
    - display: O      #  Asked, but unknown
      code: ASKU

[ ] Patient_street2. I just noticed the typo on our side -- please change the column name to 'Patient_street_2', so we're consistent.

[ ] Specimen_collection_date_time. Your example has '41:55.2'. This needs a full date. At least for AZ, the time portion is optional.

[ ] Specimen_source_site_code. You have 'Forearm' at the moment. AZ said they can infer the value from the Specimen_type_code, so I suggest you leave this blank or "N/A" for right now. We may need to collect a real value for this in the future; TBD.

[ ] Specimen_type_code. Missing in the example. Must be a SNOMED code. See "covid-19/specimen_type" above for expected value.

[ ] Testing_lab_street2. Same as above, our typo. Please change title to 'Testing_lab_street_2'.

[ ] And lastly, I think I do want to ask if you can order the fields the same as ours. Your current order appears to be random (?); I think there's value in having reasonably ordered data, for debugging, etc. Let me know if you disagree. Order should be as the elements are ordered here: https://github.com/CDCgov/prime-data-hub/blob/master/prime-router/metadata/schemas/PrimeDataInput/pdi-covid-19.schema
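A few of the checks in the list above could be automated along these lines. This is a minimal Python sketch, not hub code: the race codes and column renames come from the comments above, while `EARLIEST_PLAUSIBLE_YEAR` and the function names are assumptions. Note that '11850514' is a well-formed YYYYMMDD string (year 1185), so a pure format check would not flag it; the year cutoff is what catches it here.

```python
from datetime import datetime

# Codes listed in the Patient_race item above.
ALLOWED_RACE_CODES = {
    "1002-5", "2028-9", "2054-5", "2076-8", "2106-3", "2131-1", "UNK", "ASKU",
}
# Old column name -> corrected column name, per the items above.
COLUMN_RENAMES = {
    "Patient_street2": "Patient_street_2",
    "Testing_lab_street2": "Testing_lab_street_2",
}
EARLIEST_PLAUSIBLE_YEAR = 1900  # assumption, not part of the schema


def dob_is_valid(value):
    """True if value is a real YYYYMMDD date with a plausible year."""
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        parsed = datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return parsed.year >= EARLIEST_PLAUSIBLE_YEAR


def check_row(row):
    """Return the names of columns in row that need attention."""
    problems = []
    if not dob_is_valid(row.get("Patient_DOB", "")):
        problems.append("Patient_DOB")
    if row.get("Patient_race") not in ALLOWED_RACE_CODES:
        problems.append("Patient_race")
    # Flag columns that still use the old, misspelled names.
    problems.extend(old for old in COLUMN_RENAMES if old in row)
    return problems


print(check_row({"Patient_DOB": "11850514", "Patient_race": "asian"}))
```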

RickHawesUSDS commented 3 years ago

@jimduff-usds Will have a PR that allows column headers to come in any order.

RickHawesUSDS commented 3 years ago

Also, @TimBest what happened to the device-id field? That's kind of important isn't it?

jimduff-usds commented 3 years ago

@TimBest , per the LIVD spreadsheet, it looks like the Ordered_test_code for Binax is 94558-4 Abbott | BinaxNOW COVID-19 Ag Card | SARS-CoV-2 nucleocapsid protein antigen

And the only associated Specimen_type_code for Binax is 445297001 - nasal swabs (Swab of internal nose)

However, almost every other device has multiple Specimen_type_code. For example Abbott's ID NOW test has 4 different kinds of specimens that can be taken.

jimduff-usds commented 3 years ago

> Also, @TimBest what happened to the device-id field? That's kind of important isn't it?

@RickHawesUSDS - we have it. It's the "Ordered_test_code" field in our data. Probably we need a better name....

RickHawesUSDS commented 3 years ago

So, this is complicated and we are new to the space, but the way I read the LIVD table, there are many device types and manufacturers for a given "Ordered_test_code". That is, many devices can run a "SARS-CoV-2 (COVID-19) Ag [Presence] in Respiratory specimen by Rapid immunoassay". So the field is not the same as saying "Abbott", "BinaxNow COVID-19 AG Card". If you agree, I'd propose that you go back to providing the "BinaxNow COVID-19 AG Card" device-id field and let the hub translate it to a particular LOINC Ordered_test_code.
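To make the many-to-one relationship concrete, here is a small Python sketch of a device lookup. It is an illustration under assumptions, not the hub's translation code: only the BinaxNOW entry comes from this thread, "Hypothetical Device B" is a placeholder, and the function names are made up.

```python
# Device_ID -> LOINC Ordered_test_code.
DEVICE_TO_ORDERED_TEST_CODE = {
    "BinaxNOW COVID-19 Ag Card": "94558-4",  # from the LIVD lookup above
    "Hypothetical Device B": "94558-4",      # placeholder, not a real device
}


def ordered_test_code_for(device_id):
    """Translate a device name into its Ordered_test_code."""
    try:
        return DEVICE_TO_ORDERED_TEST_CODE[device_id]
    except KeyError:
        raise ValueError(f"unknown Device_ID: {device_id!r}") from None


def devices_for(ordered_test_code):
    """List every device that maps to the given code."""
    return sorted(
        device
        for device, code in DEVICE_TO_ORDERED_TEST_CODE.items()
        if code == ordered_test_code
    )


print(ordered_test_code_for("BinaxNOW COVID-19 Ag Card"))
print(devices_for("94558-4"))
```

Going from device to code is unambiguous, while going from code back to device is not, which is why sending the device name and letting the hub derive the code loses no information.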

jimduff-usds commented 3 years ago

I've added Device_ID to the expected data from SimpleReport; it's in a PR now.

I'm realizing this Issue has gotten quite hard to understand; I'm going to start a (hopefully!) organized data dictionary somewhere, in an attempt to save @TimBest 's sanity.

timbest-cdc commented 3 years ago

Per today's discussion, this is the current state of the SimpleReport export: https://gist.github.com/timothybest-usds/11fba7410952eafab50d35bc7bcfb4bf It should include everything up to https://github.com/CDCgov/prime-data-input-client/issues/159#issuecomment-736068023

jimduff-usds commented 3 years ago

@TimBest , the Data Dictionary is here:
https://github.com/CDCgov/prime-data-hub/blob/master/prime-router/docs/simplereport-data-dictionary.md

RickHawesUSDS commented 3 years ago

@TimBest I just tried this sample file (with Device_ID and a single race value) and it worked on the latest data hub code. https://gist.github.com/RickHawesUSDS/f082131bb6b1d04cb58cd861de3d60cc

timbest-cdc commented 3 years ago

Going to mark this as complete since we successfully tested the pipeline last night

the issue with race=null will be resolved in https://github.com/CDCgov/prime-data-input-client/pull/211

Future integration work can be found: https://github.com/CDCgov/prime-central/issues/160