Closed timbest-cdc closed 3 years ago
Should be a full dump of MVP required fields before they are translated by the data hub.
OK, I owe you a few more answers, but here's a first pass.
Formats:
Patient_DOB. We currently expect yyyymmdd, e.g., 19850531.
Patient_gender. We expect a single uppercase letter, e.g., F, A (ambiguous), O (other), N (not applicable), M.
Patient_ethnicity. We expect a single uppercase letter, e.g., U (unknown), N (not Hispanic), H (Hispanic).
Test_result_code. We expect numeric codes as follows:
- code: 260373001
display: Detected
- code: 260415000
display: Not detected
- code: 895231008
display: Not detected in pooled specimen
- code: 462371000124108
display: Detected in pooled specimen
- code: 419984006
display: Inconclusive
Employed_in_healthcare. We expect Y, N, or UNK. (Note: It looks like AZ PHD does not currently expect this field.)
Resident_congregate_setting. We expect Y, N, or UNK. (Note 1: This is correct as far as it goes. The HHS guidance says that if the value is Y, you then give a code for the type of facility, so we'd need a Resident_Congregate_Setting_Type column. Sigh.) (Note 2: It looks like AZ PHD does not currently expect this field.)
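The formats listed above could be checked with something like the following. This is only a sketch, not the Data Hub's actual validation: the function names are made up for illustration, and it assumes Resident_congregate_setting takes the same Y / N / UNK values as Employed_in_healthcare.

```python
import re
from datetime import datetime

# Allowed values copied from the formats listed in this thread.
GENDER_CODES = {"F", "M", "A", "O", "N"}
ETHNICITY_CODES = {"U", "N", "H"}
TEST_RESULT_CODES = {
    "260373001": "Detected",
    "260415000": "Not detected",
    "895231008": "Not detected in pooled specimen",
    "462371000124108": "Detected in pooled specimen",
    "419984006": "Inconclusive",
}
# Assumption: both Y/N/UNK fields share this value set.
YES_NO_UNKNOWN = {"Y", "N", "UNK"}


def is_valid_dob(value: str) -> bool:
    """Return True if value is a real calendar date written as yyyymmdd."""
    if not re.fullmatch(r"\d{8}", value):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def validate_row(row: dict) -> list:
    """Return a list of human-readable problems found in one row (hypothetical helper)."""
    problems = []
    if not is_valid_dob(row.get("Patient_DOB", "")):
        problems.append("Patient_DOB must be a valid yyyymmdd date")
    if row.get("Patient_gender") not in GENDER_CODES:
        problems.append("Patient_gender must be one of F, M, A, O, N")
    if row.get("Patient_ethnicity") not in ETHNICITY_CODES:
        problems.append("Patient_ethnicity must be one of U, N, H")
    if row.get("Test_result_code") not in TEST_RESULT_CODES:
        problems.append("Test_result_code is not a recognized code")
    for field in ("Employed_in_healthcare", "Resident_congregate_setting"):
        if row.get(field) not in YES_NO_UNKNOWN:
            problems.append(field + " must be Y, N, or UNK")
    return problems
```

Note that a format check like this only catches malformed values; a well-formed but implausible date would still pass and would need a separate range check.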
[ ] What is Specimen_source_site_code? Answer: We don't have a value set for this yet. The AZ PHD example for this is "Forearm".
[ ] What is Specimen_type_code. Answer: as follows...
name: covid-19/specimen_type
system: SNOMED_CT
referenceUrl: https://www.hhs.gov/sites/default/files/hhs-guidance-implementation.pdf
values:
[ ] Split ordering provider name into first_name, last_name. Answer: Yes, AZ PHD is expecting these as two separate fields.
[ ] What is Testing_lab_ID: the CLIA number? Our database unique identifier? Answer: Per @jlusds's comment below, for our initial MVP this is the CLIA number, which must be registered and supplied by the place doing the testing (not an ID from our database).
[ ] We have one organization; should its information be in testing_lab and/or ordering_facility? Answer: Per @jlusds's comment below, for our MVP testing_lab == ordering_facility. However, generally, these might be two different places.
Testing_lab_name, Ordering_facility_name
_state
_street
_street2
_zip_code
_county (Note: the example has an Ordering_facility_county but not a testing_lab_county.)
What is Ordered_test_code? Answer:
name: covid-19/order
system: LOINC
reference: Incomplete - Supports BD Veritor, Quidell Sofia, and Abbott ID Now
referenceUrl: https://www.cdc.gov/csels/dls/documents/livd_test_code_mapping/LIVD-SARS-CoV-2-2020-10-21.xlsx
values:
[ ] What is Specimen_source_site_code? Answer: Answered this above.
[ ] What is Specimen_type_code? Answer: Answered this above.
[ ] Instrument_ID is the device LOINC code? Answer: Not sure. I need to get back to you. It looks like we expect a numeric ID for this right now, with no enumerated set of values.
[ ] What is the difference between Test_date and Date_result_released? Answer: AZ PHD expects two dates: the "Collection date" and the "Result Date". I presume Test_date is the date the sample was collected.
Whew!!
Here's my takeaway but would want a second opinion.
Background: Before POC testing, an Ordering Facility (a school, nursing facility) would often order testing but send it out to a different testing lab (ABC Laboratories) to do the actual testing. The ordering facility might not have a CLIA number.
With POC testing, Ordering Facilities are now doing the testing themselves, in house. To do this, they apply for a CLIA Certificate of Waiver. They then become eligible to be both the Ordering Facility and the Testing Facility. When they do POC testing, they are both.
Proposal: In a Point of Care Antigen world, which is our MVP, Ordering Facility Name = Testing Facility Name. We should list the same name in both fields. However, if Nursing Home A wants to send a PCR test out to a different testing facility, then that test would have a different Ordering Facility Name and Testing Facility Name. I think Testing_lab_ID can be the CLIA number. In case sensitivity is a concern, this is a public number.
I'll check in with our pilot site and see what kind of nasal swab they are using. We can hard code it until we build out the ability to add it (hopefully that comes soon!!)
@jimduff-usds - what are your thoughts on having the Data hub do the work to translate the format from SimpleReport into the expected format by AZDHS?
I think the answer will vary by field. For example, if the change is trivial for @TimBest, then it would be good to do it in SimpleReport. But I realize we're all running short of time for the initial release, so we can take shortcuts now and try to patch later. Generally, it's best to generate the data "correctly" as early as possible, but we might need to do otherwise to be expedient.
@timothybest-usds @RickHawesUSDS @jlusds @aliciabeckett-gov
A couple items for our ongoing conversation on the "contract" between SimpleReport and the Hub.
Two changes to the SimpleReport Schema:
I renamed the column 'Testing_lab_ID' to 'Testing_lab_CLIA' to make it more accurate. The Testing_lab_CLIA value for Via Elegante, Tucson Mountains is 03D2159846.
I am adding a Patient_lookup_ID column, immediately after Patient_ID, per discussion with Tim.
@jimduff-usds Just on the issue of where to copy lab fields to facility fields: the mapper function in the hub schema is intended to handle this. I was about to hook this up when the Azure stuff hit. I will have it finished this weekend.
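As a rough illustration of copying lab fields to facility fields for the MVP case where testing_lab == ordering_facility, here is a small sketch. It is not the hub's mapper; the function name is hypothetical, and the suffix list is taken from the field names discussed earlier in this thread (with `_name` added on the assumption that the name is copied too).

```python
# Suffixes shared by the Testing_lab_* and Ordering_facility_* columns
# (assumed from the list earlier in this thread).
SHARED_SUFFIXES = ["_name", "_state", "_street", "_street2", "_zip_code", "_county"]


def copy_lab_to_facility(row: dict) -> dict:
    """Return a copy of row with blank Ordering_facility_* fields
    filled in from the matching Testing_lab_* fields."""
    result = dict(row)
    for suffix in SHARED_SUFFIXES:
        lab_value = row.get("Testing_lab" + suffix, "")
        facility_key = "Ordering_facility" + suffix
        # Only fill gaps, so a send-out test keeps its own ordering facility.
        if lab_value and not result.get(facility_key):
            result[facility_key] = lab_value
    return result
```

Filling only blank fields keeps the send-out case intact: if Nursing Home A orders a test run by a different lab, its Ordering_facility_* values are left alone.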
@jimduff-usds and @RickHawesUSDS I have updated the csv example with the latest format. Let me know if anything needs to be changed.
Note the columns are currently sorted alphabetically. @jimduff-usds your comment seems to imply that properties are inferred from column order. Is this true? If so, is example-simplereport-file-11-20-2020.csv in the expected order?
Thanks @TimBest , I'll take a look at your example, and I'll adjust what we're expecting as much as possible to try to match that. So, ball's in my court, and I may change our expected column ordering to match yours.
(Yes, at the moment, we expect both the exact column ordering and the exact column header names in order to accept a file as validated.)
at the moment, we expect both the exact column ordering
What's the benefit of doing that? Seems like a green-M&M situation at best, and unnecessarily brittle at worst.
It's just how it's working now. I'd like us to allow columns in arbitrary order, but it's not built yet.
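For what it's worth, accepting columns in arbitrary order mostly comes down to comparing header names as a set and reading rows by name rather than by position. A minimal sketch using Python's standard csv module (the function names are hypothetical, and this is not how the hub is implemented):

```python
import csv
import io


def check_headers(csv_text: str, expected: list) -> tuple:
    """Compare header names as sets so column order does not matter.

    Returns (missing, unexpected) as two sorted lists of column names.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    missing = sorted(set(expected) - set(header))
    unexpected = sorted(set(header) - set(expected))
    return missing, unexpected


def read_rows(csv_text: str) -> list:
    """Read rows keyed by column name, independent of column position."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

With this approach an alphabetically sorted export and a schema-ordered export produce the same rows, and a misspelled header shows up as one missing name plus one unexpected name.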
Tim, we're close. I have comments on these fields as the data appears in your gist example:
[ ] Ordered_test_code. MISSING LOINC. See "covid-19/order" above, for codes.
[ ] Patient_DOB. In your example, you have '11850514'. Expecting YYYYmmDD. Maybe this is just a typo?
[ ] Patient_race. Needs to be one of the following codes, eg, 2028-9 instead of 'asian', in your example. Sorry, missed this one earlier.
- display: N # Native
code: 1002-5
- display: A # Asian
code: 2028-9
- display: B # Black
code: 2054-5
- display: P # Pacific Islander
code: 2076-8
- display: W # White
code: 2106-3
- display: O # Other
code: 2131-1
- display: O
code: UNK
- display: O # Asked, but unknown
code: ASKU
[ ] Patient_street2. I just noticed the typo on our side -- please change the column name to 'Patient_street_2', so we're consistent.
[ ] Specimen_collection_date_time. Your example has '41:55.2'. This needs a full date. At least for AZ, the time portion is optional.
[ ] Specimen_source_site_code. You have 'Forearm' at the moment. AZ said they can infer the value from the Specimen_type_code, so I suggest you leave this blank or "N/A" for right now. We may need to collect a real value for this in the future; TBD.
[ ] Specimen_type_code. Missing in the example. Must be a SNOMED code. See "covid-19/specimen_type" above for the expected value.
[ ] Testing_lab_street2. Same as above, our typo. Please change title to 'Testing_lab_street_2'.
[ ] And lastly, I think I do want to ask if you can order the fields the same as ours. Your current order appears to be random (?); I think there's value in having reasonably ordered data, for debugging, etc. Let me know if you disagree. Order should be as the elements are ordered here: https://github.com/CDCgov/prime-data-hub/blob/master/prime-router/metadata/schemas/PrimeDataInput/pdi-covid-19.schema
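For the Patient_race item in the checklist above, a lookup from free text to the listed codes might look like the sketch below. The text labels used as keys are assumptions based on the comments in the code list (only 'asian' is known to appear in the example file), and falling back to UNK for missing or unrecognized values is also an assumption rather than agreed behavior.

```python
# Codes copied from the Patient_race list in this thread; the lowercase
# text keys are assumed spellings of what an export might contain.
RACE_CODES = {
    "native": "1002-5",
    "asian": "2028-9",
    "black": "2054-5",
    "pacific islander": "2076-8",
    "white": "2106-3",
    "other": "2131-1",
}
UNKNOWN_RACE_CODE = "UNK"


def race_to_code(text) -> str:
    """Map a free-text race value to its code; unknown or empty input becomes UNK."""
    key = (text or "").strip().lower()
    return RACE_CODES.get(key, UNKNOWN_RACE_CODE)
```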
@jimduff-usds Will have a PR that allows column headers to come in any order.
Also, @TimBest what happened to the device-id field? That's kind of important isn't it?
@TimBest , per the LIVD spreadsheet, it looks like the Ordered_test_code for Binax is 94558-4
Abbott | BinaxNOW COVID-19 Ag Card | SARS-CoV-2 nucleocapsid protein antigen
And the only associated Specimen_type_code for Binax is 445297001
- nasal swabs (Swab of internal nose)
However, almost every other device has multiple Specimen_type_code values. For example, Abbott's ID NOW test has 4 different kinds of specimens that can be taken.
@RickHawesUSDS - we have it. It's the "Ordered_test_code" field in our data. Probably we need a better name....
So, this is complicated and we are new to the space, but the way I read the LIVD table, there are many device types and manufacturers for a given "Ordered_test_code". That is, many devices can run a "SARS-CoV-2 (COVID-19) Ag [Presence] in Respiratory specimen by Rapid immunoassay". So the field is not the same as saying "Abbott", "BinaxNow COVID-19 AG Card". If you agree, I'd propose that you go back to providing the "BinaxNow COVID-19 AG Card" device-id field and let the hub translate it to a particular LOINC Ordered_test_code.
I've added Device_ID to the expected data from SimpleReport; it's in a PR now.
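The proposal above, where SimpleReport sends the device and the hub derives the LOINC code, could be sketched as a simple lookup. The only entry here is the Binax mapping quoted in this thread; the function name is hypothetical, and treating the Device_ID value as the device's model name is an assumption.

```python
# Device model name -> LOINC Ordered_test_code.
# Only the mapping mentioned in this thread is included; a real table
# would be built from the LIVD spreadsheet.
DEVICE_TO_ORDERED_TEST_CODE = {
    "BinaxNOW COVID-19 Ag Card": "94558-4",
}


def add_ordered_test_code(row: dict) -> dict:
    """Return a copy of row with Ordered_test_code derived from Device_ID."""
    device = row.get("Device_ID", "")
    if device not in DEVICE_TO_ORDERED_TEST_CODE:
        # Fail loudly rather than guess a code for an unknown device.
        raise KeyError("No LOINC mapping for device: " + repr(device))
    result = dict(row)
    result["Ordered_test_code"] = DEVICE_TO_ORDERED_TEST_CODE[device]
    return result
```

Because many devices share one Ordered_test_code, the lookup only works in this direction: the device determines the code, but the code does not identify the device.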
I'm realizing this Issue has gotten quite hard to understand; I'm going to start a (hopefully!) organized data dictionary somewhere, in an attempt to save @TimBest 's sanity.
Per today's discussion, this is the current state of the SimpleReport export: https://gist.github.com/timothybest-usds/11fba7410952eafab50d35bc7bcfb4bf It should include everything up to https://github.com/CDCgov/prime-data-input-client/issues/159#issuecomment-736068023
@TimBest , the Data Dictionary is here:
https://github.com/CDCgov/prime-data-hub/blob/master/prime-router/docs/simplereport-data-dictionary.md
@TimBest I just tried this sample file (with Device_ID and a single race value) and it worked on the latest data hub code. https://gist.github.com/RickHawesUSDS/f082131bb6b1d04cb58cd861de3d60cc
Going to mark this as complete since we successfully tested the pipeline last night
The issue with race=null will be resolved in https://github.com/CDCgov/prime-data-input-client/pull/211
Future integration work can be found at: https://github.com/CDCgov/prime-central/issues/160
TODO [x] add a mechanism to export test events
Example https://gist.github.com/timothybest-usds/11fba7410952eafab50d35bc7bcfb4bf
Open Questions / TODO
[ ] Formats for Patient_DOB, Patient_gender, Patient_ethnicity, Test_result_code, Employed_in_healthcare, Resident_congregate_setting
[ ] What is Specimen_source_site_code?
[ ] What is Specimen_type_code?
[ ] Split ordering provider name into first_name, last_name
[ ] What is Testing_lab_ID: CLIA number? Our database unique identifier?
[ ] We have one organization; should its information be in testing_lab and/or ordering_facility?
[ ] Testing_lab_name, Ordering_facility_name: _state, _street, _street2, _zip_code, _county. Note the example has an Ordering_facility_county but not a testing_lab_county.
[ ] What is Ordered_test_code?
[ ] What is Specimen_source_site_code?
[ ] What is Specimen_type_code?
[ ] Instrument_ID is the device LOINC code?
[ ] What is the difference between Test_date and Date_result_released?