Summary

We would like to be able to generate a vocabulary from a Scan Report, so that it can be uploaded to Athena. This is an initial proposal, and we welcome comments on it.

Supporting Documentation:

Documentation (type 4)
The template to generate

A potential roadmap for versions of this feature:

Version 0.1:

Preprocessing step
Local code that generates vocabulary "tables", i.e. tables like those you download from Athena. Initially this focusses mostly on the Concepts table.
Can we reliably detect a question? For example, "Has smoked in the last year? Yes | No" is a sample data point in a column header; it would need to be parsed to "Smoked in the last year", with the options Yes | No.
This data is captured in an EDC (electronic data capture) system, so it could potentially take almost any format or content.
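The question-detection idea above could start from a simple heuristic. A minimal sketch, assuming a header has the shape `<question>? <option> | <option> ...`; `parse_question_header` and `ParsedQuestion` are hypothetical names, and many real EDC headers will not follow this shape:

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedQuestion:
    question: str        # question text without the trailing "?"
    options: list[str]   # answer options in their original order


# Hypothetical heuristic: "<question>? <option> | <option> ..."
_HEADER_RE = re.compile(r"\s*(?P<question>[^?]+)\?\s*(?P<options>.+?)\s*")


def parse_question_header(header: str) -> Optional[ParsedQuestion]:
    """Split a header such as 'Has smoked in the last year? Yes | No'.

    Returns None when the header does not look like a question followed by
    at least two pipe-separated options.
    """
    match = _HEADER_RE.fullmatch(header)
    if match is None:
        return None
    options = [opt.strip() for opt in match.group("options").split("|")]
    # Reject single options and empty entries such as "Yes | | No".
    if len(options) < 2 or any(not opt for opt in options):
        return None
    return ParsedQuestion(match.group("question").strip(), options)
```

Headers that do not match return None, so they can be routed to manual review. The sketch does not rephrase the question (it keeps "Has smoked in the last year" rather than "Smoked in the last year"); that kind of normalisation would need further rules or a human.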

Version 1:

A small form that enables a user to export vocabularies from a Scan Report.
The form asks for the vocabulary_id, which effectively names the vocabulary; the vocabulary is then generated and returned to the user.
Carrot will build the vocabulary in a similar way to how it currently exports the mapping_rules to .csv.
This all depends on mapping rules already existing for the Scan Report.
This approach leaves the user to complete the concept_name column of the template.
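A rough sketch of the Version 1 export, assuming the output uses the column layout of the OMOP CDM CONCEPT table (as downloaded from Athena) rather than the actual Carrot template. `MappingRule`, `export_vocabulary` and the use of the source value as concept_code are hypothetical simplifications of what Carrot stores; concept_id and the other columns not set here are left blank, and concept_name is left for the user as described above:

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass

# Assumed column order, taken from the OMOP CDM CONCEPT table.
CONCEPT_COLUMNS = [
    "concept_id", "concept_name", "domain_id", "vocabulary_id",
    "concept_class_id", "standard_concept", "concept_code",
    "valid_start_date", "valid_end_date", "invalid_reason",
]


@dataclass(frozen=True)
class MappingRule:
    """Hypothetical, simplified view of a Carrot mapping rule."""
    source_value: str   # scan report value, used here as concept_code
    domain_id: str      # e.g. "Observation", from the mapped concept


def export_vocabulary(rules: list[MappingRule], vocabulary_id: str) -> str:
    """Return CSV text with one concept row per distinct source value."""
    # OMOP VOCABULARY.vocabulary_id is varchar(20), so reject longer ids.
    if not vocabulary_id or len(vocabulary_id) > 20:
        raise ValueError("vocabulary_id must be 1-20 characters")

    out = io.StringIO()
    # Columns missing from a row dict are written as "" (DictWriter restval).
    writer = csv.DictWriter(out, fieldnames=CONCEPT_COLUMNS)
    writer.writeheader()
    seen: set[str] = set()
    for rule in rules:
        if rule.source_value in seen:
            continue  # several rules can share one source value
        seen.add(rule.source_value)
        writer.writerow({
            "concept_name": "",  # left for the user to complete
            "domain_id": rule.domain_id,
            "vocabulary_id": vocabulary_id,
            "concept_code": rule.source_value,
        })
    return out.getvalue()
```

Deduplicating on the source value alone means a value such as "Yes" that appears under several questions would collapse into one concept; a real export would probably need to combine the field and the value when building concept_code.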

Version 2:

As above.
The user will supply a source_dictionary in the form. This should include at least 2 columns mapping each concept_code to its concept_name, which is effectively a description of the vocabulary term.
Carrot uses this to populate the concept_name column of the template.
We can either define a template for this form, or allow the user to upload a spreadsheet and select which columns contain the code and the name.
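The "upload a spreadsheet and select the columns" option could look roughly like this. A minimal sketch, assuming the spreadsheet arrives as CSV text and the form supplies the two chosen column names; `load_source_dictionary` and `fill_concept_names` are hypothetical names:

```python
from __future__ import annotations

import csv
import io


def load_source_dictionary(
    dictionary_csv: str, code_column: str, name_column: str
) -> dict[str, str]:
    """Build a concept_code -> concept_name lookup from uploaded CSV text.

    code_column and name_column are the columns the user picked in the form.
    """
    reader = csv.DictReader(io.StringIO(dictionary_csv))
    missing = {code_column, name_column} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"source_dictionary is missing columns: {sorted(missing)}")
    names: dict[str, str] = {}
    for row in reader:
        # Short rows yield None for absent cells, so coerce before stripping.
        code = (row[code_column] or "").strip()
        name = (row[name_column] or "").strip()
        if code and name:
            names[code] = name
    return names


def fill_concept_names(rows: list[dict[str, str]], names: dict[str, str]) -> list[str]:
    """Set concept_name on each row in place; return codes with no match."""
    unmatched: list[str] = []
    for row in rows:
        name = names.get(row["concept_code"])
        if name is None:
            unmatched.append(row["concept_code"])
        else:
            row["concept_name"] = name
    return unmatched
```

Returning the unmatched codes gives the user a list of terms they still need to name by hand, which matches the Version 1 fallback of leaving concept_name to the user.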

Caveats

We will need to make clear to the user that this export has limitations, and that it depends on:

The minimum cell count (truncation) that White Rabbit applies when producing the Scan Report. Values below that threshold may have been dropped, so the exported vocabulary cannot be guaranteed to be complete.
The need for QA checking of the export.
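Part of the QA checking could be automated before the file reaches the user. A minimal sketch of two checks, assuming rows are dicts keyed by CONCEPT column names (`qa_issues` is a hypothetical helper). It cannot detect values dropped by White Rabbit's minimum cell count, since they never reach the Scan Report, so that limitation still has to be stated to the user:

```python
from __future__ import annotations

from collections import Counter


def qa_issues(rows: list[dict[str, str]]) -> list[str]:
    """Return human-readable problems found in exported concept rows."""
    issues: list[str] = []
    # Each concept_code should identify exactly one concept in the vocabulary.
    counts = Counter(row["concept_code"] for row in rows)
    for code, count in counts.items():
        if count > 1:
            issues.append(f"duplicate concept_code {code!r} ({count} rows)")
    # Version 1 leaves concept_name for the user, so blanks are likely.
    for row in rows:
        if not (row.get("concept_name") or "").strip():
            issues.append(f"missing concept_name for {row['concept_code']!r}")
    return issues
```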

Tasks

[ ] Backend - export vocabulary service (I anticipate this being an Azure function)
[ ] Frontend - a form to support exporting the vocabulary.

Acceptance Criteria

[ ] A way for the user to export vocabulary information from a Scan Report.

Health-Informatics-UoN / Carrot-Mapper

Generate Vocabularies #752