Semantic coding of questions in Flow Results

markboots commented 3 years ago

Recent discussions with UNICEF (@ewheeler) and the mHero team (@citizenrich) have reinforced the value of having both Flow Spec and Flow Results support a standard coding system for the meaning of flow interactions/questions.

An example challenge is: "We have 300 surveys that have been done over 15 countries, and we need to know: which of these questions is the Gender question, and which responses mean female?" ("Female, female, woman, women, Femme, femme, F, etc.")

Flow Spec aimed to address this with the semantic_label field in Blocks, which was intended for standard coding of the semantic meaning, according to a code system (whether that is a FHIR ValueSet, an industry standard coding system such as SNOMED CT, an organization-defined terminology service, etc.). The intention was not to adopt a specific coding system, but to provide a recommended place in the schemas for organizations using a coding system.

Proposal: We should make the semantic_label from Flow Spec also available in Flow Results, given the importance of Flow Results in sharing data across systems with the semantic meaning attached.

Questions

Is a single semantic_label string field on Questions sufficient in Flow Results? e.g.

"questions":{
    "ae54d3":{
      "type":"multiple_choice",
      "label":"What is your gender?",
      "semantic_label": "SNOMEDCT::365873007",
      "type_options":{
        "choices":[
          "male",
          "female",
          "unspecified",
          "non-binary"
        ]
      }
    }
}

Is there a need for more than one semantic tag?

semantic_labels: ["SNOMEDCT::365873007", "MySystem::Gender"]

Is it helpful to explicitly specify a coding system, e.g.

semantic_label: "365873007"
semantic_code_system: "SNOMEDCT"

Should we code the type_options for "Select One" and "Select Many" question types?
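
To compare the three shapes floated above (a single "SYSTEM::code" string, a list of such strings, and a separate code-system field), here is a rough Python sketch of how a consumer might normalize all of them into (system, code) pairs. The `Coding` type and both helper functions are hypothetical names for illustration, and the `::` separator is only assumed from the example above; none of this is part of Flow Results today.

```python
from typing import NamedTuple, Optional


class Coding(NamedTuple):
    """Hypothetical (system, code) pair; not defined by any FLOIP spec."""
    system: Optional[str]
    code: str


def parse_label(label, default_system=None):
    # Assumes the "SYSTEM::code" convention from the example in this issue.
    system, sep, code = label.partition("::")
    if not sep:
        # No separator: the whole string is the code.
        return Coding(default_system, label)
    return Coding(system, code)


def question_codings(question):
    """Collect codings from any of the three proposed field shapes."""
    codings = []
    single = question.get("semantic_label")
    if single is not None:
        codings.append(
            parse_label(single, question.get("semantic_code_system"))
        )
    for label in question.get("semantic_labels", []):
        codings.append(parse_label(label))
    return codings
```

With this in place, `{"semantic_label": "SNOMEDCT::365873007"}` and `{"semantic_label": "365873007", "semantic_code_system": "SNOMEDCT"}` normalize to the same pair, which suggests the choice between those two shapes is mostly about readability rather than expressiveness.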

rudigiesler commented 3 years ago

To try and wrap my head around this, here are some of my definitions of things:

keywords - Female, female, woman, women, Femme, femme, F, etc. These are the inputs a user could have typed to select a specific option on a Select. This could also be an infinite list, e.g. if you're using NLU to determine which option the user wants.

values - female. This is the system value that is one of the choices

labels - Female, Femme, etc. This is what the user sees, and is dependent on e.g. language.

What I'm unsure of is whether, for Flow Results, we are storing the keyword, i.e. the text the user sent. I would think not, since there are potentially infinitely many keywords but only a finite number of type option choices. Yet the issue seems to describe this as the problem we're trying to solve.

If we're storing a value, then whether the user typed female or femme, we would store the result as female. In that case, the application is responsible for mapping the input to one of the type option choices before storing the flow result, and we don't have the above issue, since all the different flows store the standard value of female.
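
As a minimal sketch of that mapping step, assuming the application keeps a keyword table per choice, the lookup could be as simple as the following. The `KEYWORDS` table and the `to_value` function are hypothetical; the "male" keywords are made up for illustration, and a real system might use NLU instead of exact matching.

```python
# Hypothetical keyword table: system value -> accepted user inputs.
KEYWORDS = {
    "female": {"female", "woman", "women", "femme", "f"},
    "male": {"male", "man", "men", "homme", "m"},
}


def to_value(user_text, keywords=KEYWORDS):
    """Map raw user input to a system value, or None if nothing matches."""
    cleaned = user_text.strip().casefold()
    for value, words in keywords.items():
        if cleaned in words:
            return value
    return None
```

Here `to_value("Femme")` and `to_value(" F ")` both resolve to the single stored value `female`, so only the value ever reaches Flow Results.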

Another use for these semantic tags could be storing a field type that is not in the spec, e.g. an ISO 639-3 language code. We could use a text field and set a semantic coding of iso639-3, so that we know how to interpret the text field.
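
To make that idea concrete, here is a rough sketch of a consumer that uses the semantic coding to decide how to interpret a text response. The function name and the returned dictionary shape are assumptions for illustration. The check only verifies the three-lowercase-letter shape of an ISO 639-3 code; it does not look the code up in the actual registry.

```python
import re

# ISO 639-3 identifiers are three lowercase letters; this checks shape only.
ISO639_3_PATTERN = re.compile(r"[a-z]{3}")


def interpret_text_response(question, response):
    """Interpret a text response based on the question's semantic coding."""
    label = question.get("semantic_label", "")
    if label.casefold() == "iso639-3":
        code = response.strip().lower()
        if ISO639_3_PATTERN.fullmatch(code):
            return {"kind": "language", "code": code}
        return {"kind": "invalid", "raw": response}
    # No recognized coding: treat the response as plain text.
    return {"kind": "text", "raw": response}
```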

markboots commented 2 years ago

Hi @rudigiesler, sorry for the late reply on this one. I agree that Flow Results would generally be storing a value, not a label or a keyword in your terminology above. Even though responses could be a standard "system value" (e.g. "female" or "male"), what remains is: how do we identify that this question in this flow in this account, as well as that question in that flow in that account, are both "What is your gender?" questions?

This is where a data taxonomy comes into play; the semantic_label codifies the meaning of the question in a taxonomy. This is only useful if the user organization(s) have a standard coding system to refer to, e.g. FHIR ValueSets, SNOMEDCT in health, or an in-house-developed data taxonomy. If they have such a system, adding semantic_label provides a place to indicate the coding of each question.
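
As a sketch of what this would enable, assuming each package's questions carry a semantic_label as proposed, finding "the Gender question" across many surveys becomes a simple lookup. The `find_questions` function and the input shape (a dict from a package identifier to that package's "questions" object) are hypothetical, chosen only for illustration.

```python
def find_questions(packages, semantic_label):
    """Return (package_id, question_id) pairs whose coding matches.

    `packages` maps a hypothetical package identifier to the "questions"
    object of that Flow Results package.
    """
    matches = []
    for package_id, questions in packages.items():
        for question_id, question in questions.items():
            if question.get("semantic_label") == semantic_label:
                matches.append((package_id, question_id))
    return matches


# Usage: two surveys with differently worded gender questions.
surveys = {
    "survey-a": {
        "ae54d3": {
            "label": "What is your gender?",
            "semantic_label": "SNOMEDCT::365873007",
        },
        "bb1200": {"label": "How old are you?"},
    },
    "survey-b": {
        "q7": {
            "label": "Quel est votre genre ?",
            "semantic_label": "SNOMEDCT::365873007",
        },
    },
}
gender_questions = find_questions(surveys, "SNOMEDCT::365873007")
```

The wording and language of the question label no longer matter; only the shared code does.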

FLOIP / flow-results

Semantic coding of questions in Flow Results #37