ACED-IDP / submission

Submit FHIR based data into a Gen3 Commons

Dataframer/Pivot with Facet Management #23

Open bwalsh opened 2 months ago

bwalsh commented 2 months ago

Use Cases:

Example

image

References

Goals/Forces/Motivation:

The primary reason for using the pivot function is to reshape the data. It transforms data from long to wide format, which helps when comparing different variables more effectively. This reshaping is fundamental in preparing datasets for analysis or visualization as it allows for a more structured and readable form of data representation.

Data pivoting is the process of transforming data from a long format (rows) to a wide format (columns), typically to make it more understandable or suitable for analysis. Common scenarios for data pivoting include summarizing data, creating cross-tabulations, or presenting data in a more structured format for reporting purposes.

See R's tidyverse

Tidy data refers to ‘rectangular’ data. These are the data we typically see in spreadsheet software like Google Sheets or Microsoft Excel, or in a relational database like MySQL, PostgreSQL, or Microsoft Access. The three principles for tidy data are:

  • Variables make up the columns
  • Observations (or cases) go in the rows
  • Values are in cells

Put them together, and these three statements make up the contents in a tidy data frame or tibble. While these principles might seem obvious at first, many of the data arrangements we encounter in real life don’t adhere to this guidance.

In the context of FHIR, the Observation is the "tidy data" (aka "indexed") and has a defined, fixed schema. The dataframe is the "wide" or "Cartesian" data (see ggplot2); its schema is not fixed, because the "variables" are defined by Observation.code.

image image

Workflow

Data Frame Creation:

Create an initial pandas DataFrame with columns for each unique observation code extracted from the observations. Each row represents a set of observations with its associated attributes (subject, encounter, value, effectiveDateTime, etc.) for a given subject, specimen, focus at a given time. Use observation codes as columns to ensure each code has its dedicated column in the DataFrame.
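
As a minimal sketch of the step above, the following pivots a few already-flattened observations from long to wide form with pandas. The record layout (`subject`, `effective_age_months`, `code`, `value`) and the second patient are assumptions made for illustration; they are not the format produced by g3t.

```python
import pandas as pd

# Hypothetical, already-flattened Observation records in "long" form:
# one row per observation.
observations = pd.DataFrame(
    [
        {"subject": "Patient/example", "effective_age_months": 600,
         "code": "gleason_score",
         "value": "ISUP Grade (Grade Group) 3 (Gleason score 4+3=7)"},
        {"subject": "Patient/example", "effective_age_months": 601,
         "code": "fever", "value": True},
        {"subject": "Patient/p2", "effective_age_months": 480,
         "code": "fever", "value": False},
    ]
)

# Pivot long -> wide: one row per (subject, time), one column per code.
# DataFrame.pivot raises if the same (subject, time, code) appears twice.
wide = (
    observations.pivot(
        index=["subject", "effective_age_months"],
        columns="code",
        values="value",
    )
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "code" axis label
)
print(wide)
```

Cells with no matching observation come out as missing values, which is why the wide schema can grow with every new Observation.code without changing existing rows.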

Denormalization by Observation.subject:

Facet Management Category and Coding Extraction:

While creating the "dataframe" we also need to discover and maintain a hierarchy of facets.

explorerConfig:
  {{ for all unique subject.resourceType }}
  - tabTitle: <resourceType>  # e.g., Patient, Specimen, etc.
    charts: []  # manually add charts
    filters:
      {{for all categories}}  # e.g., 'Vital Signs', 'Laboratory', etc.
        tabs:
          - title: category
            fields: <codes from category>  # e.g., 'heart_rate', 'blood_pressure', etc.
                       <scalars and extensions>
    table:
        enabled: true
        detailsConfig: # manually add detailsConfig
        fields:
          # all flattened references
          # all codes

Notes: Out of scope or static elements in explorerConfig.
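
The template above could be filled in along these lines. This is only a sketch: the `facets` input, the `build_explorer_config` helper, and the sample values are hypothetical, and the emitted keys simply mirror the template rather than a verified Gen3 schema. Output is printed as JSON to stay within the standard library.

```python
import json
from collections import defaultdict

# Hypothetical discovered facets: (subject resourceType, category, field name).
facets = [
    ("Patient", "vital-signs", "fever"),
    ("Patient", "survey", "favorite_color"),
    ("Specimen", "laboratory", "gleason_score"),
]

def build_explorer_config(facets):
    """Group facets by resourceType, then by category, mirroring the template."""
    by_type = defaultdict(lambda: defaultdict(list))
    for resource_type, category, field in facets:
        if field not in by_type[resource_type][category]:
            by_type[resource_type][category].append(field)

    config = []
    for resource_type, categories in by_type.items():
        tabs = [{"title": c, "fields": fields} for c, fields in categories.items()]
        all_fields = [f for fields in categories.values() for f in fields]
        config.append({
            "tabTitle": resource_type,
            "charts": [],  # left empty: charts are added manually
            "filters": {"tabs": tabs},
            "table": {"enabled": True, "fields": all_fields},
        })
    return {"explorerConfig": config}

explorer_config = build_explorer_config(facets)
print(json.dumps(explorer_config, indent=2))
```

Manual elements such as charts and detailsConfig are deliberately left out or empty, matching the "manually add" comments in the template.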

Example: from Prostate_Microenvironments

image

Summary view of the dataframe, e.g. patient-centric

image

Changes to guppy config

# Guppy configuration
guppy:
  enabled: true
  dbRestore: false
  indices:
  - index: observation
    type: observation
  - index: file
    type: file
# added to support facet management
  - index: explorer_config
    type: explorerConfig

  configIndex: gen3.aced.io_array-config

End to End

image

Guppy PR link: https://github.com/uc-cdis/guppy/pull/273

FF issue: https://github.com/ACED-IDP/gen3-frontend-framework/issues/12

Existing work

 g3t meta dataframe --help
Usage: g3t meta dataframe [OPTIONS] [DIRECTORY_PATH] [OUTPUT_PATH]

  Render a metadata dataframe.

  directory_path: The directory path to the metadata.
  output_path: The output path for the dataframe. default [meta.csv]

Options:
  --dtale                         Open the graph in a browser using the dtale
                                  package for interactive data exploration.
  --data_type [Patient|Specimen|Observation|DocumentReference]
                                  Create a data frame for a specific data
                                  type.  [required]
  --debug

For example:

g3t meta dataframe --data_type Observation META/ --dtale

image

Discussion points

bwalsh commented 1 month ago

A worked example

Consider this bundle (see fsh editor for shorthand.)

image
Alias: $sct = http://snomed.info/sct
Alias: $condition-category = http://terminology.hl7.org/CodeSystem/condition-category
Alias: $observation-category = http://terminology.hl7.org/CodeSystem/observation-category
Alias: $loinc = https://loinc.org
Alias: $mylab = http://mylab.org

Instance: undefined
InstanceOf: Bundle
Usage: #example
* type = #bundle
* entry[0].resource = example
* entry[+].resource = example-specimen
* entry[+].resource = example-cancer
* entry[+].resource = example-common-cold
* entry[+].resource = example-fever
* entry[+].resource = example-gleason-score
* entry[+].resource = example-favorite-color

Instance: example
InstanceOf: Patient
Usage: #inline
* birthSex.coding.system = "http://terminology.hl7.org/CodeSystem/v3-AdministrativeGender"
* birthSex.coding.code = "M"

Instance: example-specimen
InstanceOf: Specimen
Usage: #inline
* subject = Reference(example)
* type = $sct#122555 "Biopsy"
* collection.bodySite.coding.system = "http://snomed.info/sct"
* collection.bodySite.coding.code = "122456"
* collection.bodySite.coding.display = "Prostate"
* processing.method = $sct#787376009 "Preparation of formalin fixed paraffin embedded tissue specimen"

Instance: example-cancer
InstanceOf: Condition
Usage: #inline
* subject = Reference(example)
* category = $condition-category#encounter-diagnosis
* code = $sct#123456 "Cancer"
* onsetAge = 600 'm' "months"
* evidence.reference = "Observation/example-gleason-score"

Instance: example-common-cold
InstanceOf: Condition
Usage: #inline
* subject = Reference(example)
* category = $condition-category#encounter-diagnosis
* code = $sct#7890 "Common Cold"
* onsetAge = 601 'm' "months"
* evidence.reference = "Observation/example-fever"

Instance: example-fever
InstanceOf: Observation
Usage: #inline
* subject = Reference(example)
* focus = Reference(example)
* category = $observation-category#vital-signs
* code = $loinc#45701-0 "Fever"
* valueBoolean = true
* effectiveAge.value = 601
* effectiveAge.code = "m"
* effectiveAge.system = "http://unitsofmeasure.org"
* effectiveAge.unit = "months"

Instance: example-gleason-score
InstanceOf: Observation
Usage: #inline
* subject = Reference(example)
* focus = Reference(example-specimen)
* category = $observation-category#laboratory
* code = $loinc#94734-1 "Gleason score"
* valueCodeableConcept = $loinc#LA30796-9 "ISUP Grade (Grade Group) 3 (Gleason score 4+3=7)"
* effectiveAge.value = 600
* effectiveAge.code = "m"
* effectiveAge.system = "http://unitsofmeasure.org"
* effectiveAge.unit = "months"

Instance: example-favorite-color
InstanceOf: Observation
Usage: #inline
* subject = Reference(example)
* focus = Reference(example)
* category = $observation-category#survey
* code = $mylab#favorite-color "Favorite color"
* valueString = "Blue"

Resulting dataframe

(Note that onsetAge, a temporal field, was used to prompt a new row.)

| patient | birthSex | favorite_color | condition_code | onsetAge | gleason_score | fever | specimen | specimen_type | specimen_collection_body_site | specimen_processing_method |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| example | M | Blue | Cancer | 600 | ISUP Grade (Grade Group) 3 (Gleason score 4+3=7) | | example-specimen | Biopsy | Prostate | Preparation of formalin fixed paraffin embedded tissue specimen |
| example | M | Blue | Common Cold | 601 | | TRUE | | | | |
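
One way to read the two rows above is sketched below. The rules are assumptions inferred from the example, not a description of the prototype: each Condition opens a row keyed by its onsetAge, timed observations attach to the row with the same age, untimed observations are copied to every row, and a specimen's columns follow any observation that focuses on it. The hand-flattened dictionaries and the `denormalize` helper are hypothetical.

```python
# Hand-flattened stand-ins for the resources in the bundle above.
patient = {"patient": "example", "birthSex": "M"}
specimens = {
    "example-specimen": {
        "specimen": "example-specimen",
        "specimen_type": "Biopsy",
        "specimen_collection_body_site": "Prostate",
    }
}
conditions = [
    {"condition_code": "Cancer", "onsetAge": 600},
    {"condition_code": "Common Cold", "onsetAge": 601},
]
observations = [
    {"column": "fever", "value": True, "age": 601, "focus": "example"},
    {"column": "gleason_score",
     "value": "ISUP Grade (Grade Group) 3 (Gleason score 4+3=7)",
     "age": 600, "focus": "example-specimen"},
    {"column": "favorite_color", "value": "Blue", "age": None, "focus": "example"},
]

def denormalize(patient, conditions, observations, specimens):
    """Build one wide row per condition (time point) for a single patient."""
    rows = []
    for condition in conditions:
        row = {**patient, **condition}
        for obs in observations:
            # Untimed observations apply to every row; timed ones only to
            # the row whose onsetAge matches their effective age.
            if obs["age"] is None or obs["age"] == condition["onsetAge"]:
                row[obs["column"]] = obs["value"]
                # Pull in specimen columns when the focus is a known specimen.
                row.update(specimens.get(obs["focus"], {}))
        rows.append(row)
    return rows

rows = denormalize(patient, conditions, observations, specimens)
for row in rows:
    print(row)
```

Columns absent from a row (for example fever on the Cancer row) correspond to the empty cells in the table.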

Resulting Facet Hierarchy

(source resource included for completeness)

| category | facet | resource |
| --- | --- | --- |
| patient | patient | Patient |
| patient | birthSex | Patient |
| survey | favorite_color | Observation |
| condition | condition_code | Condition |
| condition | onsetAge | Condition |
| laboratory | gleason_score | Observation |
| vital-signs | fever | Observation |
| specimen | specimen | Specimen |
| specimen | specimen_type | Specimen |
| specimen | specimen_collection_body_site | Specimen |
| specimen | specimen_processing_method | Specimen |
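
A hierarchy like this could be collected while the dataframe is built. The sketch below assumes two conventions that match the example but are not confirmed by the source: an Observation's facet name is a slug of its code display and its category comes from Observation.category, while other resources use their lower-cased resourceType as the category. The `slug` and `facet_hierarchy` helpers and the input shapes are hypothetical.

```python
import re

def slug(text):
    """Lower-case a display string and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

# Simplified stand-ins for the three Observations in the bundle above.
observations = [
    {"category": "survey", "display": "Favorite color"},
    {"category": "laboratory", "display": "Gleason score"},
    {"category": "vital-signs", "display": "Fever"},
]
# Non-Observation resources contribute their flattened field names directly.
other_fields = {
    "Patient": ["patient", "birthSex"],
    "Condition": ["condition_code", "onsetAge"],
}

def facet_hierarchy(observations, other_fields):
    """Return (category, facet, resource) tuples."""
    rows = []
    for resource_type, fields in other_fields.items():
        for field in fields:
            rows.append((resource_type.lower(), field, resource_type))
    for obs in observations:
        rows.append((obs["category"], slug(obs["display"]), "Observation"))
    return rows

hierarchy = facet_hierarchy(observations, other_fields)
for row in hierarchy:
    print(*row)
```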

bwalsh commented 1 month ago

Additional gen3 FEF documentation for explorer config. https://github.com/uc-cdis/gen3-frontend-framework/blob/develop/docs/Configuration/Explorer.md#selection-facet

bwalsh commented 2 weeks ago

Prototype here: https://github.com/ACED-IDP/gen3_util/blob/763b1f1a1f3aa3e35b60be910662983e99c6aef9/tests/unit/dataframer/test_dataframer.py#L209