Gilead-BioStats / gsm

Good Statistical Monitoring R Package
https://gilead-biostats.github.io/gsm/
Apache License 2.0
39 stars 9 forks source link

QC: Remove `required` field from specs and `checkSpec()`/`CombineSpec()`/`Ingest()` #1936

Open lauramaxwell opened 2 weeks ago

lauramaxwell commented 2 weeks ago

QC Details

Per discussion with a2ai team, we are to remove the required field from all specs to avoid confusion

Additional Comments

lauramaxwell commented 2 weeks ago

@jwildfire @dpastoor - It is unclear to me if we are removing this and just treating all specified columns as required, or if we are also removing the columns currently marked as optional in the mapping yamls? This has bigger impact on gsm.endpoints and gsm.app than gsm.

lauramaxwell commented 1 week ago

keep all columns, whether or not they are optional. If a column doesn't exist for a particular study, it will just be deleted in the study config

jonthegeek commented 1 week ago

I feel like this will make it difficult to supply any sort of workflow tooling in gsm.app that's usable by anybody other than Gilead. Are we OK with that?

jwildfire commented 1 week ago

I feel like this will make it difficult to supply any sort of workflow tooling in gsm.app that's usable by anybody other than Gilead. Are we OK with that?

@jonthegeek Can you provide some details around the generalizability issues you see with removing the required field?

Our default workflows were never designed to be applicable for every use case, but our modules (including gsm.app) should definitely still be widely usable with many different data sources via standalone function calls or as a set of modules using gsm.template. I think it's pretty likely there will be organization specific forks of gsm.template where support for different data sources/requirements are stored.

jonthegeek commented 1 week ago

For most tables in gsm.app, we have certain fields that the app uses directly, and workflows expect those fields to be there. But we also have other fields that are nice to display if the user has those fields (or equivalent fields that they translate via an ingestion).

For example, this is my current workflow to get Adverse Events domain data:

meta:
  Type: User
  ID: AdverseEvents
  Description: Adverse Event User-Facing Data
  Priority: 1
spec:
  Raw_AE:
    SubjectID:
      required: true
      type: character
      source_col: subjid
    serious:
      required: true
      type: character
      source_col: aeser
    start_date:
      required: false
      type: Date
      source_col: aest_dt
    end_date:
      required: false
      type: Date
      source_col: aeen_dt
    preferred_term:
      required: false
      type: character
      source_col: mdrpt_nsv
    system_organ_class:
      required: false
      type: character
      source_col: mdrsoc_nsv
    toxicity_grade:
      required: false
      type: character
      source_col: aetoxgr
steps:
  - output: User_AE
    name: =
    params:
      lhs: User_AE
      rhs: Raw_AE

If they have all that other data for us, cool, we'll display it (and make it available downstream in deeper dives). But the app shouldn't break if that data isn't available.

All that said... I'm not 100% certain yet that these workflows are relevant outside of generating the sample data. Since we require users to supply functions to load domain data anyway, we might end up not exporting any gsm.app-specific workflows. But I could imagine them being useful to determine whether deeper and deeper dives are available to a given user.