ExposuresProvider / icees-api-config

Implement plan for creating ground truth YAML files, converting to Dhall, and deploying new ICEES instances #29

Open karafecho opened 3 years ago

karafecho commented 3 years ago

(1) QC Priya's identifiers YAML: fix errors, adjust mappings, update search terms in all_features, and add identifiers to the identifiers file.
(2) Add the SeptRelayFix identifiers, fix the spelling error (https://github.com/NCATS-Tangerine/icees-api-config/pull/28/files), and add the identifiers for methylprednisone below.
(3) Check variable names in the identifiers YAML against the ICEES tables for all use cases.
(4) Check ICEES COVID for correct dexamethasone; if present, register with ARS.

{
  "PUBCHEM.COMPOUND:124653": {
    "id": {
      "identifier": "PUBCHEM.COMPOUND:124653",
      "label": "6alpha-Methylprednisone"
    },
    "equivalent_identifiers": [
      {
        "identifier": "PUBCHEM.COMPOUND:124653",
        "label": "6alpha-Methylprednisone"
      },
      {
        "identifier": "UNII:621XR2W6OP",
        "label": "6.ALPHA.-METHYLPREDNISONE"
      },
      {
        "identifier": "DRUGBANK:DB12952"
      },
      {
        "identifier": "MESH:C050574",
        "label": "6-methylprednisone"
      },
      {
        "identifier": "CAS:91523-05-6"
      },
      {
        "identifier": "INCHIKEY:SVYCRJXQZUCUND-PQXSVQADSA-N"
      }
    ],
    "type": [
      "biolink:SmallMolecule",
      "biolink:MolecularEntity",
      "biolink:ChemicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:PhysicalEssence",
      "biolink:PhysicalEssenceOrOccurrent",
      "biolink:ChemicalOrDrugOrTreatment",
      "biolink:ChemicalEntityOrGeneOrGeneProduct",
      "biolink:ChemicalEntityOrProteinOrPolypeptide"
    ]
  },
  "PUBCHEM.COMPOUND:5877": {
    "id": {
      "identifier": "PUBCHEM.COMPOUND:5877",
      "label": "Methylprednisolone acetate"
    },
    "equivalent_identifiers": [
      {
        "identifier": "PUBCHEM.COMPOUND:5877",
        "label": "Methylprednisolone acetate"
      },
      {
        "identifier": "CHEMBL.COMPOUND:CHEMBL1364144",
        "label": "METHYLPREDNISOLONE ACETATE"
      },
      {
        "identifier": "UNII:43502P7F0P",
        "label": "METHYLPREDNISOLONE ACETATE"
      },
      {
        "identifier": "CHEBI:6889",
        "label": "methylprednisolone acetate"
      },
      {
        "identifier": "MESH:C000873",
        "label": "[OBSOLETE] methylprednisolone acetate"
      },
      {
        "identifier": "MESH:D000077555",
        "label": "Methylprednisolone Acetate"
      },
      {
        "identifier": "CAS:53-36-1"
      },
      {
        "identifier": "DrugCentral:1770",
        "label": "methylprednisolone acetate"
      },
      {
        "identifier": "KEGG.COMPOUND:C08179",
        "label": "Methylprednisolone acetate"
      },
      {
        "identifier": "INCHIKEY:PLBHSZGDDKCEHR-LFYFAGGJSA-N"
      }
    ],
    "type": [
      "biolink:SmallMolecule",
      "biolink:MolecularEntity",
      "biolink:ChemicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:PhysicalEssence",
      "biolink:PhysicalEssenceOrOccurrent",
      "biolink:ChemicalOrDrugOrTreatment",
      "biolink:ChemicalEntityOrGeneOrGeneProduct",
      "biolink:ChemicalEntityOrProteinOrPolypeptide"
    ]
  },
  "PUBCHEM.COMPOUND:6741": {
    "id": {
      "identifier": "PUBCHEM.COMPOUND:6741",
      "label": "Methylprednisolone"
    },
    "equivalent_identifiers": [
      {
        "identifier": "PUBCHEM.COMPOUND:6741",
        "label": "Methylprednisolone"
      },
      {
        "identifier": "CHEMBL.COMPOUND:CHEMBL650",
        "label": "METHYLPREDNISOLONE"
      },
      {
        "identifier": "UNII:X4W7ZR7023",
        "label": "METHYLPREDNISOLONE"
      },
      {
        "identifier": "CHEBI:6888",
        "label": "6alpha-methylprednisolone"
      },
      {
        "identifier": "DRUGBANK:DB00959"
      },
      {
        "identifier": "MESH:D008775",
        "label": "Methylprednisolone"
      },
      {
        "identifier": "CAS:83-43-2"
      },
      {
        "identifier": "DrugCentral:1768",
        "label": "methylprednisolone"
      },
      {
        "identifier": "GTOPDB:7088",
        "label": "methylprednisolone"
      },
      {
        "identifier": "HMDB:HMDB0015094",
        "label": "Methylprednisolone"
      },
      {
        "identifier": "INCHIKEY:VHRSUDSXCMQTMA-PJHHCJLFSA-N"
      }
    ],
    "type": [
      "biolink:SmallMolecule",
      "biolink:MolecularEntity",
      "biolink:ChemicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:PhysicalEssence",
      "biolink:PhysicalEssenceOrOccurrent",
      "biolink:ChemicalOrDrugOrTreatment",
      "biolink:ChemicalEntityOrGeneOrGeneProduct",
      "biolink:ChemicalEntityOrProteinOrPolypeptide"
    ]
  }
}
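
For step (2), here is a minimal sketch of how the equivalent identifiers in a JSON document shaped like the one above could be flattened into YAML-style lists. The function names `collect_curies` and `to_yaml` are hypothetical, and the output layout (label as key, CURIEs as a list) is an assumption rather than the actual layout of the identifiers file; the real file is keyed by ICEES variable name, so the keys would need to be adjusted.

```python
import json

# Trimmed copy of the first entry from the JSON above
# (only two of its six equivalent identifiers are kept for brevity).
SAMPLE = json.loads("""
{
  "PUBCHEM.COMPOUND:124653": {
    "id": {
      "identifier": "PUBCHEM.COMPOUND:124653",
      "label": "6alpha-Methylprednisone"
    },
    "equivalent_identifiers": [
      {"identifier": "PUBCHEM.COMPOUND:124653", "label": "6alpha-Methylprednisone"},
      {"identifier": "MESH:C050574", "label": "6-methylprednisone"}
    ]
  }
}
""")


def collect_curies(normalized):
    """Map each entry's preferred label to the list of its equivalent CURIEs."""
    collected = {}
    for curie, entry in normalized.items():
        # Fall back to the CURIE itself when the entry carries no label.
        label = entry["id"].get("label", curie)
        collected[label] = [
            item["identifier"] for item in entry.get("equivalent_identifiers", [])
        ]
    return collected


def to_yaml(collected):
    """Render the mapping as YAML text (assumed layout, not the real schema)."""
    lines = []
    for label, curies in collected.items():
        lines.append(f'"{label}":')
        lines.extend(f'  - "{curie}"' for curie in curies)
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml(collect_curies(SAMPLE)))
```

Quoting every key and CURIE keeps the colons in the identifiers from being misread by a YAML parser.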

karafecho commented 3 years ago

(6) Change IN_ICEES to IN_UNC_Health, all config files.
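
A rough sketch of how this rename could be scripted across the config files. The function name `rename_token`, the file suffixes, and the demo file are assumptions; the repo's actual layout may differ. It matches whole tokens only, so a longer name that merely contains IN_ICEES is left untouched.

```python
import re
import tempfile
from pathlib import Path


def rename_token(root, old="IN_ICEES", new="IN_UNC_Health",
                 suffixes=(".yaml", ".yml", ".dhall")):
    """Replace whole-token occurrences of `old` with `new` under `root`.

    Returns the list of files that were rewritten.
    """
    # \b ensures names such as IN_ICEES_OTHER are not touched.
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        updated = pattern.sub(lambda _match: new, text)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


# Demo on a throwaway directory with a hypothetical config file.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "example.yaml"
    cfg.write_text("IN_ICEES: yes\nIN_ICEES_OTHER: no\n", encoding="utf-8")
    changed_names = [p.name for p in rename_token(tmp)]
    demo_result = cfg.read_text(encoding="utf-8")
```

Running it on a scratch copy or a fresh branch first makes the resulting diff easy to review before committing.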

karafecho commented 3 years ago

Update, 10.27.21:

Below is a quick update on our efforts to correct mapping issues with the ICEES YAML files and redeploy the ICEES instances. (1), (2), and (6) above have been completed.

  1. We now have two "ground truth" YAML files, all_features_version_04.yaml and identifiers_version_03.yaml. Note that I only QC'd the variables for the patient tables, not the visit tables, given that we will be using one set of variables for both tables after we fully migrate from a YAML-based to a Dhall-based config system.
    • I hid the visit tables (# visit) in the main files and deleted those variables in a second set of files; the plan is to merge both sets of files with main.

COMPLETE

  1. Note that while I am considering the files to be "ground truth", we likely will need to refine some of the mappings over time. The labs, for instance, are a bit inconsistent, in part because Translator does not yet do a great job with labs.

NO ACTION REQUIRED

  1. We have not yet created a "ground truth" FHIR_mappings_version_01.yaml file. However, since this is needed only to map to FHIR data elements and generate new ICEES tables via FHIR PIT, this step is not high priority. 

ACTION: Hong is investigating non-Athena approaches to do this. After this file has been finalized, we can create a new set of Dhall files and hopefully consider the migration complete.

  1. We have several sets of ICEES+ integrated feature tables on ebcr0.edc.renci.org or (in the case of COVID) covid-db-dev.edc.renci.org. These need to be QC'd such that the headers in the data files align with the variable names in the two ground truth YAML files.

    • Complete for DILI
    • Complete for Asthma

ACTION: A plan is still needed for COVID, which isn't really part of Translator but is being called by Translator ARAs, so we need to decide whether this is a priority.

  2. Several variable values within the data files for DILI dataset need to be binned to match the variable enumeration.

    • I plan to complete this and upload to ebcr0.edc.renci.org

COMPLETE

  3. Redeploy the ICEES asthma instance and ICEES DILI instance

  4. Deploy an initial ICEES PCD instance, using the initial set of tables currently on Rockfish

  5. Deploy a new ICEES APR2021 COVID instance after generating a new set of tables.

  6. Test adjustments to the default P value by way of direct ICEES query.

    • Consider whether this is needed before the Dec demo
  7. Adjust the formatting of edge attributes that are returned to users.

    • Align with COHD and Clinical Risk Provider's formatting
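
The header QC and binning steps in the list above could be sketched roughly as follows. This is only an illustration: the function names, variable names, bin edges, and bin labels are all hypothetical, and the real variable names and enumerations would come from the two ground truth YAML files.

```python
import bisect
import csv
import io


def header_mismatches(csv_text, variable_names):
    """Compare a data file's header row with the expected variable names.

    Returns (headers not in the YAML, YAML variables missing from the file).
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = set(next(reader, []))
    expected = set(variable_names)
    return sorted(headers - expected), sorted(expected - headers)


def bin_value(value, edges, labels):
    """Assign `value` to a labelled bin.

    `edges` must be sorted and `labels` must have len(edges) + 1 entries;
    a value equal to an edge falls into the bin above that edge.
    """
    return labels[bisect.bisect_right(edges, value)]


# Hypothetical header row and variable names, for illustration only.
extra, missing = header_mismatches(
    "ExampleVarA,ExampleVarB,Typo_VarC\n1,2,3\n",
    {"ExampleVarA", "ExampleVarB", "ExampleVarC"},
)

# Hypothetical enumeration with three bins split at 18 and 65.
binned = [bin_value(v, [18, 65], ["low", "mid", "high"]) for v in (17, 18, 65)]
```

Here `extra` lists headers that do not appear in the YAML and `missing` lists YAML variables that are absent from the data file, so both lists being empty would mean the file is aligned.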

karafecho commented 3 years ago

Clean repo and add config file schema doc.

karafecho commented 3 years ago

FYI: I'll break this ticket down into individual tickets after we agree on the plan.