ga4gh / fasp-scripts

Update schema for table used by mapping1-manual notebook #21

Closed ianfore closed 3 years ago

ianfore commented 3 years ago

To illustrate effective use of Search schemas, please update the schema on the public DNAStack server for search_cloud.cshcodeathon.organoid_profiling_pc_subject_phenotypes_gru to one derived from the dbGaP XML data dictionary.

See fasp/data/dbgap for the data_dict.xml to use.

The notebook that uses that table is https://github.com/ga4gh/fasp-scripts/blob/master/notebooks/search/mapping1-manual.ipynb

jfuerth commented 3 years ago

I will work on this tomorrow.

jfuerth commented 3 years ago

Sorry, I was swamped today. Bumping to first thing tomorrow.

jfuerth commented 3 years ago

Here is the data dictionary in JSON Schema format. Next I will try to insert it into the Search implementation so it is returned in API responses for the correct table.

{
  "$id": "phs001611.v1.pht009160.v1.Organoid_Profiling_PC_Subject_Phenotypes",
  "$schema": "http://json-schema.org/draft-07/schema",
  "description": null,
  "properties": {
    "age": {
      "$comment": "UNIT 'Years'",
      "description": "Subject's age",
      "maximum": 92.0,
      "minimum": 24.0,
      "oneOf": [
        {
          "const": "N/A",
          "title": "Not vailable"
        }
      ],
      "type": "integer, encoded value"
    },
    "race": {
      "description": "Race of participant",
      "oneOf": [
        {
          "const": "AA",
          "title": "African American"
        },
        {
          "const": "A",
          "title": "Asian"
        },
        {
          "const": "W",
          "title": "White, Caucasian"
        },
        {
          "const": "H",
          "title": "Hispanic"
        },
        {
          "const": "N/A",
          "title": "Not vailable"
        }
      ],
      "type": "string"
    },
    "sex": {
      "description": "Sex of participant",
      "oneOf": [
        {
          "const": "F",
          "title": "Female"
        },
        {
          "const": "N/A",
          "title": "Not Applicable"
        },
        {
          "const": "M",
          "title": "Male"
        }
      ],
      "type": "string"
    },
    "subject_id": {
      "description": "De-identified Subject ID",
      "type": "string"
    }
  },
  "type": "object"
}
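
For readers who want to see how such a mechanical derivation might look, here is a minimal sketch. It is not the code used to produce the schema above. The XML excerpt is a hypothetical stand-in: its element names (`variable`, `name`, `type`, `unit`, `logical_min`, `logical_max`, `value` with a `code` attribute) are assumed from the general layout of dbGaP data dictionaries and should be checked against the real data_dict.xml. The function name `data_dict_to_schema` is also an assumption.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical excerpt; element and attribute names are assumed from the
# general layout of dbGaP data dictionaries, not copied from data_dict.xml.
SAMPLE_XML = """
<data_table id="pht009160.v1" study_id="phs001611.v1">
  <variable id="phv1">
    <name>age</name>
    <description>Subject's age</description>
    <type>integer, encoded value</type>
    <unit>Years</unit>
    <logical_min>24</logical_min>
    <logical_max>92</logical_max>
    <value code="N/A">Not vailable</value>
  </variable>
  <variable id="phv2">
    <name>subject_id</name>
    <description>De-identified Subject ID</description>
    <type>string</type>
  </variable>
</data_table>
"""


def data_dict_to_schema(xml_text, table_name):
    """Build a JSON Schema dict from a dbGaP-style data dictionary."""
    root = ET.fromstring(xml_text)
    properties = {}
    for var in root.findall("variable"):
        prop = {
            "description": var.findtext("description"),
            # Copied verbatim, typos and all; no type clean-up at this stage.
            "type": var.findtext("type"),
        }
        unit = var.findtext("unit")
        if unit:
            prop["$comment"] = f"UNIT '{unit}'"
        for tag, keyword in (("logical_min", "minimum"), ("logical_max", "maximum")):
            text = var.findtext(tag)
            if text is not None:
                prop[keyword] = float(text)
        # Each <value code="..."> becomes one const/title pair.
        values = [
            {"const": v.get("code"), "title": v.text}
            for v in var.findall("value")
        ]
        if values:
            prop["oneOf"] = values
        properties[var.findtext("name")] = prop
    return {
        "$id": f"{root.get('study_id')}.{root.get('id')}.{table_name}",
        "$schema": "http://json-schema.org/draft-07/schema",
        "description": root.findtext("description"),
        "properties": properties,
        "type": "object",
    }


schema = data_dict_to_schema(SAMPLE_XML, "Organoid_Profiling_PC_Subject_Phenotypes")
print(json.dumps(schema, indent=2, sort_keys=True))
```

The sketch copies descriptions, codes and the type string as found, which matches the intent of showing the dictionary as curated rather than correcting it.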

ianfore commented 3 years ago

Thanks. Look forward to seeing it in the Search implementation.

In this case there's a typo that occurs twice, "Not vailable", but that traces back to the data_dict.xml. I don't think we should try to fix that in the transform to JSON Schema, though. The first intent is to show the description as provided by the investigator and curated under the current mechanisms. That's the intent represented in the notebook referred to above.

There's a need to address variations in type across data_dicts. I have some code that addresses that. Those typos could perhaps be handled by the same route. I started a separate issue on that: #22
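
As one possible reading of "variations in type": the derived schema carries the dictionary's type string as is (for example "integer, encoded value"), which is not a valid JSON Schema type. The sketch below is not the code mentioned above; it only illustrates one way a post-processing step could normalize such strings. The `BASE_TYPES` table and the function name `normalize_property` are assumptions.

```python
# Base types assumed to appear in data dictionaries, mapped to JSON Schema types.
# Anything unrecognized (including a bare "encoded value") falls back to string.
BASE_TYPES = {
    "integer": "integer",
    "decimal": "number",
    "string": "string",
}


def normalize_property(prop):
    """Return a copy of a derived property whose type is valid JSON Schema."""
    parts = [p.strip().lower() for p in prop.get("type", "string").split(",")]
    base = BASE_TYPES.get(parts[0], "string")
    # Keep description, $comment, etc.; rebuild the type-related keywords.
    out = {k: v for k, v in prop.items()
           if k not in ("type", "oneOf", "minimum", "maximum")}
    codes = prop.get("oneOf", [])
    bounds = {k: prop[k] for k in ("minimum", "maximum") if k in prop}
    if base in ("integer", "number") and codes:
        # A value is either a number within bounds or one of the listed codes.
        out["anyOf"] = [{"type": base, **bounds}, {"oneOf": codes}]
    else:
        out["type"] = base
        out.update(bounds)
        if codes:
            out["oneOf"] = codes
    return out


age = {
    "$comment": "UNIT 'Years'",
    "description": "Subject's age",
    "maximum": 92.0,
    "minimum": 24.0,
    "oneOf": [{"const": "N/A", "title": "Not vailable"}],
    "type": "integer, encoded value",
}
normalized_age = normalize_property(age)
print(normalized_age)
```

Titles such as "Not vailable" pass through unchanged here; correcting them would be a separate, explicit step.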

jfuerth commented 3 years ago

The above schema (mechanically derived from the dbGaP XML data dictionary) is now returned from https://ga4gh-search-adapter-presto-public.prod.dnastack.com/table/search_cloud.cshcodeathon.organoid_profiling_pc_subject_phenotypes_gru/info. Let me know if this is what you were hoping for!
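
A minimal sketch of reading the schema back from that endpoint, using only the Python standard library. It assumes the response wraps the JSON Schema under a `data_model` key, as in the GA4GH Search table-info shape; the helper names `fetch_table_info` and `coded_values` are hypothetical, and the sample response is a trimmed stand-in so the example runs offline.

```python
import json
import urllib.request

BASE_URL = "https://ga4gh-search-adapter-presto-public.prod.dnastack.com"
TABLE = "search_cloud.cshcodeathon.organoid_profiling_pc_subject_phenotypes_gru"


def fetch_table_info(base_url, table):
    """GET {base_url}/table/{table}/info and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/table/{table}/info") as resp:
        return json.load(resp)


def coded_values(table_info):
    """Map each column to its {code: title} pairs, read from data_model."""
    props = table_info.get("data_model", {}).get("properties", {})
    return {
        name: {v["const"]: v["title"] for v in prop.get("oneOf", [])}
        for name, prop in props.items()
    }


# Offline stand-in for a response, trimmed from the schema posted earlier.
sample_info = {
    "name": TABLE,
    "data_model": {
        "properties": {
            "sex": {
                "oneOf": [
                    {"const": "F", "title": "Female"},
                    {"const": "M", "title": "Male"},
                ],
                "type": "string",
            },
            "subject_id": {"type": "string"},
        }
    },
}
print(coded_values(sample_info))

# To try the live endpoint, if it is still available:
# print(coded_values(fetch_table_info(BASE_URL, TABLE)))
```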

ianfore commented 3 years ago

That works. Thanks. I updated the notebook to remove the workaround. I have also added a capability to the Python client to generate a template in which a mapping can be created.
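
To give a rough idea of what a mapping template could look like, here is a small sketch. It is not the client's actual implementation: the function name `mapping_template` and the `mapped_to` placeholder key are hypothetical. It lists every coded value per column, with a blank slot to fill in by hand.

```python
def mapping_template(schema):
    """Return a fill-in template for every column that has coded values."""
    template = {}
    for name, prop in schema.get("properties", {}).items():
        entries = {
            v["const"]: {"title": v["title"], "mapped_to": None}
            for v in prop.get("oneOf", [])
        }
        if entries:  # columns without coded values need no mapping
            template[name] = entries
    return template


# Trimmed copy of the schema posted earlier in this thread.
example_schema = {
    "properties": {
        "sex": {
            "description": "Sex of participant",
            "oneOf": [
                {"const": "F", "title": "Female"},
                {"const": "N/A", "title": "Not Applicable"},
                {"const": "M", "title": "Male"},
            ],
            "type": "string",
        },
        "subject_id": {"description": "De-identified Subject ID", "type": "string"},
    },
    "type": "object",
}

template = mapping_template(example_schema)
# Fill in one slot by hand; the target term here is purely illustrative.
template["sex"]["F"]["mapped_to"] = "female"
print(template)
```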