Knowledge-Graph-Hub / kg-microbe

https://knowledge-graph-hub.github.io/kg-microbe/index.html
BSD 3-Clause "New" or "Revised" License

Define schemas for sources and include semantic enums in the schema #14

Open cmungall opened 3 years ago

cmungall commented 3 years ago

@realmarcin and I did this for the NCBI trait table:

https://github.com/Knowledge-Graph-Hub/kg-microbe/pull/13

This makes a simple, flat, data-dictionary-style schema, with one class and a bunch of slots:

    slots:
    - tax_id
    - species_tax_id
    - data_source
    - org_name
    - species
    - genus
    - family
    - order
    - class
    - phylum
    - superkingdom
    - gram_stain
    - metabolism
    - pathways
    - carbon_substrates
    - sporulation
    - motility
    - range_tmp
    - range_salinity
    - cell_shape
    - isolation_source
    - d1_lo
    - d1_up
    - d2_lo
    - d2_up
    - doubling_h
    - genome_size
    - gc_content
    - coding_genes
    - optimum_tmp
    - optimum_ph
    - growth_tmp
    - rRNA16S_genes
    - tRNA_genes
    - ref_id

These slots have definitions. They are minimal for now, since I auto-inferred them:

    slots:
      tax_id:
        range: integer
        examples:
          value: '542'
      species_tax_id:
        range: integer
        examples:
          value: '542'
      data_source:
        range: data_source_enum
        examples:
          value: silva
      org_name:
        range: string
        examples:
          value: Zymomonas mobilis
      species:
        range: string
        examples:
          value: Zymomonas mobilis
      genus:
        range: string
        examples:
          value: Zymomonas
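For anyone wondering what "auto-inferred" could look like, here is a rough sketch of the idea: try to parse every value in a column as an integer, then as a float, and fall back to string. This is an assumption about the general approach, not the tool that was actually used for the PR. The names `infer_range` and `infer_slots` are hypothetical, and treating `NA` and empty strings as missing is also an assumption.

```python
def infer_range(values):
    """Guess a range name (integer, float, or string) from string values."""
    # Assumption: empty strings and "NA" mean "missing" and are ignored.
    non_empty = [v for v in values if v not in ("", "NA")]
    if not non_empty:
        return "string"

    def all_parse(cast):
        # True only if every remaining value can be converted with `cast`.
        for v in non_empty:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def infer_slots(rows):
    """Build a minimal slot dictionary from a non-empty list of row dicts."""
    slots = {}
    for name in rows[0]:
        column = [row[name] for row in rows]
        # The first value of the column is kept as the example.
        slots[name] = {
            "range": infer_range(column),
            "examples": {"value": column[0]},
        }
    return slots


# One row, using example values quoted in this issue.
rows = [{"tax_id": "542", "org_name": "Zymomonas mobilis", "genus": "Zymomonas"}]
slots = infer_slots(rows)
for name, definition in slots.items():
    print(name, definition["range"])
```

A sketch like this cannot detect enum-like columns such as `data_source`; those would still need a distinct-value count or manual review.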

We have some enums that we will want to map:

    metabolism_enum:
      permissible_values:
        anaerobic:
          description: anaerobic
        strictly anaerobic:
          description: strictly anaerobic
        obligate aerobic:
          description: obligate aerobic
        aerobic:
          description: aerobic
        facultative:
          description: facultative
        microaerophilic:
          description: microaerophilic
        obligate anaerobic:
          description: obligate anaerobic
        NA:
          description: NA
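To keep track of the mapping work, something like the following could report which permissible values still lack an ontology term and which raw values in a column fall outside the enum. It is only a sketch: `unmapped_values`, `invalid_values`, and the `meanings` dictionary are hypothetical names, and no real ontology IDs are assumed here. In the schema itself, the mapping would presumably go in a `meaning` field on each permissible value.

```python
# Permissible values copied from metabolism_enum above.
METABOLISM_VALUES = [
    "anaerobic",
    "strictly anaerobic",
    "obligate aerobic",
    "aerobic",
    "facultative",
    "microaerophilic",
    "obligate anaerobic",
    "NA",
]


def unmapped_values(permissible_values, meanings):
    """Return permissible values that have no ontology term assigned yet."""
    return [pv for pv in permissible_values if not meanings.get(pv)]


def invalid_values(column, permissible_values):
    """Return the sorted distinct column values that are not in the enum."""
    allowed = set(permissible_values)
    return sorted({v for v in column if v not in allowed})


# Hypothetical mapping of value -> ontology CURIE; empty until mapping PRs land.
meanings = {}

print(unmapped_values(METABOLISM_VALUES, meanings))
print(invalid_values(["aerobic", "Aerobic", "NA"], METABOLISM_VALUES))
```

The second call shows why an exact-match check is useful before mapping: case variants like `Aerobic` would otherwise slip through unmapped.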

Any mapping ticket should be closed with a PR on this schema.