INCATools / dead_simple_owl_design_patterns

A simple system for specifying OWL class design patterns for OBO-ish ontologies.
http://incatools.github.io/dead_simple_owl_design_patterns/
GNU General Public License v3.0

Support 0-many var input #71

Closed dosumis closed 3 years ago

dosumis commented 3 years ago

Related ticket: https://github.com/INCATools/dead_simple_owl_design_patterns/issues/16

We need to be able to support templating in cases where a variable is a list (0-many) => 0-many clauses added to text or manchester syntax output using some transparent templating system.

One option might be to use Mustache for templating. Another, perhaps not well thought-out, suggestion:

text: "A %s that {%s,' and '} and something else %s"
vars:
    - neuron_type

list_vars:
    - marker_list

def:
    clauses:
      - text: "A %s %s %s that "
        vars:
          - taxon
          - brain_region
          - neuron_type
        cardinality: 1
      - text: "expresses %s"
        vars:
          - marker_list
        cardinality: ">=0"
        sep: " and "
# => "A GABAergic that expresses gene 1 and expresses gene 2."
equivalent_to:
  and_clauses:
    - text: "%s"
      vars:
        - neuron_type
      cardinality: 1
    - text: "'expresses' some %s"
      vars:
        - marker_list
      cardinality: ">=1"
  annotation:

# equivalent_to: 'GABAergic neuron' and (expresses some gene1) and (expresses some gene2)
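The `cardinality` strings in this sketch could be checked along the following lines. This is only a minimal illustration, not anything DOSDP implements: the function names `parse_cardinality` and `render_clause` are hypothetical, and it assumes just the two forms shown above (an exact integer such as 1, or a lower bound written as ">=N").

```python
import re


def parse_cardinality(spec):
    """Return (minimum, maximum) for a cardinality spec; maximum None means unbounded.

    Assumption: only exact integers (1) and lower bounds (">=N") occur,
    as in the sketch above. Anything else is rejected.
    """
    text = str(spec).strip()
    lower_bound = re.fullmatch(r">=\s*(\d+)", text)
    if lower_bound:
        return int(lower_bound.group(1)), None
    if text.isdigit():
        return int(text), int(text)
    raise ValueError(f"unsupported cardinality: {spec!r}")


def render_clause(text, values, cardinality, sep=" and "):
    """Repeat a one-slot printf clause once per value, after checking cardinality."""
    minimum, maximum = parse_cardinality(cardinality)
    count = len(values)
    if count < minimum or (maximum is not None and count > maximum):
        raise ValueError(f"{count} value(s) violates cardinality {cardinality!r}")
    return sep.join(text % (value,) for value in values)


print(render_clause("expresses %s", ["gene 1", "gene 2"], ">=0"))
```

With ">=0" an empty list simply yields an empty string, whereas ">=1" (as on the `equivalent_to` clause) would raise for an empty list.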

see discussion in https://docs.google.com/document/d/1n6gJdypX7l-JQypPX-4Vl4Yf_ItcV47XdZM1Uqk9OlM/edit#heading=h.czjzooey00yv

Based on current discussion we favour Mustache, but this needs further investigation.

dosumis commented 3 years ago

@hkir-dev - I've decided on a relatively simple (deliberately dumb) solution.

  1. Allow intermediate variables, supporting:
    • static string assignment
    • printf_sub*
    • regex sub on strings* - (maybe slightly poor implementation right now under substitution and regex_sub keys)
    • join function on lists of strings* allowing specification of separator.

* OWL entities to be interpreted as the rdfs:label of the OWL entity or another readable identifier (specified at pattern level using the existing system)

  2. Add generic support for multi-clause printf:
definitions:
  multi_clause_printf:
    required: ['sep', 'clauses']
    additionalProperties: False
    properties:
      sep:
        type: string
      clauses:
        type: array
        items: { $ref: '#/definitions/printf_clause' }

  printf_clause:
    required: [ 'text', 'vars' ]
    additionalProperties: False
    properties:
      text:
        type: string
      vars:
        type: array
        items: { type: string }
      sub_clauses: { $ref: '#/definitions/multi_clause_printf' }

The tricky bit here is integrating this into the existing printf objects. We need to allow multi_clause_printf as an option for these in place of 'text' & 'vars' while maintaining backwards compatibility. I think this should be possible using the compositional options of JSON Schema, although these are a bit limited and clunky. This is the main problem left to solve (I think).

  3. Downstream interpretation:
    • Any time any single var is empty in a printf, the whole clause fails. We can deal with situations where only some slots are empty with warnings or exceptions.
    • Any time any var value is a list, this is interpreted as "add n clauses joined by sep" where n is the length of the list. Only one list var is allowed per set of vars.
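The two interpretation rules above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the dosdp_tools implementation: `render_multi_clause` is a hypothetical name, `row` is assumed to map each variable name to a string or a list of strings, a failed clause is simply dropped, and `sub_clauses` are ignored.

```python
def render_multi_clause(multi_clause, row):
    """Render a multi_clause_printf object against one row of variable values."""
    rendered = []
    for clause in multi_clause["clauses"]:
        values = [row.get(name) for name in clause["vars"]]
        # Rule 1: any empty variable makes the whole clause fail (dropped here).
        if any(value in (None, "", []) for value in values):
            continue
        list_slots = [i for i, value in enumerate(values) if isinstance(value, list)]
        # Only one list-valued variable is allowed per set of vars.
        if len(list_slots) > 1:
            raise ValueError("only one list variable is allowed per clause")
        if list_slots:
            slot = list_slots[0]
            # Rule 2: a list of length n adds n copies of the clause.
            for item in values[slot]:
                filled = values[:slot] + [item] + values[slot + 1:]
                rendered.append(clause["text"] % tuple(filled))
        else:
            rendered.append(clause["text"] % tuple(values))
    # All surviving clauses are joined by sep.
    return multi_clause["sep"].join(rendered)


pattern = {
    "sep": " and ",
    "clauses": [
        {"text": "%s", "vars": ["neuron_type"]},
        {"text": "'expresses' some %s", "vars": ["marker_list"]},
    ],
}
row = {"neuron_type": "'GABAergic neuron'", "marker_list": ["gene1", "gene2"]}
print(render_multi_clause(pattern, row))
```

Dropping a failed clause silently is one choice; the warnings or exceptions mentioned above would hook in at the `continue`.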

dosumis commented 3 years ago

Made a start on this - see linked PR

dosumis commented 3 years ago

Wondering what the best implementation would be for item 1 (intermediate variables), given that we are in danger of inventing a declarative programming language. Some thoughts:

sketch

definitions:
  function:
    oneOf: ['regex', 'join']
  join:
    properties:
      sep: { type: string }
  internal_vars:
    properties:
      var_name: { type: string }
      apply: { $ref: "#/definitions/function" }

internal_vars:
  - var_name: fu
    apply:
      join:
        sep: ', '
    input: bar

  - var_name: bin
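How a processor might resolve such `internal_vars` entries can be sketched as below. This is a hedged illustration only: `apply_internal_vars` is a hypothetical name, only `join` is handled (the parameters of `regex` are not specified in the sketch), and an entry without `apply` is assumed to copy its input unchanged.

```python
def apply_internal_vars(internal_vars, row):
    """Return a copy of `row` extended with the derived internal variables."""
    resolved = dict(row)
    for spec in internal_vars:
        value = resolved[spec["input"]]
        functions = spec.get("apply") or {}
        if "join" in functions:
            # Accept a single string as a one-item list for convenience.
            items = value if isinstance(value, list) else [value]
            value = functions["join"]["sep"].join(items)
        resolved[spec["var_name"]] = value
    return resolved


internal_vars = [
    {"var_name": "fu", "apply": {"join": {"sep": ", "}}, "input": "bar"},
]
result = apply_internal_vars(internal_vars, {"bar": ["a", "b", "c"]})
print(result["fu"])
```

Because entries are resolved in order and written back into the same mapping, a later internal variable could take an earlier one as its input.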

hkir-dev commented 3 years ago

Schema updated. Compatible pattern examples (internal vars & join, cardinality>1, optional clauses) for review are as follows:

# '%s and ''has modifier'' some ''acute'''
equivalentTo:
  multi_clause:
    sep: " and "
    clauses:
      - text: '%s'
        vars:
          - disease
      # card>1, tsv value with '|'
      - text: '''has modifier'' some %s'
        vars:
          - component

# internal vars and join function example
internal_vars:
  - var_name: multi_val_int
    # tsv value with '|'
    input: "'multi_val'"
    apply:
      join:
        sep: ' and '

# internal var usage example
def:
  text: Acute form of %s.
  vars:
    - multi_val_int

# optional clauses
comment:
  multi_clause:
    sep: ". "
    # values can be blank in tsv
    clauses:
      - text: 'First sentence %s'
        vars:
          - optional_clause_field1
      - text: 'Second sentence %s'
        vars:
          - optional_clause_field2

# labels cardinality > 1, (no separator) example
name:
  multi_clause:
    clauses:
      - text: acute %s
        # card>1, tsv value with '|'
        vars:
        - disease_label

# multiple super classes
logical_axioms:
  - axiom_type: subClassOf
    text: "%s"
    # tsv value with '|'
    vars:
      - cell_type
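For the "tsv value with '|'" cases noted in the comments above, reading the input might look like the following. It is a minimal sketch, not the dosdp_tools reader: `read_rows` and the `list_columns` argument are hypothetical, the column names are borrowed from the first example, and the identifiers in the sample row are placeholders.

```python
import csv
import io


def read_rows(tsv_text, list_columns, sep="|"):
    """Parse TSV text, splitting the named columns on `sep` into lists of strings."""
    rows = []
    for record in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        row = dict(record)
        for column in list_columns:
            cell = row.get(column) or ""
            # A blank cell becomes an empty list, i.e. zero values for that var.
            row[column] = [part.strip() for part in cell.split(sep) if part.strip()]
        rows.append(row)
    return rows


# Placeholder identifiers, for illustration only.
tsv = "defined_class\tdisease\tcomponent\nEX:1\tEX:2\tEX:3|EX:4\nEX:5\tEX:6\t\n"
rows = read_rows(tsv, {"component"})
print(rows)
```

Which columns are list-valued has to be declared somewhere; here it is passed in explicitly rather than inferred from the presence of '|'.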

dosumis commented 3 years ago

Looks good.

Can we add some test cases for these? I would then be happy to merge.

> can we use data_list_vars for tsv values with '|'

Yes. DOSDP_tools should already be compatible.

Can you open a ticket on DOSDP_tools repo for implementation of these schema extensions?

dosumis commented 3 years ago

Think this is now fixed, so closing. IIRC the main work left to do is to extend dosdp_tools to support list_vars.