geneontology / noctua

Graph-based modeling environment for biology, including prototype editor and services
http://noctua.geneontology.org/
BSD 3-Clause "New" or "Revised" License

Add a macromolecular complex creator #169

Closed cmungall closed 8 years ago

cmungall commented 9 years ago

We want to allow protein complexes to be the value/filler of enabled_by slots, as well as simple gene products.

Protein complexes can be pre-composed (e.g. in PRO or Intact) or post-composed.

There are open tickets about being able to use pre-composed complexes: #122 #120. This is essentially a matter of adding the relevant stuff to the import chain and/or golr (and possibly relaxing the autocomplete constraint on the enabled_by box).

Not all complexes will be composed in advance, so we need a way to construct them.

Typically complexes will be mereological sums of proteins. We write these here for convenience as P1 + P2 + ... + Pn. Formally this is similar to the OWL class expression 'macromolecular complex' and has_member P1 and has_member ... (not equivalent, as the OWL expression is not closed).
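As a rough illustration, the sum P1 + ... + Pn above can be rendered as a Manchester-style class expression string. This is a sketch only: the function name `sum_to_expression`, the default parent label, and the use of `some` with has_member are assumptions for illustration, not Noctua code.

```python
def sum_to_expression(members, parent="'macromolecular complex'", relation="has_member"):
    """Render P1 + ... + Pn as a Manchester-style class expression string.

    The result is an open expression: it does not say that the listed
    members are the only members of the complex.
    """
    if not members:
        raise ValueError("a complex needs at least one member")
    # One existential restriction per member, joined by intersection.
    parts = [parent] + [f"({relation} some {member})" for member in members]
    return " and ".join(parts)
```

Calling `sum_to_expression(["P1", "P2"])` gives one restriction per member, in the order supplied.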

In some cases you may want to describe individual members of the complex in different states (e.g. phosphorylated). We call this the advanced case, and focus on the simple case here.

There are various options for the UI. First it's important to note that there are different ways in which an enabled_by field can be filled in:

  1. On initial creation, using the wizard macro on the left side
  2. After initial creation, by clicking the green knob on an activity box, and filling in entity + enabled_by under the abstrusely titled "Add bundle (class expression & edge pair)"
  3. Via low level editing, creating an individual with the ubernoodle and then connecting two nodes together using an enabled_by edge

It is actually possible to do complexes now using route 3: simply create an instance of 'protein complex', create N instances of gene products P1, ..., Pn, connect them to the parent complex via part_of, then connect the complex to the activity via enabled_by. But this is a bit low level (and additionally folding is not invoked). We also end up with more individuals, so we may want to use the equivalent class expression instead (@hdietze to comment).

We should have a way of doing this using 1 and/or 2. It should probably be consistent across both (although 2 is inherently a more generic UI component).

One option would be to allow the enabled_by slot in the wizard to take multiple values, where this is implicitly a mereological sum. But this may not work well with autocomplete, which expects a single value (there is a secret OWL tunnel here at the moment: any Manchester expression can be entered).

Another would be some kind of + symbol below the slot that allows for multiple values, and some logic that treats these as the appropriate intersection of someValuesFrom restrictions.

cmungall commented 8 years ago

Current plan:

We will do a first pass of the complex creator where these will be class expressions. E.g.

'protein complex' and has-part some P1 and ... and has-part some Pn

We will later switch these to individuals, but this will require some changes to the folding code.

kltm commented 8 years ago

Initially, we'll model some of this as a model-level workbench that allows commands to be sent back.

cmungall commented 8 years ago

The client will create an expression that looks like this. The GO class will be fixed. The members can be any molecular entity.

    {
        'type': 'intersection',
        'expressions': [
            {
                'type': 'class',
                'id': 'GO:0032991'
            },
            {
                "type": "svf",
                "property": {
                    'type': "property",
                    'id': "BFO:0000051"
                },
                "filler": {
                    "type": "class",
                    "id": "UniProtKB:P0000001"
                }
            },
            {
                "type": "svf",
                "property": {
                    'type': "property",
                    'id': "BFO:0000051"
                },
                "filler": {
                    "type": "class",
                    "id": "UniProtKB:P0000002"
                }
            },
            {
                "type": "svf",
                "property": {
                    'type': "property",
                    'id': "BFO:0000051"
                },
                "filler": {
                    "type": "class",
                    "id": "UniProtKB:P0000099"
                }
            }
        ]
    }

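
A sketch of how such a structure might be assembled programmatically, in Python purely for illustration. The helper names `svf` and `build_complex_expression` and the two constants are hypothetical; only the IDs and the dictionary shape come from the example above. Serializing with `json.dumps` also yields consistently double-quoted JSON.

```python
import json

COMPLEX_CLASS = "GO:0032991"  # the fixed GO class from the example above
HAS_PART = "BFO:0000051"      # the property used in the example above


def svf(property_id, filler_id):
    """Build one 'svf' (some-values-from) restriction as a plain dict."""
    return {
        "type": "svf",
        "property": {"type": "property", "id": property_id},
        "filler": {"type": "class", "id": filler_id},
    }


def build_complex_expression(member_ids):
    """Intersect the fixed complex class with one has-part svf per member."""
    expressions = [{"type": "class", "id": COMPLEX_CLASS}]
    expressions += [svf(HAS_PART, member_id) for member_id in member_ids]
    return {"type": "intersection", "expressions": expressions}


if __name__ == "__main__":
    members = ["UniProtKB:P0000001", "UniProtKB:P0000002", "UniProtKB:P0000099"]
    print(json.dumps(build_complex_expression(members), indent=4))
```
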
kltm commented 8 years ago

The mechanism is already easily done with the class-expression library (https://github.com/berkeleybop/class-expression). Really all that's needed here is the form (including adding N parts and clearing). Not a particularly tall order (although I'll probably grab a newer framework to do it).

kltm commented 8 years ago

This should now be publicly available. I'm having a little trouble filtering for the bioentities over CHEBI, but it might just be my browser; will check later.

Select a model, pull down workbenches, select macromolecular model creator.

cmungall commented 8 years ago

Can we make field 1 have a default value of "protein complex"? This is what's used in the majority of cases.

Something is not right with field 2. It should act just like the enabled_by field. I can't seem to select anything starting with "abcb" right now. I suspect it's not using the noctua-golr solr instance.

cmungall commented 8 years ago

Weirdly, there are some Shh genes, but not all.

cmungall commented 8 years ago

OWL:

Individual: <http://model.geneontology.org/5662325600000018/5662325600000019>

    Annotations: 
        <http://geneontology.org/lego/hint/layout/x> "75"^^xsd:string,
        <http://purl.org/dc/elements/1.1/contributor> "http://orcid.org/0000-0002-6601-2165"^^xsd:string,
        <http://purl.org/dc/elements/1.1/date> "2015-12-07"^^xsd:string,
        <http://geneontology.org/lego/hint/layout/y> "75"^^xsd:string

    Types: 
        <http://identifiers.org/uniprot/Q15465>
         and <http://purl.obolibrary.org/obo/GO_0043234>
         and <http://www.informatics.jax.org/accession/MGI:MGI:98297>

Two independent issues:

  1. this looks like an older version of noctua-golr?
  2. The class expression is wrong. It says protein-complex and P1 and P2. Should be 'protein-complex and (has-part some P1) and (has-part some P2)'

We can work with 2 for now. But we need to hook up the complex creator to the correct solr

However, we can work with the above for demo purposes for now.
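
The difference described in issue 2 can be made concrete using the JSON form from earlier in this thread. The sketch below wraps each bare member class in an svf over has-part and leaves the complex class alone. The helper name `wrap_members` is hypothetical, and treating any `GO:` class as "the complex" is a simplifying assumption for illustration, not how Noctua or Minerva decide this.

```python
def wrap_members(expression, property_id="BFO:0000051"):
    """Turn 'complex and P1 and P2' into 'complex and (has-part some P1) and ...'.

    Assumption: bare classes with a GO: prefix are the complex itself;
    every other bare class is a member that needs wrapping.
    """
    fixed = []
    for sub in expression["expressions"]:
        is_bare_class = sub.get("type") == "class"
        if is_bare_class and not sub["id"].startswith("GO:"):
            sub = {
                "type": "svf",
                "property": {"type": "property", "id": property_id},
                "filler": sub,
            }
        fixed.append(sub)
    return {"type": "intersection", "expressions": fixed}
```

Entries that are already svfs pass through unchanged, so applying the function twice gives the same result as applying it once.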

kltm commented 8 years ago

For your first comment, I will look at getting a default value there (and fixing the spinner).

Your second comment is actually answered by https://github.com/geneontology/noctua/pull/239#event-484568713 , which I'll look at switching now.

I don't understand the "older version of noctua-golr" comment.

The "has-part" should be committed now; will test and roll out in a bit.

By the way, where are you getting that JSON from? If it's code we have control over, it's not technically correct:

{
    "type": "intersection",
    "expressions": [
        {
            "type": "class",
            "id": "GO:0032991"
        },
        {
            "type": "svf",
            "property": {
                "type": "property",
                "id": "BFO:0000051"
            },
            "filler": {
                "type": "class",
                "id": "UniProtKB:P0000001"
            }
        },
        {
            "type": "svf",
            "property": {
                "type": "property",
                "id": "BFO:0000051"
            },
            "filler": {
                "type": "class",
                "id": "UniProtKB:P0000002"
            }
        },
        {
            "type": "svf",
            "property": {
                "type": "property",
                "id": "BFO:0000051"
            },
            "filler": {
                "type": "class",
                "id": "UniProtKB:P0000099"
            }
        }
    ]
}
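
One quick way to check a hand-written snippet like this is to run it through a strict JSON parser. A minimal Python sketch follows (the function name `is_valid_json` is made up here). Strings in single quotes are not part of the JSON grammar, so a draft with mixed quoting would be rejected.

```python
import json


def is_valid_json(text):
    """Return True if text parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```
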
cmungall commented 8 years ago

"older version of noctua-golr"

This may be a diversion. In the previous neo, they looked like http://www.informatics.jax.org/accession/MGI:MGI:98297

In the current one they look like: http://purl.obolibrary.org/obo/MGI_MGI%98297

However, the conversion to the older form may be from the roundtrip through minerva. @hdietze can confirm

"By the way, where are you getting that JSON from?"

I authored it in emacs and didn't validate. Was intending to make a PR with a test for bbop-class-expression but didn't get that far

cmungall commented 8 years ago

"Your second comment is actually answered by #239 (comment)"

Not seeing the connection but maybe doesn't matter

hdietze commented 8 years ago

Yes the http://www.informatics.jax.org/accession/MGI:MGI:98297 is coming from the curie handler, which has the old(?) mapping. If the request contains only the short-form (i.e. when done from any Golr/AmiGo source), the curie handler does expand MGI: prefix into the jax long-form.
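
To illustrate the kind of prefix expansion being described, here is a toy sketch. The prefix table and the function `expand_curie` are assumptions built from the IRIs quoted in this thread; this is not Minerva's actual curie handler or its real mapping.

```python
# Hypothetical prefix table, inferred from the long-form IRIs quoted above.
PREFIX_MAP = {
    "MGI": "http://www.informatics.jax.org/accession/MGI:",
    "UniProtKB": "http://identifiers.org/uniprot/",
}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand a short-form ID into a long-form IRI; unknown prefixes pass through."""
    # Split on the first colon only, so "MGI:MGI:98297" keeps "MGI:98297" as its local part.
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return curie
    return prefix_map[prefix] + local_id
```

With this table, the short form `MGI:MGI:98297` expands to the jax long form seen in the OWL output, which is consistent with the explanation that the older-looking IRI comes from the mapping rather than from the index.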

kltm commented 8 years ago

For the JSON: good--I just wanted to make sure some of our code wasn't producing that.

I believe the svfs are properly in there now (they didn't register the first time I looked at your spec).

Production should be filtering correctly now, and the choices do come up...eventually. However, I'm having trouble with some selectize quirks where hits don't always show up at first. I'll look at that along with the default choice and the spinner.

cmungall commented 8 years ago

Nested structure looks fine now, only UI quirks remaining

  1. Autocomplete still won't give the same genes as we get on the main screen. Try typing "abc" in both.
  2. Rendering of class expression in box on plumb view should be recursive, i.e. intersection[3] is a bit opaque.
  3. Rendering on green-box click is better, but there is an odd label-not-found issue. Shouldn't Noctua know about has-part? It just shows as BFO IDs here; see screenshot.

[screenshot, 2015-12-08 12:08:41 AM]

Each of these may be minor UI issues that should be in their own tix
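
For item 2, a recursive rendering could look roughly like the sketch below. It walks the JSON form used earlier in this thread. The function names, the optional `labels` lookup, and the output style are illustrative assumptions, not what the plumb view actually does.

```python
def render(expr, labels=None):
    """Recursively render a class expression dict as a readable string."""
    labels = labels or {}
    kind = expr["type"]
    if kind in ("class", "property"):
        # Fall back to the raw ID when no label is known.
        return labels.get(expr["id"], expr["id"])
    if kind == "svf":
        return f"{render(expr['property'], labels)} some {_nested(expr['filler'], labels)}"
    if kind == "intersection":
        return " and ".join(_nested(sub, labels) for sub in expr["expressions"])
    raise ValueError(f"unsupported expression type: {kind}")


def _nested(expr, labels):
    """Parenthesize anything that is not a simple class or property."""
    text = render(expr, labels)
    return text if expr["type"] in ("class", "property") else f"({text})"
```

Passing a `labels` dictionary such as `{"BFO:0000051": "has part"}` would also cover the case in item 3, where an ID is shown because no label was found.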

kltm commented 8 years ago

1 is the autocomplete issues mentioned here: https://github.com/geneontology/noctua/issues/169#issuecomment-162699191 . Again, they /do/ show up eventually. Not sure what's up yet.

2 This was done in the past, and looked so bad/took so much space that it was dropped by common agreement.

3 It's weird that's not gotten on the round trip...maybe @hdietze would have some idea?

kltm commented 8 years ago

@cmungall Should BFO:0000051 be RO:0002180?

kltm commented 8 years ago

TODO: need to fix login/token info on workbench (keeps token, but doesn't resolve...).

kltm commented 8 years ago

Deployed now. Probably works. See what happens.

cmungall commented 8 years ago

It works!