SD2E / experimental-intent-parser

A tool that combines a word-processing interface with structured tables and assisted linking to definitions to provide a simple interface for incremental codification of experiment designs.
BSD 3-Clause "New" or "Revised" License

Add more data during OPIL generation (to support direct OPIL input into XPlan) #321

Closed jgladwig closed 3 years ago

jgladwig commented 3 years ago

This issue collects related adjustments I am looking for in how Intent Parser produces its OPIL response to a generateOpilRequest. It is a bit long-winded, but the end goal is to allow XPlan to accept the OPIL document directly as input (replacing the structured requests). It's possible this will break out into multiple issues, but I have not yet untangled it that far.

Note that all of the examples below reference the same instance of a Strateos GrowthCurve protocol, and it is very possible we need to consider how this data flow will work for other protocols and labs. (Also note that I will be abusing GitHub's diff syntax highlighting to mark some lines in the examples below.)

Structured Requests

To begin, below is a version of the structured request we are getting (I have hidden the condition_space and conditions fields to avoid reasoning about 50k lines). My focus is on the properties that I know need to pass through XPlan and appear in the final OPIL output. My hope is that @danbryce can describe any further information we need to execute XPlan successfully.

{
    "challenge_problem": "YEAST_STATES",
    "experiment_id": "experiment.transcriptic.2020-12-03-YeastSTATES-Dual-Response-CRISPR-Growth-Curves-30C",
    "experiment_reference": "YeastSTATES-Dual-Response-CRISPR-Growth-Curves-30C",
    "experiment_reference_url": "https://docs.google.com/document/d/1VC6Mj_UirpZ_0R-zbrDvJjzOjmlRpirUtFdAt89qqb4",
    "protocol": "GrowthCurve",
    "condition_space": {
        ...
    },
    "batches": [
        {
            "id": "0",
            "samples": 96,
            "layout": "fixed"
        },
        {
            "id": "1",
            "samples": 96,
            "layout": "fixed"
        }
    ],
    "defaults": {
        "constants": {
            "lab": "Strateos",
            "container_search_string": [
                "ct1f3w93hd4k927",
                "ct1f3w93hj3z5qk"
            ],
            "strain_property": "Name"
        },
        "parameters": {
            "inoc_info.inoc_vol": "5:microliter",
            "inoc_info.inoc_media_vol": "500:microliter",
            "inoc_info.inc_time_1": "16:hour",
            "inoc_info.inoculation_media": "sc_media",
            "recovery_info.recovery_sample_vol": "2:microliter",
            "recovery_info.recovery_media_vol": "700:microliter",
            "read_info.sampling_info.read_cult_vol": "10:microliter",
            "plate_reader_info.fluor_ex": "488:nanometer",
            "plate_reader_info.fluor_em": "530:nanometer",
            "plate_reader_info.list_of_gains": {
                "gain_1": 0.1,
                "gain_2": 0.16,
                "gain_3": 0.2
            },
            "read_info.sampling_info.read_solv_vol": "90:microliter"
        },
        "conditions": [
            ...
        ],
        "submit": true,
        "protocol_id": "pr1e955a2zbtw65",
        "test_mode": true
    }
}

What XPlan currently receives

Below is the dotname data we receive from Intent Parser via generateOpilRequest, i.e., the information in the OPIL document that carries a dotname annotation. I read through the document object by object and convert anything marked with a dotname into its equivalent JSON form.

In short, this is the beginning of an OPIL-to-Strateos JSON converter.
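For reference, the dotname-to-JSON step amounts to expanding flat keys such as `inoc_info.inoc_vol` (the form used in the structured request's `parameters`) into nested objects like the ones in the payload below. A minimal Python sketch, assuming the dotnames have already been collected from the OPIL annotations into a flat dict; `nest_dotnames` is a hypothetical helper, not an existing XPlan or Intent Parser function, and the OPIL traversal itself is not shown:

```python
from typing import Any, Dict


def nest_dotnames(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand {"a.b.c": v} into {"a": {"b": {"c": v}}}.

    Values are stored as-is, so a dict value (e.g. list_of_gains)
    stays a single leaf instead of being merged key by key.
    """
    nested: Dict[str, Any] = {}
    for dotname, value in flat.items():
        *parents, leaf = dotname.split(".")
        node = nested
        for part in parents:
            # Reuse an existing branch or create a new one.
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{dotname!r} conflicts with an existing leaf value")
        if leaf in node:
            # Only possible if a longer dotname already created a branch here.
            raise ValueError(f"{dotname!r} conflicts with an existing branch")
        node[leaf] = value
    return nested


if __name__ == "__main__":
    defaults = {
        "inoc_info.inoc_vol": "5:microliter",
        "inoc_info.inoc_media_vol": "500:microliter",
        "read_info.sampling_info.read_cult_vol": "10:microliter",
    }
    print(nest_dotnames(defaults))
```

A dict value such as `plate_reader_info.list_of_gains` is kept as one leaf, which matches how the gains appear in both the structured request and the nested payloads.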

Here is what we currently receive. Note that the gains appear to be unset (all 0.0).

{
    "parameters": {
        "inoc_info": {
            "inoc_media_vol": "500:microliter",
            "inoc_vol": "5:microliter"
        },
        "plate_reader_info": {
            "list_of_gains": {
                "gain_1": 0.0,
                "gain_2": 0.0,
                "gain_3": 0.0
            }
        },
        "read_info": {
            "sampling_info": {
                "read_cult_vol": "10:microliter",
                "read_solv_vol": "90:microliter"
            }
        },
        "recovery_info": {
            "recovery_media_vol": "700:microliter",
            "recovery_sample_vol": "2:microliter"
        }
    }
}

What XPlan has been outputting

Below is what XPlan has been outputting (prior to any use of OPIL). The lines marked with a + are data that XPlan is adding. (Note that I have also hidden the src_samples field to keep the line count down.)

@danbryce may want to double-check, but from what I can see, all of the data below that is not marked with a + has passed through XPlan and is actually sourced from the original structured request (shown above).

{
    "parameters": {
        "experimental_info": {
            "experiment_id": "experiment.transcriptic.2020-12-03-YeastSTATES-Dual-Response-CRISPR-Growth-Curves-30C",
            "experiment_reference": "YeastSTATES-Dual-Response-CRISPR-Growth-Curves-30C",
            "experiment_reference_url": "https://docs.google.com/document/d/1VC6Mj_UirpZ_0R-zbrDvJjzOjmlRpirUtFdAt89qqb4"
        },
+        "incubation_info": {
+            "inc_temp": "warm_30"
+        },
        "inoc_info": {
            "inc_time_1": "16:hour",
            "inoc_media_vol": "500:microliter",
            "inoc_vol": "5:microliter",
            "inoculation_media": "sc_media"
        },
        "plate_reader_info": {
            "fluor_em": "530:nanometer",
            "fluor_ex": "488:nanometer",
            "list_of_gains": {
                "gain_1": 0.1,
                "gain_2": 0.16,
                "gain_3": 0.2
            }
        },
        "read_info": {
+            "growth_time": {
+                "sample_points": "1,3,6,9,12,15,18,21,24"
+            },
            "sampling_info": {
                "read_cult_vol": "10:microliter",
                "read_solv_vol": "90:microliter"
            }
        },
        "recovery_info": {
+            "recovery_media": "sc_media_200nm_be",
            "recovery_media_vol": "700:microliter",
            "recovery_sample_vol": "2:microliter"
        },
+        "src_info": {
+            "src_samples": [
+              ...
+            ]
+        }
    }
}

Missing data

The fields that are missing entirely (in dotname form):

The fields that are present but have wrong data:
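Both kinds of discrepancy can be found mechanically by flattening the XPlan output and the Intent Parser payload to dotnames and diffing them. A minimal Python sketch, assuming both payloads are loaded as plain dicts; `flatten` and `compare` are hypothetical helpers, and the excerpt uses only a few values copied from the payloads above:

```python
from typing import Any, Dict, List, Tuple


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {dotname: leaf value}."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, name))  # descend into non-empty objects
        else:
            flat[name] = value  # scalars, lists and empty objects are leaves
    return flat


def compare(expected: Dict[str, Any], received: Dict[str, Any]
            ) -> Tuple[List[str], Dict[str, Tuple[Any, Any]]]:
    """Return (dotnames missing from received, {dotname: (expected, received)})."""
    exp, got = flatten(expected), flatten(received)
    missing = sorted(set(exp) - set(got))
    wrong = {k: (exp[k], got[k]) for k in sorted(set(exp) & set(got)) if exp[k] != got[k]}
    return missing, wrong


if __name__ == "__main__":
    # Excerpts copied from the XPlan output and the Intent Parser payload above.
    xplan_output = {"parameters": {
        "inoc_info": {"inc_time_1": "16:hour", "inoc_vol": "5:microliter"},
        "plate_reader_info": {"list_of_gains": {"gain_1": 0.1, "gain_2": 0.16}},
    }}
    from_intent_parser = {"parameters": {
        "inoc_info": {"inoc_vol": "5:microliter"},
        "plate_reader_info": {"list_of_gains": {"gain_1": 0.0, "gain_2": 0.0}},
    }}
    missing, wrong = compare(xplan_output, from_intent_parser)
    print(missing)  # ['parameters.inoc_info.inc_time_1']
    print(wrong)    # gain_1 and gain_2 differ: (0.1, 0.0) and (0.16, 0.0)
```

This flattens all the way down to leaves, so `list_of_gains` is reported per gain; the recursion would need to stop earlier if the OPIL dotnames are meant to be coarser than that.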

How do we resolve this?

More context: after some mild digging, it appears that all of the missing parameters above (the parameters, not their data) are described within the Strateos GrowthCurve protocol. This means that the ProtocolInterface generated by the Strateos-to-OPIL converter contains a Parameter for each of the missing data fields. These Parameters are not currently present in the OPIL pulled from Intent Parser. In other words, not only are the ParameterValues absent, but the Parameters are absent as well.

There are probably multiple ways forward from here but I am hoping to get more eyes on this to reason about how we tie this all together.

So some of the potential paths forward are:

This will remove any guesswork around what portions of the ProtocolInterface are present.

Note: I am not certain this is as simple as just adding in the above fields. Different labs and protocols may have different needs.

Beyond that, I think there will be additional requirements that @danbryce can identify for fully converting to OPIL as direct input to XPlan instead of structured requests.

jakebeal commented 3 years ago

This is now becoming a blocker for @jgladwig.

jakebeal commented 3 years ago

Note that this might split into multiple issues; @jgladwig it would be good if you can note which bits are the blockers and which can be split off.

jgladwig commented 3 years ago

I think I can break this into two parts. Both are blockers related to the OPIL that IP generates, and both are required for the end goal of using OPIL from start to finish. But I think the first part can at least get us OPIL output (before XPlan supports OPIL input).

Part 1

This is a blocker for getting output from an OPIL-to-Strateos JSON converter. Doing this part without part 2 would still leave us dependent on structured requests for input, but it will at least let me flesh out the output side. In particular, it will clear the path for me to write the OPIL-to-Strateos JSON converter and to build unit tests that ensure the new output matches our previous output.

The heart of the issue is that I would like the OPIL returned by Intent Parser to be as close as possible to its completed form.

Right now that appears to mean:

Part 2

The second part is a blocker for having XPlan accept OPIL as input.

This is an extension of part 1. In particular, the goal is to reduce the blanks XPlan fills in to the minimal required set (only the values XPlan actually makes decisions on). Anything known in advance would already be filled in in the OPIL returned by generateOpilRequest. See the original issue contents for the list of missing data (at least for GrowthCurve).

If we resolve this as well, I expect I can begin adjusting XPlan's input to process the OPIL from Intent Parser directly (removing the need for structured requests).

jgladwig commented 3 years ago

Note that @danbryce may have some additions regarding the condition space. I think those will mainly concern Part 2 and the shift to accepting OPIL as input to XPlan.

jgladwig commented 3 years ago

Here are the full opil files for the above: opil_from_protocol_interface.txt opil_from_intent_parser.txt

danbryce commented 3 years ago

Here are some examples of what I expect for the condition space inputs. Look at the key condition_space. It has several factors. Each factor is a column in the ER. A factor has a name, domain, domain type (dtype), object type (otype), factor type (ftype), and optional ‘lab_name’, ‘lab_prefix’, and ‘lab_suffix’. The optional items are for mapping the factor to a Strateos parameter.
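To make the factor structure concrete, here is a minimal Python sketch of one factor as a typed record. The key names `name` and `domain` are assumed from the description (only `dtype`, `otype`, `ftype`, `lab_name`, `lab_prefix` and `lab_suffix` are spelled out), the field types are guesses, and the example values are placeholders. How the lab_* fields combine into a Strateos parameter name is not specified here, so the sketch only records whether a factor maps to the lab at all:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Factor:
    """One condition_space factor, i.e. one column in the ER (hypothetical model)."""
    name: str
    domain: Any   # assumed: list of values or a range description
    dtype: str    # domain type
    otype: str    # object type
    ftype: str    # factor type
    # Optional fields used to map the factor onto a Strateos parameter.
    lab_name: Optional[str] = None
    lab_prefix: Optional[str] = None
    lab_suffix: Optional[str] = None

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Factor":
        required = ("name", "domain", "dtype", "otype", "ftype")
        missing = [k for k in required if k not in d]
        if missing:
            raise KeyError(f"factor is missing {missing}")
        return cls(**{k: d[k] for k in required},
                   lab_name=d.get("lab_name"),
                   lab_prefix=d.get("lab_prefix"),
                   lab_suffix=d.get("lab_suffix"))

    def maps_to_lab(self) -> bool:
        """True if any of the optional Strateos-mapping fields is set."""
        return any(v is not None for v in (self.lab_name, self.lab_prefix, self.lab_suffix))


if __name__ == "__main__":
    # Placeholder values only; real factors come from the invocation JSON files.
    raw = {"name": "example_factor", "domain": ["a", "b"], "dtype": "str",
           "otype": "example_otype", "ftype": "example_ftype", "lab_name": "example_param"}
    factor = Factor.from_dict(raw)
    print(factor.maps_to_lab())  # True
```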

Riboswitches:

https://jupyter.sd2e.org/user/{user}/edit/sd2e-projects/sd2e-project-14/xplan-reactor/RIBOSWITCHES/experiments/experiment.transcriptic.2020-10-16-Cell-Free-Transcriptional-Riboswitch-Characterization-Sequences-1-32/invocation_experiment.transcriptic.2020-10-16-Cell-Free-Transcriptional-Riboswitch-Characterization-Sequences-1-32.json

Growth Curve:

https://jupyter.sd2e.org/user/{user}/edit/sd2e-projects/sd2e-project-14/xplan-reactor/YEAST_STATES/experiments/experiment.transcriptic.2020-12-11-YeastSTATES-Activator-Circuit-Dox-Growth-Curves-30C/invocation_experiment.transcriptic.2020-12-11-YeastSTATES-Activator-Circuit-Dox-Growth-Curves-30C.json

Time Series:

https://jupyter.sd2e.org/user/{user}/edit/sd2e-projects/sd2e-project-14/xplan-reactor/YEAST_STATES/experiments/experiment.transcriptic.2021-01-15-YeastSTATES-1-0-Time-Series-Round-4-0/invocation_experiment.transcriptic.2021-01-15-YeastSTATES-1-0-Time-Series-Round-4-0.json

Obstacle Course:

https://jupyter.sd2e.org/user/{user}/edit/sd2e-projects/sd2e-project-14/xplan-reactor/YEAST_STATES/experiments/experiment.transcriptic.2020-02-03-CEN-PK-Inducible-CRISPR-4-Day-Obstacle-Course/invocation_experiment.transcriptic.2020-02-03-CEN-PK-Inducible-CRISPR-4-Day-Obstacle-Course.json

On Feb 24, 2021, at 2:36 PM, jgladwig notifications@github.com wrote:

Here are the full opil files for the above: opil_from_protocol_interface.txt opil_from_intent_parser.txt


tramyn commented 3 years ago

@jgladwig, @jakebeal I have broken this issue down into the following sub-issues:

I need to determine what the extra parameters look like on @jgladwig's end. Intent Parser generates 12 additional parameter fields that are used for running an experiment. If these fields match the fields that @jgladwig is identifying, then they are valid parameters that XPlan should reason about. Otherwise, I will need the names of these parameters so I can follow up with a bug report.

I will need clarification on which URIs seem wrong in the above example.

jgladwig commented 3 years ago

In regards to the potential wrongness in the URIs:

To point to a specific instance of the potential error, I see these two values in the ExperimentalRequest I pasted in my previous comment (under hasParameterValue):

...
<http://strateos.com/GrowthCurve/StringParameter9/StringValue10>,
<https://sd2e.org/ip3a4b5a12e48847e6ad834888245f4f74/MeasureValue10>,
...

To me the value https://sd2e.org/ip3a4b5a12e48847e6ad834888245f4f74/MeasureValue10 reads as correct because it is a value that is owned by the https://sd2e.org/ip3a4b5a12e48847e6ad834888245f4f74 ExperimentalRequest.

In short, the base URI matches so I expect the value to be contained within that specific ExperimentalRequest and have no unexpected links out to other objects.

Conversely the value http://strateos.com/GrowthCurve/StringParameter9/StringValue10 reads as incorrect (to me) because it reads as though the value is owned by the http://strateos.com/GrowthCurve ProtocolInterface when I am expecting it to be owned by the https://sd2e.org/ip3a4b5a12e48847e6ad834888245f4f74 ExperimentalRequest.

My worry is what happens if we ever deal with documents that define multiple ExperimentalRequests (as discussed in https://github.com/SD2E/opil/issues/142). The parameter value URIs that reference the http://strateos.com/GrowthCurve ProtocolInterface are not unique to the ExperimentalRequest that contains them. So if someone changed the value of one of these objects (the ones whose URIs point back to the ProtocolInterface) in only one ExperimentalRequest, that change would propagate to every other ExperimentalRequest in the document that references the same object.

In short, the URIs in the example I am calling 'potentially wrong' suggest that there is some extra object linkage and that not all parameter values in the ExperimentalRequest are owned by that ExperimentalRequest.
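The ownership rule described above (a value belongs to the ExperimentalRequest whose URI prefixes its own) can be checked mechanically. A minimal Python sketch, assuming ownership can be read purely from the URI namespace, as in the two examples above; `foreign_values` is a hypothetical helper, and a real check would walk the parsed OPIL/SBOL document rather than a list of strings:

```python
from typing import Iterable, List


def foreign_values(request_uri: str, value_uris: Iterable[str]) -> List[str]:
    """Return the parameter value URIs that are not nested under request_uri."""
    # Compare against "<request>/" so that a sibling URI such as "<request>b/..."
    # is not mistaken for a child.
    base = request_uri.rstrip("/") + "/"
    return [uri for uri in value_uris if not uri.startswith(base)]


if __name__ == "__main__":
    request = "https://sd2e.org/ip3a4b5a12e48847e6ad834888245f4f74"
    values = [
        "http://strateos.com/GrowthCurve/StringParameter9/StringValue10",
        "https://sd2e.org/ip3a4b5a12e48847e6ad834888245f4f74/MeasureValue10",
    ]
    print(foreign_values(request, values))
    # ['http://strateos.com/GrowthCurve/StringParameter9/StringValue10']
```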

Perhaps this is intentional behavior. It was just something I noticed while reading through the files and it brought questions to mind as to what those URIs were implying.

Hopefully the above explains my question/concern. Note that I ultimately defer to @jakebeal on this, as I am only listing my expectations of what the SBOL document will look like for an experimental request. My expectations may be wrong.

jakebeal commented 3 years ago

Good eyes, @jgladwig : http://strateos.com/GrowthCurve/StringParameter9/StringValue10 is indeed a potential problem.

What it looks like to me is that the default parameters are all being referenced from the ExperimentalRequest rather than copied into it. They should be copied over, making them child objects of the ExperimentalRequest.

Because these are write-only values, this would likely not actually cause any errors. It's not the intended usage though, per the specification, and would be good to adjust to match the specification.
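A minimal sketch of the copy-rather-than-reference fix, using plain Python dataclasses as stand-ins for the OPIL objects. This is not the pySBOL3/OPIL API: `ParameterValue`, `ExperimentalRequest`, `adopt_default` and the request URIs are hypothetical. Each request gets its own child copy whose URI sits under the request's URI, so editing one request's value leaves the shared default and the other requests untouched:

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParameterValue:
    uri: str
    value: object


@dataclass
class ExperimentalRequest:
    uri: str
    values: List[ParameterValue] = field(default_factory=list)


def adopt_default(request: ExperimentalRequest, default: ParameterValue,
                  display_id: str) -> ParameterValue:
    """Copy a ProtocolInterface default into request as a child object."""
    child = copy.deepcopy(default)  # independent copy, not a shared reference
    child.uri = f"{request.uri.rstrip('/')}/{display_id}"  # child URI under the request
    request.values.append(child)
    return child


if __name__ == "__main__":
    default = ParameterValue(
        "http://strateos.com/GrowthCurve/StringParameter9/StringValue10", "placeholder")
    r1 = ExperimentalRequest("https://sd2e.org/request1")
    r2 = ExperimentalRequest("https://sd2e.org/request2")
    v1 = adopt_default(r1, default, "StringValue10")
    v2 = adopt_default(r2, default, "StringValue10")
    v1.value = "changed"  # edit only request1's copy
    print(v1.uri)                   # https://sd2e.org/request1/StringValue10
    print(v2.value, default.value)  # placeholder placeholder
```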

jakebeal commented 3 years ago

Closing this in favor of the broken out sub-issues.