SD2E / opil

The Open Protocol Interface Language (OPIL) is intended as a standard language for protocol interfaces
5 stars 1 forks source link

How to add opil.Parameters with different namespace to a ProtocolInterface? #179

Open tramyn opened 3 years ago

tramyn commented 3 years ago

I want to add new parameters to an existing ProtocolInterface but when I do so, opil will modify these new parameters to use the ProtocolInterface namespace. Ideally, I want to distinguish where parameters are derived from so the new parameters uses Intent Parser's namespace and the ProtocolInterface points to a lab namespace (ex: http://aquarium.bio/htc/). However, opil will not allow mixtures of namespaces for parameters added to a ProtocolInterface and now I'm left with parameter values with the valueOf field not referenceable to its corresponding Parameter object.

What is the best way to proceed here?

bbartley commented 3 years ago

As far as I can tell from reading the SBOL spec, TopLevels and their children objects are required to have the same namespace.

Can you please describe more about your other issue:

I'm left with parameter values with the valueOf field not referenceable to its corresponding Parameter object.

I'm not tracking how this follows from the issue about mixing namespaces.

Thanks Bryan

tramyn commented 3 years ago

@bbartley Here is a screen capture of an opil parameter created before writing into xml. opil_issue179 Notice the ParameterValue.value_of is correctly pointing to the Parameter object. When I write this information into an xml file (a copy of the xml file was sent on slack), Parameter identity changes from https://sd2e.org/ipcd872dc9adfe4e3cb542f0e19d8afe69 to http://aquarium.bio/htc/StringParameter8. Now if you check the ParameterValue.value_of in the file, it is still set to https://sd2e.org/ipcd872dc9adfe4e3cb542f0e19d8afe69. I know the Parameter object exists because the information is there in the file but the identity to connect this parameter to value has changed.

bbartley commented 3 years ago

Sounds like a bug!

When I write this information into an xml file (a copy of the xml file was sent on slack), Parameter identity changes from https://sd2e.org/ipcd872dc9adfe4e3cb542f0e19d8afe69 to http://aquarium.bio/htc/StringParameter8.

Can you confirm that the identity of the Parameter changes in the context of the Document as well as in the serialized file? Can you isolate the exact moment when the identity changes. For example, are you certain it changes during the call to write?

bbartley commented 3 years ago

Hi @tramyn I am unable to reproduce the error you describe. Here was my attempt. Do you see some way to tweak this that will reproduce the error?.

import opil
import sbol3

sbol3.set_namespace('http://foo.org')
doc = sbol3.Document()
p = opil.ProtocolInterface('p')
param = opil.StringParameter()
p.has_parameter = [param]
doc.add(p)

# Change namespace
sbol3.set_namespace('http://bar.org')
er = opil.ExperimentalRequest('er')
doc.add(er)
v = opil.StringValue()
v.value_of = p
er.has_parameter_value = [v]
print(doc.write_string(file_format='ttl'))
tramyn commented 3 years ago

@bbartley here is what I am doing to reproduce this bug:

import opil
import sbol3

file_path = 'jellyfish_htc.xml'
opil_doc = opil.Document()
opil_doc.read(file_path, sbol3.RDF_XML)

# copy protocol_interface to avoid overwriting existing data
protocol_interfaces = [top_level.copy() for top_level in opil_doc.objects if isinstance(top_level, opil.ProtocolInterface)]
if len(protocol_interfaces) > 1:
   raise Exception('Expecting 1 ProtocolInterface but %d were found' % len(protocol_interfaces))

# create  new parameter to add to existing ProtocolInterface
protocol_interface = protocol_interfaces[0]
current_parameters = protocol_interface.has_parameter
new_parameters = [parameter for parameter in current_parameters]
string_parameter = opil.StringParameter('https://sd2e.org/ipd8aea1775c4041a1aa41c562319fca68')
string_parameter_value = opil.StringValue('https://sd2e.org/ip2c22022d655d4c8fadf7b605b684f338/StringValue1')
string_parameter_value.value = 'foo'
string_parameter_value.value_of = string_parameter
new_parameters.append(string_parameter)

# bug appears here when new_parameters are assigned to has_parameter
protocol_interface.has_parameter = new_parameters
for parameter in new_parameters:
   print(parameter.identity)
tcmitchell commented 3 years ago

@tramyn I think you want to rearrange your code a little bit to make this work. You should add the string_parameter to protocol_interface.has_parameter before setting it to string_parameter_value.value_of so that it has the proper identity.

There's no point setting identities on the StringParameter and StringValue because they get overwritten. That is demonstrated by the issue you are having.

So something like this (completely untested):

string_parameter = opil.StringParameter()

# This sets string_parameter's identity
protocol_interface.has_parameter.append(string_parameter)

string_parameter_value = opil.StringValue()
string_parameter_value.value = 'foo'

# It is now save to add a reference to string_parameter because the identity has been set
string_parameter_value.value_of = string_parameter
tramyn commented 3 years ago

@tcmitchell Here is a little bit of background why adding parameters with different namespace made sense to me. Let's say there are 3 tools that need to communicate information about a opil.ProtocolInterface (ex: Aquarium -> Intent Parser -> Xplan -> Aquarium). Aquarium creates a opil.ProtocolInterface with initial parameters and default values. This information is then passed to Intent Parser to allow users who want to run this protocol to select/fill in their values. On the other end, Xplan requires that additional Parameters needs to be added to this ProtocolInterface in order to run an experiment. XPlan want users of Intent Parser to access and configure run information. Since these additional parameters wasn't originally created by Aquarium, it make sense that new parameters added to the ProtocolInterface should not take on the Aquarium namespace. Otherwise, this will imply that Aquarium wanted this level of information specified within it's ProtocolInterface. This is when Intent Parser's namespace is involved to include these additional parameters to an existing Aquarium ProtocolInterface.

If opil requires that any Parameter added to a ProtocolInterface must take on the same namespace, then this information must be made clear in the opil specification. We did discuss in an opil call that these additional Parameters will be created as xplan's opil custom annotation so I will update Intent Parser base off of your suggestion.

tcmitchell commented 3 years ago

The requirement for naming of dependent objects comes from the SBOL3 specification, which underlies OPIL. Section 7.2 states:

SBOL-compliant URIs also facilitate auto-construction of child objects with unique URIs. Child objects of TopLevel objects with compliant URIs MUST conform to the following pattern: "/" where the refers to the URI of the parent object, the refers to the SBOL class of the child object, and is a unique index for the child object. The of a new object SHOULD be calculated at time of object creation as 1 + the maximum for each object in the parent (e.g., "/SequenceAnnotation37”). Note that numbering is independent for each type, so a Component can have children “SubComponent37” and “Constraint37”.

pySBOL3 enforces this rule when the parent objects have SBOL-compliant URIs, which is why the identity you are passing into the constructor is overwritten when the object is added to a parent object. So it's not the fault of OPIL, it goes all the way back to the SBOL3 specification.

tramyn commented 3 years ago

@tcmitchell @bbartley The solution that I want to get here is to see how the opil library addresses this kind of problem and how will the library let a user know of this problem. Should opil silently allow users to create Parameter objects with different namespaces or should it throw an exception? Right now, opil silently allow users to specify multiple namespaces but then update namespaces for Parameters to conform to requirements mentioned in the spec. The valueOf for ParameterValue references a Parameter object was not updated to reflect this change to Parameter.identity. Since the specification already addresses this kind of issue, then the library should check these conditions and inform its user in one of two ways:

  1. If opil overwrites the identity for a Parameter object, then check for ParameterValue objects that points to this Parameter and update its references as well. This will ensure all references pointing to this object does not break. Otherwise, users of the library are limited to coding in a particular way to get the information specified correctly.
  2. Flag this condition and throw and exception that they are adding a parameter object with a different namespace.

I think it is getting messy in terms of making users track information from two data language specification. It isn't easy separating which information applies to which specification so can the two libraries and its specification make it clearer for users like me to understand?

tcmitchell commented 3 years ago

It looks like pySBOL3 raises a ValueError if an object is going to have its identity overwritten. So I think we need to dig into this a bit more and understand how the argument to opil.StringValue and opil.StringParameter gets handled, and whether pySBOL3 is detecting that the identity is already set. I don't know why the ValueError would be getting bypassed or would fail to get propagated.

tcmitchell commented 3 years ago

We've had an offline conversation and figured out why the ValueError is not raised when the objects are reparented (have their identity overwritten). This was a pySBOL3 issue (https://github.com/SynBioDex/pySBOL3/issues/178) that was fixed in pySBOL3 1.0.a7. The IntentParser is using pySBOL3 1.0.a6, so it does not have that change.

bbartley commented 3 years ago

At a high level, it looks like @tramyn is copying a ProtocolInterface and its children, then augmenting the PI by adding new children. I think the legally correct approach is to first copy the ProtocolInterface into the IP namespace, then mint new child objects in the IP namespace. Unfortunately, I don't think the copy methods currently support copying into a new namespace. Perhaps we should re-prioritize https://github.com/SynBioDex/pySBOL3/issues/132