Open tramyn opened 3 years ago
As far as I can tell from reading the SBOL spec, TopLevels and their children objects are required to have the same namespace.
Can you please describe more about your other issue:
I'm left with parameter values with the valueOf field not referenceable to its corresponding Parameter object.
I'm not tracking how this follows from the issue about mixing namespaces.
Thanks Bryan
@bbartley Here is a screen capture of an opil parameter created before writing into xml.
Notice the ParameterValue.value_of is correctly pointing to the Parameter object. When I write this information into an xml file (a copy of the xml file was sent on slack), Parameter identity changes from
https://sd2e.org/ipcd872dc9adfe4e3cb542f0e19d8afe69
to http://aquarium.bio/htc/StringParameter8
. Now if you check the ParameterValue.value_of in the file, it is still set to https://sd2e.org/ipcd872dc9adfe4e3cb542f0e19d8afe69
. I know the Parameter object exists because the information is there in the file but the identity to connect this parameter to value has changed.
Sounds like a bug!
When I write this information into an xml file (a copy of the xml file was sent on slack), Parameter identity changes from https://sd2e.org/ipcd872dc9adfe4e3cb542f0e19d8afe69 to http://aquarium.bio/htc/StringParameter8.
Can you confirm that the identity of the Parameter
changes in the context of the Document
as well as in the serialized file? Can you isolate the exact moment when the identity changes. For example, are you certain it changes during the call to write
?
Hi @tramyn I am unable to reproduce the error you describe. Here was my attempt. Do you see some way to tweak this that will reproduce the error?.
import opil
import sbol3
sbol3.set_namespace('http://foo.org')
doc = sbol3.Document()
p = opil.ProtocolInterface('p')
param = opil.StringParameter()
p.has_parameter = [param]
doc.add(p)
# Change namespace
sbol3.set_namespace('http://bar.org')
er = opil.ExperimentalRequest('er')
doc.add(er)
v = opil.StringValue()
v.value_of = p
er.has_parameter_value = [v]
print(doc.write_string(file_format='ttl'))
@bbartley here is what I am doing to reproduce this bug:
import opil
import sbol3
file_path = 'jellyfish_htc.xml'
opil_doc = opil.Document()
opil_doc.read(file_path, sbol3.RDF_XML)
# copy protocol_interface to avoid overwriting existing data
protocol_interfaces = [top_level.copy() for top_level in opil_doc.objects if isinstance(top_level, opil.ProtocolInterface)]
if len(protocol_interfaces) > 1:
raise Exception('Expecting 1 ProtocolInterface but %d were found' % len(protocol_interfaces))
# create new parameter to add to existing ProtocolInterface
protocol_interface = protocol_interfaces[0]
current_parameters = protocol_interface.has_parameter
new_parameters = [parameter for parameter in current_parameters]
string_parameter = opil.StringParameter('https://sd2e.org/ipd8aea1775c4041a1aa41c562319fca68')
string_parameter_value = opil.StringValue('https://sd2e.org/ip2c22022d655d4c8fadf7b605b684f338/StringValue1')
string_parameter_value.value = 'foo'
string_parameter_value.value_of = string_parameter
new_parameters.append(string_parameter)
# bug appears here when new_parameters are assigned to has_parameter
protocol_interface.has_parameter = new_parameters
for parameter in new_parameters:
print(parameter.identity)
@tramyn I think you want to rearrange your code a little bit to make this work. You should add the string_parameter
to protocol_interface.has_parameter
before setting it to string_parameter_value.value_of
so that it has the proper identity.
There's no point setting identities on the StringParameter
and StringValue
because they get overwritten. That is demonstrated by the issue you are having.
So something like this (completely untested):
string_parameter = opil.StringParameter()
# This sets string_parameter's identity
protocol_interface.has_parameter.append(string_parameter)
string_parameter_value = opil.StringValue()
string_parameter_value.value = 'foo'
# It is now save to add a reference to string_parameter because the identity has been set
string_parameter_value.value_of = string_parameter
@tcmitchell Here is a little bit of background why adding parameters with different namespace made sense to me. Let's say there are 3 tools that need to communicate information about a opil.ProtocolInterface (ex: Aquarium -> Intent Parser -> Xplan -> Aquarium). Aquarium creates a opil.ProtocolInterface with initial parameters and default values. This information is then passed to Intent Parser to allow users who want to run this protocol to select/fill in their values. On the other end, Xplan requires that additional Parameters needs to be added to this ProtocolInterface in order to run an experiment. XPlan want users of Intent Parser to access and configure run information. Since these additional parameters wasn't originally created by Aquarium, it make sense that new parameters added to the ProtocolInterface should not take on the Aquarium namespace. Otherwise, this will imply that Aquarium wanted this level of information specified within it's ProtocolInterface. This is when Intent Parser's namespace is involved to include these additional parameters to an existing Aquarium ProtocolInterface.
If opil requires that any Parameter added to a ProtocolInterface must take on the same namespace, then this information must be made clear in the opil specification. We did discuss in an opil call that these additional Parameters will be created as xplan's opil custom annotation so I will update Intent Parser base off of your suggestion.
The requirement for naming of dependent objects comes from the SBOL3 specification, which underlies OPIL. Section 7.2 states:
SBOL-compliant URIs also facilitate auto-construction of child objects with unique URIs. Child objects of TopLevel objects with compliant URIs MUST conform to the following pattern: "
/ " where the refers to the URI of the parent object, the refers to the SBOL class of the child object, and is a unique index for the child object. The of a new object SHOULD be calculated at time of object creation as 1 + the maximum for each object in the parent (e.g., " /SequenceAnnotation37”). Note that numbering is independent for each type, so a Component can have children “SubComponent37” and “Constraint37”.
pySBOL3 enforces this rule when the parent objects have SBOL-compliant URIs, which is why the identity you are passing into the constructor is overwritten when the object is added to a parent object. So it's not the fault of OPIL, it goes all the way back to the SBOL3 specification.
@tcmitchell @bbartley The solution that I want to get here is to see how the opil library addresses this kind of problem and how will the library let a user know of this problem. Should opil silently allow users to create Parameter objects with different namespaces or should it throw an exception? Right now, opil silently allow users to specify multiple namespaces but then update namespaces for Parameters to conform to requirements mentioned in the spec. The valueOf
for ParameterValue
references a Parameter object was not updated to reflect this change to Parameter.identity
. Since the specification already addresses this kind of issue, then the library should check these conditions and inform its user in one of two ways:
Parameter
object, then check for ParameterValue
objects that points to this Parameter
and update its references as well. This will ensure all references pointing to this object does not break. Otherwise, users of the library are limited to coding in a particular way to get the information specified correctly.I think it is getting messy in terms of making users track information from two data language specification. It isn't easy separating which information applies to which specification so can the two libraries and its specification make it clearer for users like me to understand?
It looks like pySBOL3 raises a ValueError
if an object is going to have its identity overwritten. So I think we need to dig into this a bit more and understand how the argument to opil.StringValue
and opil.StringParameter
gets handled, and whether pySBOL3 is detecting that the identity is already set. I don't know why the ValueError
would be getting bypassed or would fail to get propagated.
We've had an offline conversation and figured out why the ValueError
is not raised when the objects are reparented (have their identity overwritten). This was a pySBOL3 issue (https://github.com/SynBioDex/pySBOL3/issues/178) that was fixed in pySBOL3 1.0.a7. The IntentParser is using pySBOL3 1.0.a6, so it does not have that change.
At a high level, it looks like @tramyn is copying a ProtocolInterface
and its children, then augmenting the PI by adding new children. I think the legally correct approach is to first copy the ProtocolInterface
into the IP namespace, then mint new child objects in the IP namespace. Unfortunately, I don't think the copy
methods currently support copying into a new namespace. Perhaps we should re-prioritize https://github.com/SynBioDex/pySBOL3/issues/132
I want to add new parameters to an existing ProtocolInterface but when I do so, opil will modify these new parameters to use the ProtocolInterface namespace. Ideally, I want to distinguish where parameters are derived from so the new parameters uses Intent Parser's namespace and the ProtocolInterface points to a lab namespace (ex: http://aquarium.bio/htc/). However, opil will not allow mixtures of namespaces for parameters added to a ProtocolInterface and now I'm left with parameter values with the
valueOf
field not referenceable to its corresponding Parameter object.What is the best way to proceed here?