SynBioDex / pySBOL3

Native python implementation of SBOL 3.0 specification
MIT License

Issues with accessing CombinatorialDerivation value #350

Closed abamaj closed 2 years ago

abamaj commented 2 years ago

@JMante1 and I have been working on reading through the CombinatorialDerivations of an SBOL3 file, and ran into a slight issue. We tried to access the value as a dictionary, as was the case when working with SBOL2, but received a list of URIs instead. For example:

for c in doc.objects:
    for prop in c.properties:
        print(c.properties)

This is the example result over the first iteration of properties:

['http://sbols.org/v3#displayId', 'http://sbols.org/v3#name', 'http://sbols.org/v3#description', 'http://www.w3.org/ns/prov#wasDerivedFrom', 'http://www.w3.org/ns/prov#wasGeneratedBy', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://sbols.org/v3#hasNamespace', 'http://sbols.org/v3#hasAttachment', 'http://sbols.org/v3#strategy', 'http://sbols.org/v3#template']

We were wondering how we could access the actual value of the CombinatorialDerivation, and were also wondering if other parts of the SBOL3 document had a specific way in which to be accessed.

JMante1 commented 2 years ago

Just a clarification: we would like to iterate over the properties of objects in SBOL. In pySBOL2 this seems possible through the object.properties dictionary. In pySBOL3, however, object.properties appears to be just a list, and to be absent from CombinatorialDerivation objects.

jakebeal commented 2 years ago

You can get this information by accessing _properties but, as its name suggests, that is not a recommended approach when others will work.

If you're doing what I think you are, however, in attempting to serialize an arbitrary SBOL object into Excel, then I suspect that is actually a reasonable approach to take.

tcmitchell commented 2 years ago

My initial suggestion is to access the properties of the CombinatorialDerivation through the front door, using the appropriate accessors. This way the library will handle various things for you.

cd = sbol3.CombinatorialDerivation(<ARGUMENTS HERE>)
cd.strategy
cd.template
cd.variable_features

cd.properties is a list of properties, as you have found out. It is not a dict as it is in pySBOL2. The .properties accessor on pySBOL3 objects is only a partial list and does not include any SBOL3 properties that contain child objects (depicted with a filled diamond in the SBOL3 specification).

If you choose to use cd._properties, beware that you will be using internal data structures and may have to do type conversions and lookups to find the information that you want or need. That's an example of the kind of thing the library does for you if you use the "front door" property accessors.

However in pysbol3 the object.properties appears to just be a list and be absent from CombinatorialDerivation objects.

.properties should not be absent from CombinatorialDerivation. If that's the case, can you send an example that will help me reproduce it? I see .properties on CombinatorialDerivation.

>>> import sbol3
>>> cd = sbol3.CombinatorialDerivation('http://example.com/cd1', 'http://example.com/c1')
>>> cd.properties
['http://sbols.org/v3#displayId', 'http://sbols.org/v3#name', 'http://sbols.org/v3#description', 'http://www.w3.org/ns/prov#wasDerivedFrom', 'http://www.w3.org/ns/prov#wasGeneratedBy', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://sbols.org/v3#hasNamespace', 'http://sbols.org/v3#hasAttachment', 'http://sbols.org/v3#strategy', 'http://sbols.org/v3#template']

Note that this list is missing http://sbols.org/v3#hasMeasure and http://sbols.org/v3#hasVariableFeature because these are SBOL3 properties that hold child objects.

Also note that the visitor pattern might work for you. See https://pysbol3.readthedocs.io/en/v1.0b10/visitor.html for more information.

JMante1 commented 2 years ago

As Jake surmised, we want to iterate over all properties for conversion to an Excel template. We could do this using the code below, so as to use the accessors and still be able to access an arbitrary list of properties. However, we would then miss the child object properties. To get the full list, could we then take some kind of difference between the properties found and the _properties list?

for prop in obj.properties:
    prop_val = obj.get(prop)

@abamaj can you provide the file which you were using when the combinatorial derivation wasn't giving any properties?

tcmitchell commented 2 years ago

Would it be better to work in triple space in RDFLib rather than in SBOL3? You could iterate over the triples of an object there, entering information into Excel. This would be a "front door" approach. You can get an RDFLib graph from a document, then run SPARQL queries on it or access various triples programmatically. pySBOL3 does this when loading a document from file or string.
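As a rough illustration of the triple-space idea, the sketch below groups every triple about one subject by its predicate, which is roughly the shape a spreadsheet row needs. It is not taken from pySBOL3: the function name rows_for_subject and the stand-in triples (including the VariableFeature1 URI) are hypothetical, and the note that doc.graph() yields an iterable RDFLib graph is an assumption to check against the pySBOL3 documentation. Plain tuples of strings are used so the example runs without any third-party package.

```python
from collections import defaultdict


def rows_for_subject(triples, subject):
    """Group the objects of every triple about `subject` by predicate.

    `triples` is any iterable of (subject, predicate, object) items.
    Iterating over an rdflib.Graph yields such 3-tuples, so a graph
    obtained from a document (assumed here to be `doc.graph()`) could
    be passed in directly; str() turns RDFLib terms into plain text.
    """
    grouped = defaultdict(list)
    for s, p, o in triples:
        if str(s) == str(subject):
            grouped[str(p)].append(str(o))
    return dict(grouped)


# Stand-in data for illustration only; the child URI is hypothetical.
SBOL = 'http://sbols.org/v3#'
CD = 'http://sbolstandard.org/testfiles/FPs_small'
triples = [
    (CD, SBOL + 'template', CD + '_template'),
    (CD, SBOL + 'hasVariableFeature', CD + '/VariableFeature1'),
    # A triple about the child object, not about CD itself.
    (CD + '/VariableFeature1', SBOL + 'cardinality', SBOL + 'one'),
]

rows = rows_for_subject(triples, CD)
for predicate, values in rows.items():
    print(predicate, values)
```

In this form, properties that hold child objects (such as hasVariableFeature) show up like any other predicate, and the child's own triples can be collected by calling the function again with the child's URI as the subject.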

abamaj commented 2 years ago

@abamaj can you provide the file which you were using when the combinatorial derivation wasn't giving any properties?

@JMante1 @jakebeal @tcmitchell

Yes, here is the file that I used: https://github.com/SynBioDex/pySBOL3/blob/main/test/resources/simple_library.nt

tcmitchell commented 2 years ago

Thanks @abamaj for posting the file. I used it below to demonstrate that I can access the properties attribute of a combinatorial derivation. I'm doing some recap here because I don't know where things stand for the two of you.

@JMante1 said:

However in pysbol3 the object.properties appears to just be a list and be absent from CombinatorialDerivation objects.

As I've said above, in pySBOL3 object.properties is a list. And that holds for all objects that derive from Identified in the SBOL3 specification. CombinatorialDerivation objects are included. I don't have any trouble accessing the properties list of a combinatorial derivation in https://github.com/SynBioDex/pySBOL3/blob/main/test/resources/simple_library.nt:

>>> import sbol3
>>> doc = sbol3.Document()
>>> doc.read('test/resources/simple_library.nt')
>>> len(doc)
67
>>> cd1 = doc.find('http://sbolstandard.org/testfiles/FPs_small')
>>> isinstance(cd1, sbol3.CombinatorialDerivation)
True
>>> cd1.properties
['http://sbols.org/v3#displayId', 'http://sbols.org/v3#name', 'http://sbols.org/v3#description', 'http://www.w3.org/ns/prov#wasDerivedFrom', 'http://www.w3.org/ns/prov#wasGeneratedBy', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://sbols.org/v3#hasNamespace', 'http://sbols.org/v3#hasAttachment', 'http://sbols.org/v3#strategy', 'http://sbols.org/v3#template']

The code example listed by @JMante1 above was:

for prop in obj.properties:
    prop_val = obj.get(prop)

That code doesn't work because there is no get method defined on the SBOL3 objects. Here's what happens if I continue my example above to use the given code:

>>> obj = cd1
>>> for prop in obj.properties:
...     prop_val = obj.get(prop)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/tmitchel/Projects/sd2/pySBOL3/sbol3/object.py", line 33, in __getattribute__
    result = super().__getattribute__(name)
AttributeError: 'CombinatorialDerivation' object has no attribute 'get'
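Since there is no get method, one possible way to keep using the accessors while still looping over a list of property URIs is an explicit lookup table combined with Python's built-in getattr. The sketch below is only an illustration: the ACCESSORS table, the accessor_values helper, and the FakeDerivation stand-in class are hypothetical and not part of pySBOL3, and the table covers just a few of the URIs shown above. A real pySBOL3 object would take the place of the stand-in.

```python
# Hypothetical table mapping property URIs to accessor names.
# It must be maintained by hand; pySBOL3 does not provide it.
ACCESSORS = {
    'http://sbols.org/v3#name': 'name',
    'http://sbols.org/v3#description': 'description',
    'http://sbols.org/v3#strategy': 'strategy',
    'http://sbols.org/v3#template': 'template',
}


def accessor_values(obj, property_uris):
    """Read each known property through its accessor using getattr."""
    values = {}
    for uri in property_uris:
        attr = ACCESSORS.get(uri)
        if attr is None:
            continue  # No accessor known for this URI; skip it.
        values[uri] = getattr(obj, attr, None)
    return values


class FakeDerivation:
    """Stand-in for a pySBOL3 object so the sketch runs on its own."""
    properties = [
        'http://sbols.org/v3#displayId',
        'http://sbols.org/v3#name',
        'http://sbols.org/v3#strategy',
        'http://sbols.org/v3#template',
    ]
    name = 'FPs_small'
    strategy = None
    template = 'http://sbolstandard.org/testfiles/FPs_small_template'


obj = FakeDerivation()
values = accessor_values(obj, obj.properties)
for uri, value in values.items():
    print(uri, value)
```

URIs missing from the table (displayId in this stand-in) are silently skipped, so the table has to list every property of interest, including those that hold child objects.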

Neither of you has responded to my suggestion to work in triple space in RDFLib. Have you considered that?

Alternatively, could you identify the object as a combinatorial derivation and operate on it as such, using the provided accessors?

>>> cd1.variable_features
[<sbol3.varcomp.VariableFeature object at 0x7f82d06bceb0>]
>>>
>>> cd1.template
'http://sbolstandard.org/testfiles/FPs_small_template'
>>> cd1.template.lookup()
<sbol3.component.Component object at 0x7f82e02cfaf0>

abamaj commented 2 years ago

Thank you Tom for this information. After continued testing with SBOL3, @JMante1 and I realized that we will have to work in triple space in RDFLib, as it is more scalable. We will continue to update you with our progress.

tcmitchell commented 2 years ago

I'm assuming you're all set on this issue since there hasn't been any new activity in 2 months. If anything else crops up please either reopen this issue or open a new issue. I hope everything is going well on your project.

abamaj commented 2 years ago

Yes! The information you provided to @JMante1 and me regarding RDFLib was extremely helpful. Thank you very much.