Relation data transformation - algorithm

Data transformations (obo:OBI_0200000) are based on algorithms (obo:IAO_0000064). This relation is not very straightforward; it is expressed by the following restriction on class obo:OBI_0200000 ('Data transformation'):

<owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0200000"><!-- data transformation -->
        <rdfs:label xml:lang="en">data transformation</rdfs:label>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000055"/><!-- realizes -->
                <owl:someValuesFrom>
                    <owl:Restriction>
                        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0000059"/><!-- concretizes -->
                        <owl:someValuesFrom>
                            <owl:Class>
                                <owl:intersectionOf rdf:parseType="Collection">
                                    <rdf:Description rdf:about="http://purl.obolibrary.org/obo/IAO_0000064"/><!-- algorithm -->
                                    <owl:Restriction>
                                        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/><!-- has part -->
                                        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/IAO_0000005"/><!-- objective specification -->
                                    </owl:Restriction>
                                </owl:intersectionOf>
                            </owl:Class>
                        </owl:someValuesFrom>
                    </owl:Restriction>
                </owl:someValuesFrom>
            </owl:Restriction>
        </rdfs:subClassOf>
</owl:Class>

In this definition, the two object properties obo:BFO_0000055 ('realizes') and obo:RO_0000059 ('concretizes') follow each other without explicitly specifying the intermediate range class of 'realizes' (or the domain class of 'concretizes'):

realizes some
    (concretizes some
        (algorithm and ('has part' some 'objective specification'))
    )
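To make the chaining concrete, here is a small Python sketch that walks the nested restriction with the standard library's XML parser and lists the properties in order. It is only an illustration: the embedded snippet is a trimmed copy of the axiom above, wrapped in an rdf:RDF element with namespace declarations (an assumption, since only the class element is shown here), and `property_chain` is a hypothetical helper, not part of OBI or of any library.

```python
import xml.etree.ElementTree as ET

NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Trimmed copy of the axiom; the rdf:RDF wrapper is added for parsing only.
OWL_XML = """
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0200000">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000055"/>
        <owl:someValuesFrom>
          <owl:Restriction>
            <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0000059"/>
            <owl:someValuesFrom>
              <owl:Class>
                <owl:intersectionOf rdf:parseType="Collection">
                  <rdf:Description rdf:about="http://purl.obolibrary.org/obo/IAO_0000064"/>
                  <owl:Restriction>
                    <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
                    <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/IAO_0000005"/>
                  </owl:Restriction>
                </owl:intersectionOf>
              </owl:Class>
            </owl:someValuesFrom>
          </owl:Restriction>
        </owl:someValuesFrom>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""


def property_chain(restriction):
    """Collect onProperty IRIs while the filler is itself a restriction."""
    chain = []
    node = restriction
    while node is not None:
        chain.append(node.find("owl:onProperty", NS).get(RDF_RESOURCE))
        filler = node.find("owl:someValuesFrom", NS)
        # Only a direct owl:Restriction child continues the chain.
        node = filler.find("owl:Restriction", NS) if filler is not None else None
    return chain


root = ET.fromstring(OWL_XML)
outer = root.find("owl:Class/rdfs:subClassOf/owl:Restriction", NS)
print(property_chain(outer))  # 'realizes' (BFO_0000055), then 'concretizes' (RO_0000059)
```

The filler of 'realizes' is itself an anonymous restriction, so no named class appears between the two properties in the axiom.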

The range of object property obo:BFO_0000055 ('realizes') is class obo:BFO_0000017 ('Realizable entity'), so - according to my interpretation - the whole construction looks like this:

[Image: rdfbones-o_datatransformations_fullgraph]

But what is the exact purpose of the realisable entity put between the algorithm and the data transformation process?

I would imagine that it relates an abstract algorithm to a specific use case, like the generic definition of a proportion (a/b) to the calculation of a disease prevalence (C/n). In this case, it would further specify the algorithm's variables.
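As a rough illustration of this reading (an analogy in code, not an OBI construct), the Python sketch below treats the generic proportion a/b as the abstract algorithm and the prevalence C/n as the same algorithm with its variables bound to a specific use case. The function names and the example numbers are made up for the illustration.

```python
from fractions import Fraction


def proportion(a, b):
    """Generic algorithm: the proportion a/b."""
    if b == 0:
        raise ValueError("the denominator b must not be zero")
    return Fraction(a, b)


def prevalence(cases, population):
    """Use-case binding: a := number of cases (C), b := population size (n)."""
    return proportion(a=cases, b=population)


# Hypothetical numbers: 12 affected individuals in a sample of 300.
result = prevalence(12, 300)
print(result, float(result))
```

Nothing in `prevalence` changes the calculation itself; it only says what the two variables stand for, which is the kind of further specification meant above.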

Would such an element belong to one of the subclasses of class obo:BFO_0000017 ('Realizable entity') and, if so, to which of them? The subclasses are the following:

obo:BFO_0000016 ('Disposition')
- obo:BFO_0000034 ('Function')
obo:OBI_0000260 ('Plan')
obo:BFO_0000023 ('Role')

According to Arp & Smith 2011 and Arp et al. 2015 (98-103), it cannot be a disposition (or function) because if the concretisation of the algorithm ceases to exist, the algorithm is not physically (or structurally) changed. Defining it as a 'role' would underline the concretisation's character as applying the abstract algorithm to a specific use case. A definition as 'plan', on the other hand, would stress its purpose to give directions on how the data transformation is to be carried out.

Do you have any thoughts on this?

References

Arp R, Smith B. 2011. Realizable Entities in Basic Formal Ontology. Online resource. http://ontology.buffalo.edu/smith/articles/realizables.pdf.

Arp R, Smith B, Spear AD. 2015. Building Ontologies with Basic Formal Ontology. Cambridge etc.: MIT Press.

See discussion at https://groups.google.com/d/msg/obi-developer/HnYiqbYRh7c/pHlEpMFGBAAJ

You can think of the algorithm as something that can be written down and that can also be the basis of a plan you have when you sit down and write some software. You have the idea of the plan before you have the computer run some software implementation. It is possible that you have the plan but something gets in the way, so the process (running the software) might not even happen, or the same software might be run many times. The "plan" is the realizable entity, loosely speaking.

The relation of realizable to process is 0:many: a given realizable may be realized in no process at all, or in many.

Most of the time we don't have much to say about the realizable, and in the stores you are querying there might not even be an instance of the realizable, depending on the kind of reasoner you are using. For example, Pellet's implementation of SPARQL allows a bnode in a SPARQL query to bind to an existentially implied individual (or at least it used to; I haven't exercised it recently).

The relation between data transformation, algorithm and software that @alanruttenberg brings up is relevant to our project. We would like our users to be able to write R packages for querying data through a SPARQL endpoint, using the SPARQL package. These packages would include functions for manipulating RDFBones data.

A function from such an R package would be represented by the class obo:IAO_0000591 ('Software method'). @zarquon42b and I still find it hard to model our use case in a way that involves the realisable entity required by the restriction on class obo:OBI_0200000 ('Data transformation'). A straightforward solution to what we have in mind would be something like this:

[Image: rdfbones-o_datatransformations_softwarescenarioa]
Figure: Scenario a).

We are not quite sure which properties from the OBI should be used instead of the properties :implements, :usesAlgorithm and :usesSoftware, which are introduced here. Class obo:IAO_0000010 ('Software') and its subclasses do not seem to be involved in any restrictions.

Another solution, incorporating the notion of the realisable entity as a plan for writing software, as brought up by @alanruttenberg above, would be this:

[Image: rdfbones-o_datatransformations_softwarescenariob]
Figure: Scenario b).

Here, the realisable entity would take on a function similar to a plan specification, which it is not in terms of the OBI. On the other hand, this structure would be compatible with OBI specifications.

Which scenario (a or b) is more in line with the OBI? And how are software specifications properly expressed?

I forgot to mention that, internally, we use dashed arrow lines to indicate rdf-schema:subClassOf relations (cf. legend for network graphs).

What would you lose if you didn't use the classes software method and algorithm? Can what you want to achieve be done by simply subclassing data transformation? Have a look at the subclasses of data transformation in OBI. You will see terms such as variance calculation. The definition describes the algorithm, but we don't need a separate algorithm or software entity to say that a calculation of variance has been done.

Presumably there are other questions you need to answer for your software to work that my simple model won't. If you can give me some of these competency questions it would be easier to assess what you need to say.

I like the fact that you have a legend for your diagrams. One thing that I think can confuse is that an object property link between two classes is interpreted differently from one between two instances. In the above, consider software plan and software method. The implements link isn't a link in the RDF graph. A relation between classes expresses (typically) an all-some relationship: All software methods implement some plan. In the RDF serialization of OWL one would express that as a restriction, which consists of several nodes and links:

software method rdfs:subClassOf :restriction
:restriction rdf:type owl:Restriction
:restriction owl:onProperty implements
:restriction owl:someValuesFrom software plan

When the relation is between two instances, the translation to RDF is a simple triple:

instance1 :implements instance2
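The difference in graph shape can be sketched in a few lines of Python, with triples written as plain tuples of strings. This is only an illustration of the two translations described above; the helper names, the blank node label `_:r1` and the prefixed names (`:SoftwareMethod`, `:method1` and so on) are placeholders chosen for the example, not terms from OBI or IAO.

```python
def all_some_restriction(subclass, prop, filler, bnode="_:r1"):
    """Class-level link: 'every subclass instance <prop> some filler'.

    The single arrow in a class diagram expands to four triples around
    an anonymous restriction node.
    """
    return [
        (subclass, "rdfs:subClassOf", bnode),
        (bnode, "rdf:type", "owl:Restriction"),
        (bnode, "owl:onProperty", prop),
        (bnode, "owl:someValuesFrom", filler),
    ]


def instance_link(subject, prop, obj):
    """Instance-level link: the arrow is exactly one triple."""
    return [(subject, prop, obj)]


class_level = all_some_restriction(":SoftwareMethod", ":implements", ":SoftwarePlan")
instance_level = instance_link(":method1", ":implements", ":plan1")

for triple in class_level + instance_level:
    print(" ".join(triple))
```

At class level the property never appears as a predicate between the two classes; it only occurs as the object of owl:onProperty, which is why a query for `?s :implements ?o` finds the instance triple but not the class axiom.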

The fact that drawings don't make plain that the arrows function differently in these two cases typically leads to confusion.

What I would recommend you do is draw a picture which is primarily the instances needed to satisfy some competency question, annotating the instances with their class.

Do you have somewhere a description of the overall aims of your project including, ideally, some concrete use cases?

@alanruttenberg Thank you very much for your input. Sorry for coming back to it so late.

Concerning the 'algorithm' and 'software method' classes: We can, for the time being, do without them. We are just curious how these things are meant to work in the OBI. For now, we are just figuring out how to model our research data in general. Software integration would come in a second step, when we will deal more closely with use cases. In this context, the following issues might be of interest:

We see a potential in users writing R packages that pull data through SPARQL queries and perform the data transformations.
Users should be able to reference software packages and their versions in their research, just as they would do in the methodology section of their articles.
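A minimal sketch of the first point, written in Python rather than R and using only the standard library, might look like the following. It only builds the request URL for a SPARQL query over HTTP GET and does not contact any server. The endpoint address is a placeholder, and the query, which uses an rdfs:subClassOf* property path to pick up subclasses of 'data transformation' without a reasoner, is just one possible formulation and not something taken from RDFBones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

# One possible query: all instances of 'data transformation' (OBI_0200000)
# or of any of its subclasses, found via a SPARQL 1.1 property path.
QUERY = """PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?process ?type WHERE {
  ?process a ?type .
  ?type rdfs:subClassOf* obo:OBI_0200000 .
}"""


def build_request_url(endpoint, query):
    """Return the GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

An R package would do the equivalent through its SPARQL client and then apply the actual data transformation to the returned bindings.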

In view of these potential further developments, we would like to understand what we are leaving out now.

Concerning use cases and competency questions: We have a list of use cases and competency questions but they still need to be elaborated. At the moment we are lacking the work force to do this. For a follow-up project, we will incorporate use cases and competency questions much earlier in the planning phase.

For a general project overview, please see our project website. This could do with a lot of updating, though.

Concerning graph representation: Thank you very much for your input. Representing restrictions with arrow relations helps me to understand how external ontologies are meant to work. But the similarity to instance relations is, indeed, confusing to others. We should gradually change to a representation style as you suggest.

RDFBones / RDFBones-O

Relation data transformation - algorithm #70
