ExPaNDS-eu / ExPaNDS-experimental-techniques-ontology

EU Photon and Neutron Ontologies (task 3.2)

Wrong subclasses of "photon and neutron technique" #95

Open paulmillar opened 1 year ago

paulmillar commented 1 year ago

Currently, all experimental techniques are defined as RDFS Classes; e.g.,

<owl:Class rdf:about="http://purl.org/pan-science/PaNET/PaNET01188">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CHMO_0000182"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET00001"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET00002"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET00003"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET00100"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET00200"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET01012"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET01020"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET01022"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET01037"/>
    <rdfs:subClassOf rdf:resource="http://purl.org/pan-science/PaNET/PaNET01124"/>
    <rdfs:subClassOf rdf:resource="https://www.wikidata.org/wiki/Q133900"/>
    <obo:IAO_0000119 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">https://en.wikipedia.org/wiki/Small-angle_X-ray_scattering</obo:IAO_0000119>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">small angle x-ray scattering</rdfs:label>
    <skos:altLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">SAXS</skos:altLabel>
</owl:Class>

The top-level class (defined in PaNET) is photon and neutron technique:

    <owl:Class rdf:about="http://purl.org/pan-science/PaNET/PaNET00001">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A technique in the domain of neutron, muon and accelerator-based light sources</obo:IAO_0000115>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">photon and neutron technique</rdfs:label>
    </owl:Class>

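There are four immediate subclasses of photon and neutron technique: defined by experimental probe, defined by experimental physical process, defined by functional dependence, and defined by purpose.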

The problem is that none of these are actually techniques, not in any meaningful way.

There is no technique that is meaningfully described by "defined by experimental probe".

This is a problem in the following use-case:

Some user-facing software provides a UI that allows users to select techniques (perhaps one technique, perhaps multiple techniques). Examples of such user-facing systems include proposal submission systems, user surveys on public data usage, and data curation services.

One might reasonably expect that all subclasses of photon and neutron technique are valid techniques. Selecting any subclass should be accepted as reasonable input. The UI might build a list of acceptable values (based on this assumption) and limit the user to one of those acceptable values.
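A minimal sketch of how such a UI might build its list. The term IDs and labels are the ones mentioned in this issue; the `SUBCLASS_OF` table is an illustrative subset whose edges are assumed to be direct, and the `descendants` helper is hypothetical rather than part of PaNET or any existing tool.

```python
from collections import defaultdict

# Child -> parents. Illustrative subset only; edges are assumed to be direct.
SUBCLASS_OF = {
    "PaNET00002": ["PaNET00001"],  # defined by experimental probe
    "PaNET00101": ["PaNET00002"],  # neutron probe
    "PaNET01016": ["PaNET00101"],  # thermal neutron probe
    "PaNET01017": ["PaNET00101"],  # cold neutron probe
    "PaNET01100": ["PaNET00101"],  # neutron powder diffraction
}

def descendants(root, subclass_of):
    """Return every class that is a direct or indirect subclass of root."""
    children = defaultdict(list)
    for child, parents in subclass_of.items():
        for parent in parents:
            children[parent].append(child)
    found, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# The naive list of "acceptable values": all subclasses of the top-level class.
choices = descendants("PaNET00001", SUBCLASS_OF)
```

Under these assumptions, `choices` contains PaNET01100 ("neutron powder diffraction"), which is a reasonable answer, but it also contains PaNET00002 ("defined by experimental probe"), which is a grouping term.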

The underlying point here is that not all the terms in PaNET describe experimental techniques: some provide ways to group together related techniques. This is perfectly reasonable but, with the current structure, this distinction is not visible.

Proposed solution

  1. Create a new top-level class called grouping criteria (or something equivalent).
  2. Add assertions that defined by experimental probe, defined by experimental physical process, defined by functional dependence, defined by purpose are each subclasses of grouping criteria.
  3. Remove assertions that defined by experimental probe, defined by experimental physical process, defined by functional dependence, defined by purpose are subclasses of photon and neutron technique
  4. Identify the top-level experimental techniques:
    1. Identify the set of all "actual" experimental techniques: those terms that might be used to describe how data was obtained,
    2. of these, identify all experimental techniques that are not a subclass of some other experimental technique.
  5. For each top-level experimental technique, add assertion that they are a subclass of photon and neutron technique.
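Steps 4.2 and 5 could be sketched roughly as follows. This assumes step 4.1 has already been done by hand, so the set of "actual" techniques is an input. The `example:specialised-npd` term is hypothetical and exists only to show a technique nested under another technique; the function and variable names are likewise illustrative.

```python
TOP = "PaNET00001"  # photon and neutron technique

def top_level_techniques(actual, ancestors):
    """Return the actual techniques that have no actual technique as an ancestor."""
    return {t for t in actual if not (ancestors.get(t, set()) & actual)}

# All ancestors of each term (the transitive closure of subClassOf).
ANCESTORS = {
    "PaNET01100": {"PaNET00101", "PaNET00002", TOP},
    # Hypothetical specialisation of neutron powder diffraction.
    "example:specialised-npd": {"PaNET01100", "PaNET00101", "PaNET00002", TOP},
}

# Step 4.1 (a subjective, manual decision): which terms are "actual" techniques.
ACTUAL = {"PaNET01100", "example:specialised-npd"}

# Step 4.2: actual techniques that are not a subclass of another actual technique.
top_level = top_level_techniques(ACTUAL, ANCESTORS)

# Step 5: the subClassOf assertions to add, as (child, parent) pairs.
new_assertions = {(t, TOP) for t in top_level}
```

Here only PaNET01100 is top-level; the hypothetical specialisation stays reachable from photon and neutron technique through it.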

There is a subtlety here.

The decision on whether a PaNET term is an "actual" experimental technique is somewhat subjective. I would say this question is really this: are there circumstances where we anticipate tagging a dataset with this term, when identifying how data was obtained?

For example, PaNET00101 ("neutron probe") makes perfect sense as a subclass of PaNET00002 ("defined by experimental probe"). Theoretically, a dataset could be tagged with this term, indicating only that neutrons were the probe particle with no other details. However, I would say this is wrong. Instead, a more specific term should be used, which would be a subclass of PaNET00002.

The same argument applies to many subclasses of PaNET00101 ("neutron probe"). It doesn't make sense to tag a dataset as PaNET01016 ("thermal neutron probe") or PaNET01017 ("cold neutron probe").

However, I would say it would be reasonable to tag a dataset with PaNET01100 ("neutron powder diffraction"). This term is also a subclass of PaNET00101 ("neutron probe").

The choice could also change over time, as we gain experience.

spc93 commented 1 year ago

Quick comment: I always considered 'defined by experimental probe' to be shorthand for 'technique that is defined by experimental probe'. I agree that it is not such an obvious thing to want to search for but I don't see a problem with the logic. For example, 'neutron scattering' and 'x-ray spectroscopy' should be subclasses of this, whereas 'imaging' should not.

paulmillar commented 1 year ago

Thanks for the feedback.

What you say certainly makes perfect sense: defined by experimental probe is really "technique that is defined by experimental probe". Such a term could make sense when searching for techniques, although I think it could still be better phrased ("technique with a probe particle" perhaps?)

However, I'm talking about choosing use-cases: when a human is asked to select a specific technique; e.g., in the proposal submission systems, user surveys and data curation services mentioned above.

I would argue that PaNET should support these choosing use-cases (how else is PaNET actually used?) and that PaNET currently doesn't do a good job.

Under these "choosing" use-cases, I would say that answering defined by experimental probe makes absolutely no sense. Offering this possibility to a user is just plain wrong.

Another way of looking at the same issue: are there classes in PaNET that exist to support searching but should NOT be used when tagging a dataset?

As a concrete example: is it correct to tag a dataset defined by experimental probe (and nothing else)? I don't think so.

spc93 commented 1 year ago

I take your point. To take an extreme case, knowing that a technique is a subclass of 'Thing' is possibly not helpful for a practical search.

paulmillar commented 1 year ago

At the risk of labouring the point: I'm not talking about search use-cases. I'm not talking about users asking a question like "what data exists that was obtained using technique X?"

I'm talking about a human choosing an individual technique. For example, a user is submitting a (beam-time) proposal and says that they want to do "SAXS" (or "serial crystallography" or ...).

I'm saying this selection should use PaNET, but that the ontology is a bit broken at the moment, as it offers (what I think are) nonsense values for this purpose.

spc93 commented 1 year ago

Yes, this is something that needs looking at. I think it is important to remember that the techniques are all classes rather than individuals. Some (technique) classes are identifiable (by humans) with specific techniques and others are not. I don't think it is broken but maybe it needs something additional for this use-case. Whether that is an additional property in the ontology, or something that is dealt with at the implementation stage, should be discussed.

gkoum commented 1 year ago

Regarding the following:

[Quoted from the initial comment: the owl:Class definition of PaNET01188 ("small angle x-ray scattering", SAXS), including its rdfs:subClassOf assertions.]

Declaring the above is unnecessary because in ontology modeling, subsumption (subclass relationships) is transitive. If a is a subclass of b and b is a subclass of c, then it is logically implied that a is also a subclass of c. Therefore, there's no need for an explicit assertion stating a: subclass of c because it follows logically from the existing subclass relationships. But I cannot find the above in the ontology itself so maybe it is a result of some reasoning tool (inferred).
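As a rough illustration of the transitivity argument, the sketch below finds asserted subclass pairs that are already implied by other assertions. It uses the abstract a, b, c example from the paragraph above, assumes the asserted hierarchy has no cycles, and is not the PaNET build script; the function names are hypothetical.

```python
def ancestors(cls, subclass_of):
    """Return all classes reachable from cls by following subClassOf."""
    found, stack = set(), list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return found

def redundant_assertions(subclass_of):
    """Return asserted (child, parent) pairs that other assertions already imply."""
    redundant = set()
    for child, parents in subclass_of.items():
        for parent in parents:
            # Redundant if the parent is reachable through another asserted parent.
            if any(parent in ancestors(other, subclass_of)
                   for other in parents if other != parent):
                redundant.add((child, parent))
    return redundant

# a is asserted to be a subclass of both b and c; b is a subclass of c.
asserted = {"a": ["b", "c"], "b": ["c"]}
redundant = redundant_assertions(asserted)
```

In this example the only redundant assertion is "a subclass of c", since it follows from "a subclass of b" and "b subclass of c".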

Should we need to have these categories out of the class hierarchy this could be achieved by adding them as Categories and connecting categories to techniques with an object property like belongsToCategory. In that way we can still have all the techniques belonging to each category while removing abstract classes from the hierarchy tree. It would also help to have a schema that is more Ontology than a Taxonomy since we add objectProperties.

So I would suggest the following modifications:

  1. Add top level class _TechniquesCategories and not grouping criteria because the second has no semantic load.
  2. Add all abstract classes that can be used only as Categories
  3. Add an objectProperty belongsToCategory to photon and neutron technique class.
  4. Assign a category to any concrete technique keeping the rest of the hierarchy as described in @paulmillar initial comment.

The above could also be achieved by incorporating a class like AbstractTechnique, whose members can be excluded in certain queries but remain techniques, in case it is hard to distinguish between Techniques and Categories.
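A small sketch of the data model this implies, with categories held apart from the technique hierarchy and linked through belongsToCategory. The Python names (`Technique`, `selectable_labels`, `in_category`) are hypothetical; the two techniques and the category are taken from this issue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Technique:
    iri: str
    label: str
    # Values of the proposed belongsToCategory property.
    categories: frozenset = frozenset()

# A category: no longer a class in the technique hierarchy.
PROBE = "defined by experimental probe"

TECHNIQUES = [
    Technique("PaNET01100", "neutron powder diffraction", frozenset({PROBE})),
    Technique("PaNET01188", "small angle x-ray scattering", frozenset({PROBE})),
]

def selectable_labels(techniques):
    """Labels a UI could offer: concrete techniques only, never categories."""
    return sorted(t.label for t in techniques)

def in_category(techniques, category):
    """Labels of the techniques that belong to the given category."""
    return [t.label for t in techniques if category in t.categories]
```

With this shape, `selectable_labels(TECHNIQUES)` never offers a category as a choice, while `in_category(TECHNIQUES, PROBE)` still supports searching by category.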

paulmillar commented 1 year ago

Declaring the above is unnecessary because in ontology modeling, subsumption (subclass relationships) is transitive.

Indeed.

But I cannot find the above in the ontology itself so maybe it is a result of some reasoning tool (inferred).

Correct, these were added as part of the "build" process.

Unfortunately, the version in git was built manually and in (what I believe was) a somewhat inconsistent fashion. IIRC, the version of the ontology uploaded to Zenodo doesn't have inferred axioms.

This inconsistency is one of the reasons I created a script to automate building PaNET. Currently, this build script adds all inferred axioms to be (more) consistent with what is currently available.

We have talked about removing these inferred axioms in the past. IIRC, the agreement was that we should do it.

Should we need to have these categories out of the class hierarchy this could be achieved by adding them as Categories and connecting categories to techniques with an object property like belongsToCategory. In that way we can still have all the techniques belonging to each category while removing abstract classes from the hierarchy tree. It would also help to have a schema that is more Ontology than a Taxonomy since we add objectProperties.

The current design is (more or less) that each class represents a category of techniques. For each technique, it's possible to add a new (more specialised) version of that technique by introducing a subclass.

Tagging a dataset with a PaNET term identifies that dataset with a PaNET class: a category of techniques, but one that is as specific as possible (at the time of tagging the dataset).

In this sense, each technique is already a category, with subsumption used to indicate the belongsToCategory relationship.

What you propose could be a significant shift away from this current design (as always, the devil is in the detail). My feeling is that, with such a change, we should bump the first version digit (resulting in PaNET v2.0.0). Therefore, we should tread carefully and ensure we understand what problems the change would address and that we don't break existing use-cases.

Just to be clear, I'm not disagreeing with you!

However, I think there are some more "low hanging fruit" problems that should be fixed before making this kind of change.

  1. Add an objectProperty /belongsToCategory/ to photon and neutron technique class.

I think this is possibly a cause of confusion: PaNET (itself) doesn't have any objects. It does have some OWL named individuals, but their inclusion is "wrong" (see #52).

gkoum commented 1 year ago

I agree that the suggestion could be a big change and that we should first discuss its implications. I generally find the existing structure comprehensible enough as a taxonomy that can easily be presented in the BioPortal tree representation. Any of the suggested approaches is applicable; the choice mainly depends on how the ontology is going to be used. Building an ontology always begins from a taxonomy; then, gradually and depending on the goals of the ontology, datatype and object properties are introduced to bind classes and create vocabularies that can later be used to reason and infer new knowledge. I also agree that PaNET should not include individuals: if we need them, we can separate the TBox from the ABox, as I will do in the ontology I will prepare for ESRF by importing PaNET. My suggestion was an attempt to improve the initial suggestion, which also brings some big changes to the structure of the ontology. Let's discuss this in the coming meeting to better understand the proposals and their implications.