Are lots of sub-classes and sub-properties needed?

VladimirAlexiev commented 8 years ago

Consider the consequences of making sub-classes and sub-properties. Economy of representation (number of triples) is an important consideration to keep NLP as RDF a feasible idea (because NLP generates a lot of data), and NIF 2 thought carefully about that (counting triples for the Simple, Stanbol and OpenAnnotation profiles). Injudicious use of sub-classes and sub-properties might induce NIF users to abandon RDFS... or NIF itself.

VladimirAlexiev commented 8 years ago

Eg currently there is:

itsrdf:taIdentRef rdfs:subPropertyOf nif:objectAnnotation
nif:oliaConf rdfs:domain nif:Annotation ; rdfs:subPropertyOf nif:confidenceCompanion

This means that

each taIdentRef will infer an extra property nif:objectAnnotation
each nif:oliaConf will infer type nif:Annotation for every word/phrase it's applied on; and extra property nif:confidenceCompanion

The question is whether people will want to query by these extra types/properties or not. If not, they are only a burden.
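
To make the cost concrete, here is a minimal sketch in plain Python (no RDF library; triples are just tuples) that applies only two RDFS rules, rdfs7 (subPropertyOf) and rdfs2 (domain), to the axioms quoted above. The instance data (`ex:w1`, `dbr:Berlin`, the confidence value) is made up for illustration, and a real store applies more rules than this.

```python
# Minimal RDFS forward chaining over tuples: only rdfs2 (domain) and rdfs7 (subPropertyOf).
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
TYPE = "rdf:type"

# Schema axioms quoted in the comment above.
schema = {
    ("itsrdf:taIdentRef", SUBPROP, "nif:objectAnnotation"),
    ("nif:oliaConf", DOMAIN, "nif:Annotation"),
    ("nif:oliaConf", SUBPROP, "nif:confidenceCompanion"),
}

# Hypothetical instance data: one word with an entity link and a confidence.
data = {
    ("ex:w1", "itsrdf:taIdentRef", "dbr:Berlin"),
    ("ex:w1", "nif:oliaConf", "0.9"),
}


def infer(schema, data):
    """Return only the triples newly entailed by rdfs2 and rdfs7."""
    closure = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closure):
            for sp, sk, so in schema:
                if sp != p:
                    continue
                if sk == SUBPROP:
                    new = (s, so, o)       # rdfs7: copy the statement to the super-property
                elif sk == DOMAIN:
                    new = (s, TYPE, so)    # rdfs2: type the subject with the domain class
                else:
                    continue
                if new not in closure:
                    closure.add(new)
                    changed = True
    return closure - data


extra = infer(schema, data)
print(len(data), "asserted ->", len(extra), "inferred")  # 2 asserted -> 3 inferred
```

So for this toy word, two asserted triples become five once the rules fire. Whether that is worth it depends on whether anyone queries by `nif:objectAnnotation`, `nif:confidenceCompanion` or `nif:Annotation`.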

BTW, about complex class constructs such as Restrictions and unions, eg:

nif:oliaProv rdfs:range [ a owl:Class; owl:unionOf (prov:Activity prov:Agent ) ] ;

IMHO these are not very useful: formally speaking, RDFS should infer this union as one of the types of every oliaProv object. I guess you use them to be able to use RDFUnit. But maybe it's better to use Shapes, or in the above case schema:domainIncludes / rangeIncludes.

kurzum commented 8 years ago

Hm, I see your point. However, we would need a formal way of marking or describing extensions of NIF, especially when it comes to external vocabs like itsrdf.

So let's do it like this:

if new properties are defined, they should be in the NIF namespaces such as nif:stem, nif:oliaLink, nif:opinion and included in nif:core
in addition, we will produce extra files as in https://github.com/NLP2RDF/ontologies/tree/nif2.1/nif-module, which contain the axioms that lead to the extra inferred triples. So whoever wants this info can load it in addition.

This will:

keep the triple count of inference low
enable users to browse the nif-module folder to find out which properties should be used for what

kurzum commented 8 years ago

we can move the domain/range to the extra files

VladimirAlexiev commented 8 years ago

I like the approach: "if you want inference, load this extra file".
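
A rough sketch of what that could look like from the user's side, again in plain Python with tuples instead of a real RDF library. The split of axioms between `CORE` and `INFERENCE_MODULE` is an assumption for illustration, not the actual content of nif-core or the nif-module files, and `ex:w1` is made-up data.

```python
# Sketch of "load the extra file only if you want inference".
# File contents are inlined as sets of tuples; the split is illustrative.
CORE = {  # declarations only, as they might appear in the core file
    ("nif:oliaConf", "rdf:type", "rdf:Property"),
}
INFERENCE_MODULE = {  # axioms moved to a separate, optional file
    ("nif:oliaConf", "rdfs:domain", "nif:Annotation"),
    ("nif:oliaConf", "rdfs:subPropertyOf", "nif:confidenceCompanion"),
}


def load_schema(with_inference=False):
    """Return the core schema, optionally merged with the inference axioms."""
    return (CORE | INFERENCE_MODULE) if with_inference else set(CORE)


def extra_triples(schema, data):
    """Triples entailed by rdfs2 (domain) and rdfs7 (subPropertyOf) only."""
    closure = set(data)
    frontier = set(data)
    while frontier:
        derived = set()
        for s, p, o in frontier:
            for sp, sk, so in schema:
                if sp == p and sk == "rdfs:subPropertyOf":
                    derived.add((s, so, o))
                elif sp == p and sk == "rdfs:domain":
                    derived.add((s, "rdf:type", so))
        frontier = derived - closure
        closure |= frontier
    return closure - data


data = {("ex:w1", "nif:oliaConf", "0.9")}  # hypothetical annotation
lean = extra_triples(load_schema(), data)
full = extra_triples(load_schema(with_inference=True), data)
print(len(lean), len(full))  # 0 2
```

With only the core loaded, nothing extra is inferred; loading the module adds the type and the super-property statement. The same data and the same namespace are used in both cases.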

You had something like this in the old NIF: nif-core-inf.ttl and nif-core-val.ttl were separate (though nif-core-inf.ttl was used for more complex inferences, like Transitive and Restrictions; domain/range were in nif-core).

But more thinking is needed:

would this separation also apply to nif-core, eg "Word is a String"? I personally like it since most of the time I query by property, not by type. But some people might be unhappy about this.
I hope it won't mess up your RDFUnit tests, as these are very valuable. But it's just a matter of loading the extra file.

BTW, we now see that "modules" can be created for different purposes: by feature (eg Annotation vs Translation), by function (eg definitions/comments vs domain/range).

So "module" and "namespace" are orthogonal concepts.

neradis commented 8 years ago

Are lots of sub-classes and sub-properties needed?

The answer will depend on the use case and the user of NIF. Some of these abstract super-properties and super-classes were introduced to express conceptual commonalities (conceptual interoperability), to allow for OWL reasoning/constraints or just to formalise expectations about the format of NIF documents. Abstract properties and types are helpful for exploratory queries on NIF data where one does not know beforehand which concrete annotation statements occur.

On the other hand, there is indeed also a need for triple economy when trying to achieve larger volumes of NIF data, and some potential users (esp. those with no prior Semantic Web background) who are not interested in the benefits of OWL would certainly welcome a pure RDF(S) version of NIF without the conceptual overhead of OWL.

I think it's feasible (probably not even much effort) to write some code that could down-grade NIF OWL schema documents to two RDF(S) documents:

a very basic version that just declares all concrete classes and properties, together with the annotations for humans (rdfs:label, rdfs:comment) -> the minimal-overhead version
a pure RDFS version that basically keeps all RDFS axioms from the OWL version and only contains a small choice of abstract superclasses (e.g. nif:String)
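
A minimal sketch of such a down-grade step, in plain Python with triples as tuples. The toy `owl_schema` and the mapping of OWL types to RDFS types are assumptions for illustration; a real script would parse the actual ontology files, handle blank-node class expressions (such as the unionOf range mentioned earlier), and decide which abstract superclasses to keep, all of which this sketch leaves out.

```python
# Down-grade a toy OWL schema to the two profiles described above.
RDFS_AXIOMS = {"rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range"}
HUMAN_ANNOTATIONS = {"rdfs:label", "rdfs:comment"}
# Assumed mapping from OWL declarations to plain RDF(S) ones.
OWL_TO_RDFS_TYPES = {
    "owl:Class": "rdfs:Class",
    "owl:ObjectProperty": "rdf:Property",
    "owl:DatatypeProperty": "rdf:Property",
}

# Hypothetical excerpt, not copied from the real NIF ontology.
owl_schema = [
    ("nif:Word", "rdf:type", "owl:Class"),
    ("nif:Word", "rdfs:label", "Word"),
    ("nif:Word", "rdfs:subClassOf", "nif:String"),
    ("nif:String", "rdf:type", "owl:Class"),
    ("nif:anchorOf", "rdf:type", "owl:DatatypeProperty"),
    ("nif:anchorOf", "rdfs:domain", "nif:String"),
    ("nif:anchorOf", "rdf:type", "owl:FunctionalProperty"),
]


def declaration(triple):
    """Rewrite an OWL class/property declaration to RDF(S), else None."""
    s, p, o = triple
    if p == "rdf:type" and o in OWL_TO_RDFS_TYPES:
        return (s, p, OWL_TO_RDFS_TYPES[o])
    return None


def minimal_profile(schema):
    """Declarations plus human-readable annotations only."""
    out = set()
    for triple in schema:
        decl = declaration(triple)
        if decl is not None:
            out.add(decl)
        elif triple[1] in HUMAN_ANNOTATIONS:
            out.add(triple)
    return out


def rdfs_profile(schema):
    """Minimal profile plus all RDFS axioms; OWL-only statements are dropped."""
    return minimal_profile(schema) | {t for t in schema if t[1] in RDFS_AXIOMS}


minimal = minimal_profile(owl_schema)
rdfs = rdfs_profile(owl_schema)
print(len(minimal), len(rdfs))  # 4 6
```

In this sketch the minimal profile is always a subset of the RDFS profile, and both reuse the original IRIs, so data written against one profile stays valid against the others.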

All three versions could (and probably should) use the same namespace, and the down-graded versions would just be a subset of the RDFS inference closure over the OWL version. Offering consistent versions per module directly side by side might be easier for users than import declarations (which are only part of OWL anyway).

This would not only make it possible to avoid unwanted reasoning bloat (although one usually has control over the entailment regime applied by stores/tools anyway), but also offer NIF newcomers a stepping stone, allowing them to adopt NIF with knowledge of RDF(S) only.

VladimirAlexiev commented 8 years ago

I like this idea of "profiles".

I've done something similar for https://github.com/erlangen-crm/ecrm using this script https://github.com/erlangen-crm/ecrm/blob/master/ecrm-simplify.xq

NLP2RDF / ontologies

Are lots of sub-classes and sub-properties needed? #17