dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark
GNU Affero General Public License v3.0

topical tags for NIF documents #7

Closed MichaelRoeder closed 10 years ago

MichaelRoeder commented 10 years ago

Find a solution to add topical tags to a NIF document (needed for C2W, Rc2W and Sc2W)

MichaelRoeder commented 10 years ago

Additionally, it would be good to be able to add a confidence score to each topical tag (needed for Rc2W and Sc2W).

der-bruemmer commented 10 years ago

How do we want to do this? We could use blank nodes:

http://example.org#ex1 a nif:Context; nif:topic [ a nif:Annotation; rdfs:label "$topicAnnotation"; nif:confidence "0.2" ] .

or assign URNs, as suggested in the ISWC 2013 NIF paper:

http://example.org#ex1 a nif:Context; nif:topic urn:annotation1 . urn:annotation1 a nif:Annotation; rdfs:label "$topicAnnotation"; nif:confidence "0.2".

Other suggestions?

dcherix commented 10 years ago

I vote for the second proposal; avoiding blank nodes is always the better solution ;)

RicardoUsbeck commented 10 years ago

Since URNs are a subset of URIs, I would also suggest using URNs.

The only thing that puzzles me right now is why we do not use RDF lists instead of the blank node/URN concept.

http://en.wikipedia.org/wiki/Uniform_resource_name

der-bruemmer commented 10 years ago

I'm not very familiar with using RDF lists for statement annotation. Can you explain how that would work?

Anyway, I will define nif:topic, nif:keyword, nif:confidence and nif:Annotation in the NIF Core ontology. I will not define the ranges yet, so that we have some room for different modelling scenarios.

MichaelRoeder commented 10 years ago

Please don't use lists: they would make things more complicated, and in the end we would need more triples to describe the same thing.

Let's use URNs and two triples for two annotations:

http://example.org#ex1 a nif:Context . http://example.org#ex1 nif:topic urn:annotation1 . http://example.org#ex1 nif:topic urn:annotation2 .

urn:annotation1 a nif:Annotation . urn:annotation1 rdfs:label "$topicAnnotation" . urn:annotation1 nif:confidence "0.2".

urn:annotation2 a nif:Annotation . urn:annotation2 rdfs:label "$topicAnnotation" . urn:annotation2 nif:confidence "0.9".
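The URN-based pattern above can be sketched in a few lines of Python that emit N-Triples. This is only an illustration, not GERBIL's actual code: the function names and the labels "politics" and "sports" are hypothetical, the `urn:annotationN` naming simply follows the examples in this thread, and the confidence is kept as a plain string literal as in those examples.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def topic_triples(context_iri, topics):
    """Return N-Triples lines tagging one nif:Context with topical annotations.

    `topics` is a list of (label, confidence) pairs. Each pair gets its own
    annotation resource, linked from the context by one nif:topic triple.
    """
    lines = [f"<{context_iri}> <{RDF_TYPE}> <{NIF}Context> ."]
    for index, (label, confidence) in enumerate(topics, start=1):
        annotation = f"urn:annotation{index}"  # naming follows the thread
        lines.append(f"<{context_iri}> <{NIF}topic> <{annotation}> .")
        lines.append(f"<{annotation}> <{RDF_TYPE}> <{NIF}Annotation> .")
        lines.append(f'<{annotation}> <{RDFS_LABEL}> "{escape_literal(label)}" .')
        lines.append(f'<{annotation}> <{NIF}confidence> "{confidence}" .')
    return lines


# Hypothetical labels; the thread uses the placeholder "$topicAnnotation".
triples = topic_triples("http://example.org#ex1",
                        [("politics", "0.2"), ("sports", "0.9")])
print("\n".join(triples))
```

Each additional topical tag costs four triples: one nif:topic link plus the type, label and confidence of the annotation resource.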

RicardoUsbeck commented 10 years ago

Agreed

der-bruemmer commented 10 years ago

Defined and published the following nif properties:

nif:topic a owl:DatatypeProperty ; owl:versionInfo "0.0.1" ; rdfs:label "topic" ; rdfs:comment """The topic of a string"""@en ; rdfs:domain nif:String ; rdfs:range nif:Annotation .

nif:keyword a owl:DatatypeProperty ; owl:versionInfo "0.0.1" ; rdfs:label "keyword" ; rdfs:comment """A general keyword associated with a stringy"""@en ; rdfs:domain nif:String ; rdfs:range xsd:string .

nif:confidence a owl:DatatypeProperty ; owl:versionInfo "0.0.1" ; rdfs:label "confidence of annotation" ; rdfs:comment """The confidence of an annotation as decimal between 0 and 1"""@en ; rdfs:domain nif:Annotation ; rdfs:range xsd:decimal .

nif:Annotation a owl:Class ; owl:versionInfo "0.0.1" ; rdfs:label "Annotation" ; rdfs:comment """Individuals of this class are annotations of strings. This class can be used if an annotation statement has to be annotated with further information, like confidence or annotation provenance (like which tool produced the annotation)."""@en .
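Since nif:confidence is defined with range xsd:decimal and described as a value between 0 and 1, a producer may want to check scores before serializing them. A minimal sketch, assuming a hypothetical helper `confidence_literal` (not part of NIF or GERBIL) that returns a typed N-Triples literal:

```python
from decimal import Decimal, InvalidOperation

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def confidence_literal(value):
    """Return `value` as an N-Triples xsd:decimal literal.

    Raises ValueError if the value is not a finite decimal in [0, 1].
    """
    try:
        number = Decimal(str(value))
    except InvalidOperation:
        raise ValueError(f"not a decimal: {value!r}") from None
    if not number.is_finite() or not (Decimal(0) <= number <= Decimal(1)):
        raise ValueError(f"confidence must lie in [0, 1]: {value!r}")
    # The 'f' format avoids exponent notation, which xsd:decimal does not allow.
    return f'"{number:f}"^^<{XSD_DECIMAL}>'


print(confidence_literal("0.2"))
```

Note that the examples in this thread write the score as a plain string ("0.2"); the typed form shown here is one way to match the declared range.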

rtroncy commented 10 years ago

I am fresh from the [W3C Web Annotations](http://www.w3.org/annotation/) working group meeting held at TPAC, where we discussed NIF and other robust-anchoring methods for annotating texts; see the minutes at http://www.w3.org/2014/10/28-annotation-minutes.html#item05.

I would recommend _NOT_ creating a nif:Annotation class and using the Open Annotation core data model instead.

RicardoUsbeck commented 10 years ago

Oh cool! We definitely need to discuss this to avoid reinventing the wheel. Thanks for the pointer! It will be a good starting point for the Monday meeting :)

rtroncy commented 10 years ago

What is the Monday meeting? And yes, do not reinvent the wheel: the Open Annotation model is made for exactly such a use case.

kurzum commented 10 years ago

Hi @rtroncy, nif:Annotation is rdfs:subClassOf oa:Annotation and oa:Body. It is actually used in Stanbol quite a lot, see: http://stanbol.apache.org/docs/trunk/components/enhancer/engines/nif20 . Some NLP tools even consider segmentation an annotation, so nif:String will also be an Annotation, i.e. a TextAnnotation.

OA is good, but there is no way that it scales to NLP. Especially, the difference between body and annotation is often not practical (albeit the "correct" way to do it). We will join the Annotation Working Group soon.

For now: If in a graph there is only one alternative, we will use the categories used in this standard: http://www.w3.org/TR/its20/#basic-concepts-datacategories and its ontology http://www.w3.org/2005/11/its/rdf#

If you have alternative annotations, then we can switch to the Stanbol profile. OA's separation of Annotation and Body is not commonly accepted in NLP.

kurzum commented 10 years ago

Some innovation is necessary. However, I think that creating our own vocabulary and providing a mapping is the best way to go. It causes some redundancy, but hey, that is why we have owl:equivalentClass and such...

RicardoUsbeck commented 10 years ago

@rtroncy Sorry for the misleading information about the "Monday" meeting. It is just an informal meeting of the Leipzig developers to push the next goals of GERBIL, so attendance is not mandatory.

der-bruemmer commented 10 years ago

The problem with OpenAnnotation is, as @kurzum mentioned, that the oa:Body further complicates our annotation. Our proposed solution

http://example.org#ex1 a nif:Context . http://example.org#ex1 nif:topic urn:annotation1 .

urn:annotation1 a nif:Annotation . urn:annotation1 rdfs:label "$topicAnnotation" . urn:annotation1 nif:confidence "0.2".

would, using OpenAnnotation, look like this (please correct me if I'm wrong; I am loosely following this example: http://www.w3.org/community/openannotation/wiki/SE_Free_text_tagging_a_Image )

http://example.org#ex1 a nif:Context . http://example.org#ex1 nif:topic urn:annotation1 .

urn:annotation1 a oa:Annotation . urn:annotation1 oa:hasTarget http://example.org#ex1 . urn:annotation1 oa:hasBody urn:annotation1Body . urn:annotation1 oa:motivatedBy oa:tagging . urn:annotation1 nif:confidence "0.2".

urn:annotation1Body a oa:Tag . urn:annotation1Body a cnt:ContentAsText . urn:annotation1Body cnt:chars "$topicAnnotation" .

Separating the body from the annotation itself (in my opinion) overcomplicates the annotation, makes the output harder to understand for users not familiar with OA, and roughly doubles the triples needed per annotation. What would be gained is a more universal data model. I just don't see what that universality contributes to this use case.
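The triple-count claim can be checked with a small sketch that builds both variants from this comment as Python tuples and counts them. The function names are hypothetical and the prefixed names are kept as plain strings, since only the counts matter here; the shared `nif:Context` type triple is left out of both.

```python
def nif_triples(context, annotation, label, confidence):
    """Triples for one topical tag in the proposed NIF modelling."""
    return [
        (context, "nif:topic", annotation),
        (annotation, "rdf:type", "nif:Annotation"),
        (annotation, "rdfs:label", label),
        (annotation, "nif:confidence", confidence),
    ]


def oa_triples(context, annotation, label, confidence):
    """Triples for the same tag with a separate oa:hasBody resource."""
    body = annotation + "Body"
    return [
        (context, "nif:topic", annotation),
        (annotation, "rdf:type", "oa:Annotation"),
        (annotation, "oa:hasTarget", context),
        (annotation, "oa:hasBody", body),
        (annotation, "oa:motivatedBy", "oa:tagging"),
        (annotation, "nif:confidence", confidence),
        (body, "rdf:type", "oa:Tag"),
        (body, "rdf:type", "cnt:ContentAsText"),
        (body, "cnt:chars", label),
    ]


args = ("http://example.org#ex1", "urn:annotation1", "$topicAnnotation", "0.2")
nif_count = len(nif_triples(*args))
oa_count = len(oa_triples(*args))
print(f"NIF: {nif_count} triples, OA: {oa_count} triples")
# NIF: 4 triples, OA: 9 triples
```

For the examples as written in this comment, that is four triples per tag in the NIF variant against nine in the OA variant.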

rtroncy commented 10 years ago

@der-bruemmer @kurzum You're referring to old, deprecated examples of OA, so I'm not sure I understand the issue.

First, where is the latest version of the OWL file defining NIF? @kurzum provides statements such as nif:Annotation rdfs:subClassOf oa:Annotation, which is not present in this comment.

Second, the last resolution of the WG is that OA will accept very simple oa:hasBody such as strings.

I personally direct the W3C WG to use the String ontology as one of the robust anchoring methods for annotating texts.

der-bruemmer commented 10 years ago

@rtroncy Can you point me to more current openannotation documentation? I was using http://www.openannotation.org/ and got to the examples from there. I'm not aware of any new versions. Even removing the special body class and motivatedBy, this seems more complicated than simply adding the $topicAnnotation to the annotation resource.

http://example.org#ex1 a nif:Context . http://example.org#ex1 nif:topic urn:annotation1 .

urn:annotation1 a oa:Annotation . urn:annotation1 oa:hasTarget http://example.org#ex1 . urn:annotation1 oa:hasBody urn:annotation1Body . urn:annotation1 nif:confidence "0.2".

urn:annotation1Body a oa:Tag . urn:annotation1Body nif:topicAnnotation "$topicAnnotation" .

If that example does not reflect oa correctly, please modify it accordingly.

Regarding the NIF OWL file, nif:Annotation is in alpha status, so it might become a subClassOf oa:Annotation in the process. However, we would not include an annotation body, so it would not really be compliant with OA anyway.

RicardoUsbeck commented 10 years ago

In this version, we will use NIF since we are more experienced with it. Everybody, please feel free to write your own output method for OA in the controllers so that the output can be shown additionally. Thanks for the fruitful discussion. @der-bruemmer please document at https://github.com/AKSW/gerbil/wiki/How-to-generate-a-NIF-dataset

rtroncy commented 10 years ago

The Open Annotation core data model was just the first input to the new Web Annotations Working Group, which can change everything, including the names of the properties. Having said this, the model is recognized to be pretty good already. If you want to be up to date, the best way is 1/ to follow the conversation on the mailing list, 2/ to ask people who follow it (like me), and 3/ to read the latest version of the specification. You're lucky, there is a new one: http://w3c.github.io/web-annotation/model_fpwd/

The new version allows annotations to have a simple body; see this example. Motivation has always been optional.

I'm happy to help you, but I don't understand what you still find too complicated. Please explain.