This folder contains guidelines and materials for the Open Knowledge Extraction (OKE) challenge at ESWC 2015. The OKE challenge consists of three tasks, and participants can choose to compete in one or more of them. Each task has a separate dataset. All sentences in the three datasets are of an encyclopedic nature, and the majority are descriptive with respect to one main entity; e.g., if the main entity is a person, the sentence is of a biographic nature.
The example data used in the following description of tasks is available in the folder `example_data`.
The Gold Standard data is available in the folder `GoldStandard_sampleData`.
Participants must:
### Task 1: Entity Recognition, Linking and Typing for Knowledge Base Population
This task consists of (i) identifying entities in a sentence and creating an OWL individual (owl:Individual statement) representing each of them, (ii) linking (owl:sameAs statement) each individual, when possible, to a reference KB (DBpedia), and (iii) assigning each individual a type (rdf:type statement) selected from a set of given types.
In this task, by Entity we mean any discourse referent (the actors and objects around which a story unfolds), either named or anonymous, that is an individual of one of the given DOLCE Ultra Lite classes (e.g., dul:Person, dul:Organization, dul:Place, as in the example below).
Entities also include anaphorically related discourse referents. Hence, anaphora resolution has to be taken into account when addressing the task.
As an example, for the sentence:
Florence May Harding studied at a school in Sydney, and with Douglas Robert Dundas , but in effect had no formal training in either botany or art.
we want the system to recognize four entities:
Recognized Entity | Generated URI | Type | SameAs |
---|---|---|---|
Florence May Harding | oke:Florence_May_Harding | dul:Person | dbpedia:Florence_May_Harding |
school | oke:School | dul:Organization | |
Sydney | oke:Sydney | dul:Place | dbpedia:Sydney |
Douglas Robert Dundas | oke:Douglas_Robert_Dundas | dul:Person | |
The results must be provided in NIF format, including the offsets of the recognized entities. The expected output for the example sentence can be found in `task1.ttl`.
In the above example we use
    @prefix oke: <http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/>
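Concretely, the first entity in the table above could be represented along the following lines. This is only a sketch: the sentence URI and the choice of NIF properties are illustrative, and the authoritative format is the one in `task1.ttl`.

```turtle
@prefix nif:     <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf:  <http://www.w3.org/2005/11/its/rdf#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix dul:     <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix oke:     <http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/> .

# Mention of the entity in the sentence, with character offsets
# (the sentence URI below is a made-up placeholder)
<http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/sentence-1#char=0,20>
    a                 nif:String ;
    nif:anchorOf      "Florence May Harding" ;
    nif:beginIndex    "0"^^xsd:nonNegativeInteger ;
    nif:endIndex      "20"^^xsd:nonNegativeInteger ;
    itsrdf:taIdentRef oke:Florence_May_Harding .

# The generated individual, its type, and its link to the reference KB
oke:Florence_May_Harding
    a          owl:Individual , dul:Person ;
    owl:sameAs dbpedia:Florence_May_Harding .
```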
We will evaluate three aspects of this task independently, corresponding to subtasks (i)–(iii): entity recognition, entity linking, and entity typing. We will calculate precision, recall, and F1 for the three subtasks, and the winner of Task 1 will be the system with the highest average F1 across the three.
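These are the standard measures; writing TP, FP, and FN for the counts of true positive, false positive, and false negative answers:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,P\,R}{P + R}
```

The results tables below report both micro-averaged scores (TP, FP, and FN pooled over the whole dataset before computing P, R, and F1) and macro-averaged scores (P, R, and F1 computed per sentence and then averaged), following the usual convention.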
### Task 2: Class Induction and Entity Typing for Vocabulary and Knowledge Base Enrichment
This task consists in producing rdf:type statements, given definition texts. The participants will be given a dataset of sentences, each defining an entity (known a priori), e.g. the entity dbpedia:Skara_Cathedral and its definition "Skara Cathedral is a church in the Swedish city of Skara".
Participants are expected to (i) identify the type(s) of the given entity as they are expressed in the given definition, (ii) create an owl:Class statement defining each of them as a new class in the target knowledge base, (iii) create an rdf:type statement between the given entity and the newly created classes, and (iv) align the identified types, if a correct alignment is available, to a set of given types.
In this task we will evaluate both the extraction of all strings describing a type and the alignment of each type to a given subset of DOLCE+DnS Ultra Lite classes.
As an example, for the sentence:
Brian Banner is a fictional villain from the Marvel Comics Universe created by Bill Mantlo and Mike Mignola and first appearing in print in late 1985.
Brian Banner will be given as the input target entity. We want the system to recognize any possible type for it. Correct answers include:
Recognized string for the type | Generated Type | rdfs:subClassOf |
---|---|---|
fictional villain | oke:FictionalVillain | dul:Personification |
villain | oke:Villain | dul:Person |
The results must be provided in NIF format, including the offsets of the recognized strings describing the types. The expected output for the example sentence can be found in `task2.ttl`.
In the above example we use
    @prefix oke: <http://www.ontologydesignpatterns.org/data/oke-challenge/task-2/>
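For the first row of the table, the class induction and typing statements of steps (ii)–(iv) could be sketched in Turtle as follows. This is illustrative only; `task2.ttl` shows the authoritative format, including the NIF annotation of the recognized string.

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix dul:     <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix oke:     <http://www.ontologydesignpatterns.org/data/oke-challenge/task-2/> .

# (ii) the induced class, (iv) aligned to a DOLCE+DnS Ultra Lite class
oke:FictionalVillain
    a                owl:Class ;
    rdfs:subClassOf  dul:Personification .

# (iii) typing of the given input entity with the new class
dbpedia:Brian_Banner
    a  oke:FictionalVillain .
```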
We will evaluate two aspects of this task independently: the extraction of the strings describing the types, and the alignment of the types to the given DOLCE+DnS Ultra Lite classes. We will calculate precision, recall, and F1 for the two subtasks, and the winner of Task 2 will be the system with the highest average F1 over the two.
### Task 3: Relation Extraction and Naming, and Triple Generation for Ontology and Knowledge Base Enrichment
The participants will be given as input a sentence and two entities contained in it. The task consists in (i) assessing whether the sentence contains evidence of a relation between the two input entities and, if so, (ii) creating an OWL property representing the relation, including a value for its rdfs:label annotation, and (iii) producing a statement instantiating the relation.
The triple must relate the two given input entities through the newly created property.
The participants are required to produce a label for the relation, using the rdfs:label statement. The label should include the portion of text that expresses the relation.
For all examples of task 3 we use
@prefix oke: <http://www.ontologydesignpatterns.org/data/oke-challenge/task-3/>
As an example, for the sentence:
In 1956 Coleman moved to Chicago, along with Booker Little, where he worked with Gene Ammons and Johnny Griffin before joining Max Roach Quintet 1958-1959.
We will give as input the two entities
The system is expected to identify that the text expresses a relation between the two and to produce a statement such as
    oke:workedWith
        a owl:ObjectProperty ;
        rdfs:label "worked with"@en ;
        dc:relation oke:69_80_workedWith .
The results must be provided in NIF format, including the offsets of the recognized string(s) describing the relation. The expected output for the example sentence can be found in `task3.ttl`.
If the strings expressing the relation in the text are not contiguous, participants can return multiple offset statements.
For example, for the sentence:
Wayne Koestenbaum (born 1958) is an American poet and cultural critic. He received a B.A. from Harvard University, an M.A. from Johns Hopkins University, and a Ph.D. from Princeton University.
where the two given input entities are:
We expect the participants to produce a property such as
    oke:receivedphdfrom
        a owl:ObjectProperty ;
        rdfs:label "received a Ph.D. from"@en ;
        dc:relation oke:74_82_received, oke:60_170_phd_from .
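The offset individuals referenced via dc:relation can themselves be described as NIF strings. A minimal sketch for the first fragment, assuming standard NIF core properties and taking the begin/end offsets from the URI (the authoritative modelling is in `task3.ttl`):

```turtle
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix oke: <http://www.ontologydesignpatterns.org/data/oke-challenge/task-3/> .

# "received" spans characters 74-82 of the example sentence
oke:74_82_received
    a               nif:String ;
    nif:anchorOf    "received" ;
    nif:beginIndex  "74"^^xsd:nonNegativeInteger ;
    nif:endIndex    "82"^^xsd:nonNegativeInteger .
```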
We will evaluate two aspects of this task independently: the recognition of the string(s) in the text expressing the relation, and the generation of the property, its label, and the resulting statement. The winner of Task 3 will be the system with the highest linear combination of the scores for the two subtasks.
### Results

Participating systems have been evaluated on the evaluation data available in the folder `evaluation-data`.
For Task 1 the participants are Adel, FOX, and FRED; for Task 2 the participants are CETUS, OAK@Sheffield, and FRED.
#### Task 1
Annotator | Micro F1 | Micro Precision | Micro Recall | Macro F1 | Macro Precision | Macro Recall |
---|---|---|---|---|---|---|
Adel | 0.6075 | 0.6938 | 0.5403 | 0.6039 | 0.685 | 0.54 |
FOX | 0.4988 | 0.6639 | 0.4099 | 0.4807 | 0.6329 | 0.4138 |
FRED | 0.3473 | 0.4667 | 0.2766 | 0.2278 | 0.3061 | 0.1814 |
#### Task 2
Annotator | Micro F1 | Micro Precision | Micro Recall | Macro F1 | Macro Precision | Macro Recall |
---|---|---|---|---|---|---|
CETUS | 0.4735 | 0.4455 | 0.5203 | 0.4478 | 0.4182 | 0.5328 |
OAK@Sheffield | 0.4416 | 0.5155 | 0.39 | 0.3939 | 0.3965 | 0.3981 |
FRED | 0.3043 | 0.2893 | 0.3211 | 0.2746 | 0.2569 | 0.3173 |