freme-project / freme-ner

Linking part: some entities are never filtered #61

Closed borriellom closed 8 years ago

borriellom commented 8 years ago

Some entities (always the same ones) are never filtered out, whatever types or domain I specify. I used this text, misc.txt, which contains text from different Wikipedia pages. Below are two example requests and their results.

Request 1

curl -X POST --header "Content-type: " -d @misc.txt "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&outformat=turtle&language=en&dataset=dbpedia&types=http%3A%2F%2Fdbpedia.org%2Fontology%2FColour"

Response 1

@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
@prefix dbc:   <http://dbpedia.org/resource/Category:> .
@prefix dbpedia-es: <http://es.dbpedia.org/resource/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia-de: <http://de.dbpedia.org/resource/> .
@prefix dbpedia-ru: <http://ru.dbpedia.org/resource/> .
@prefix freme-onto: <http://freme-project.eu/ns#> .
@prefix dbpedia-nl: <http://nl.dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dbpedia-it: <http://it.dbpedia.org/resource/> .

<http://freme-project.eu/#char=1121,1144>
        a                     nif:Phrase , nif:RFC5147String , nif:Word , nif:String ;
        nif:anchorOf          "Armando Maradona Franco"^^xsd:string ;
        nif:beginIndex        "1121"^^xsd:int ;
        nif:endIndex          "1144"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,4478> ;
        itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Person> ;
        itsrdf:taConfidence   "0.9933652520988838"^^xsd:double .

<http://freme-project.eu/#char=2486,2502>
        a                     nif:Phrase , nif:Word , nif:RFC5147String , nif:String ;
        nif:anchorOf          "The Snake 's Pass"^^xsd:string ;
        nif:beginIndex        "2486"^^xsd:int ;
        nif:endIndex          "2502"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,4478> ;
        itsrdf:taClassRef     <http://www.w3.org/2002/07/owl#Thing> ;
        itsrdf:taConfidence   "0.9772895752960493"^^xsd:double .

<http://freme-project.eu/#char=1626,1641>
        a                     nif:Word , nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "El Pibe de Oro ''"^^xsd:string ;
        nif:beginIndex        "1626"^^xsd:int ;
        nif:endIndex          "1641"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,4478> ;
        itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Person> ;
        itsrdf:taConfidence   "0.899075112103839"^^xsd:double .

<http://freme-project.eu/#char=0,4478>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "4478"^^xsd:int ;
        nif:isString    "Prince Antonio Griffo Focas Flavio Angelo Ducas Comneno Porfirogenito Gagliardi De Curtis di Bisanzio, best known by his stag
e name Totò[1] (Italian pronunciation: [toˈtɔ]; 15 February 1898 – 15 April 1967) or as Antonio De Curtis, and nicknamed il Principe della risata
 (\"the Prince of laughter\"), was an Italian comedian, film and theatre actor, writer, singer and songwriter. He is widely considered one of the grea
test Italian artists of the 20th century.[2]While he first gained his popularity as a comic actor, his dramatic roles, his poetry, and his songs are a
ll deemed to be outstanding; his style and a number of his recurring jokes and gestures have become universally knownmemes in Italy. Writer and philos
opher Umberto Eco has thus commented on the importance of Totò in Italian culture:In this globalized universe where it seems that everybody's watchin
g the same movies and eating the same food, there are still abysmal and overwhelming fractures separating one culture from another. How can two people
s [i.e. the Chinese and the Italian], one of which unknowing of Totò, truly understand each other?Diego Armando Maradona Franco (Spanish pronunciatio
n: [ˈdjeɣo maɾaˈðona], born 30 October 1960) is a retired Argentine professional footballer. He has served as a manager and coach at other clubs
as well as the national team of Argentina. Many experts, football critics, former players, current players and football fans regard Maradona as the gr
eatest football player of all time.[5][6][7][8] He was joint FIFA Player of the 20th Century with Pelé.[9][10] A precocious talent, Maradona was give
n the nickname \"El Pibe de Oro\" (\"The Golden Boy\"), a name that stuck with him throughout his career.[12] Maradona played in four FIFA World Cups,
 including the 1986 World Cup in Mexico where he captained Argentina and led them to victory over West Germany in the final, and won the Golden Ball a
s the tournament's best player.Abraham \"Bram\" Stoker (8 November 1847 – 20 April 1912) was an Irish author, best known today for his 1897 Gothic n
ovel, Dracula. During his lifetime, he was better known as the personal assistant of actor Henry Irving and business manager of the Lyceum Theatre in
London, which Irving owned.Stoker visited the English town of Whitby in 1890 and that visit is said to be part of the inspiration of his great novel D
racula. While manager for Henry Irving and secretary and director of London's Lyceum Theatre, he began writing novels, beginning with The Snake's Pass
 in 1890 and Dracula in 1897. During this period, Stoker was part of the literary staff of the The Daily Telegraph in London, and wrote other fiction,
 including the horror novels The Lady of the Shroud (1909) and The Lair of the White Worm (1911).[8] In 1906, after Irving's death, he published his P
ersonal Reminiscences of Henry Irving, which proved successful,[5] and managed productions at the Prince of Wales Theatre.Before writing Dracula Stoke
r met Ármin Vámbéry, a Hungarian writer and traveler. Dracula likely emerged from Vámbéry's dark stories of the Carpathian mountains.[9] Stoker t
hen spent several years researching European folklore and mythological stories of vampires.Quentin Jerome Tarantino[1] (/ˌtærənˈtiːnoʊ/; born Ma
rch 27, 1963) is an American filmmaker and actor. His films are characterized by non-linear storylines, satirical subject matter, an aestheticization
of violence, utilization of ensemble casts, references to pop culture, their soundtracks, and features of neo-noir film. Its popularity was boosted by
 his second film, Pulp Fiction (1994), a black-comedy crime film that was a major success both among critics and audiences. Judged the greatest film f
rom 1983–2008 by Entertainment Weekly,[2] many critics and scholars have named it one of the most significant works of modern cinema.[3] For his nex
t effort, Tarantino paid homage to the blaxploitation films of the 1970s with Jackie Brown (1997), an adaptation of the novel Rum Punch. He has receiv
ed many industry awards, including two Academy Awards, two Golden Globe Awards, two BAFTA Awards and the Palme d'Or, and has been nominated for an Emm
y and a Grammy. He was named one of the 100 Most Influential People in the World by Time in 2005.[4] Filmmaker and historian Peter Bogdanovich has cal
led him \"the single most influential director of his generation\".[5] In December 2015, Tarantino received a star on the Hollywood Walk of Fame for h
is contributions to the film industry.[6]"^^xsd:string .

<http://freme-project.eu/#char=7,101>
        a                     nif:Phrase , nif:Word , nif:RFC5147String , nif:String ;
        nif:anchorOf          "Antonio Griffo Focas Flavio Angelo Ducas Comneno Porfirogenito Gagliardi De Curtis di Bisanzio"^^xsd:string ;
        nif:beginIndex        "7"^^xsd:int ;
        nif:endIndex          "101"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,4478> ;
        itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Person> ;
        itsrdf:taConfidence   "0.9935898957094552"^^xsd:double .

Request 2

curl -X POST --header "Content-type: " -d @misc.txt "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&outformat=turtle&language=en&dataset=dbpedia&domain=TaaS-0200"

Response 2

@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
@prefix dbc:   <http://dbpedia.org/resource/Category:> .
@prefix dbpedia-es: <http://es.dbpedia.org/resource/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia-de: <http://de.dbpedia.org/resource/> .
@prefix dbpedia-ru: <http://ru.dbpedia.org/resource/> .
@prefix freme-onto: <http://freme-project.eu/ns#> .
@prefix dbpedia-nl: <http://nl.dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dbpedia-it: <http://it.dbpedia.org/resource/> .

<http://freme-project.eu/#char=1121,1144>
        a                     nif:Phrase , nif:RFC5147String , nif:Word , nif:String ;
        nif:anchorOf          "Armando Maradona Franco"^^xsd:string ;
        nif:beginIndex        "1121"^^xsd:int ;
        nif:endIndex          "1144"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,4478> ;
        itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Person> ;
        itsrdf:taConfidence   "0.9933652520988838"^^xsd:double .

<http://freme-project.eu/#char=2486,2502>
        a                     nif:Phrase , nif:Word , nif:RFC5147String , nif:String ;
        nif:anchorOf          "The Snake 's Pass"^^xsd:string ;
        nif:beginIndex        "2486"^^xsd:int ;
        nif:endIndex          "2502"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,4478> ;
        itsrdf:taClassRef     <http://www.w3.org/2002/07/owl#Thing> ;
        itsrdf:taConfidence   "0.9772895752960493"^^xsd:double .

<http://freme-project.eu/#char=1626,1641>
        a                     nif:Word , nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "El Pibe de Oro ''"^^xsd:string ;
        nif:beginIndex        "1626"^^xsd:int ;
        nif:endIndex          "1641"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,4478> ;
        itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Person> ;
        itsrdf:taConfidence   "0.899075112103839"^^xsd:double .

<http://freme-project.eu/#char=0,4478>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "4478"^^xsd:int ;
        nif:isString    "Prince Antonio Griffo Focas Flavio Angelo Ducas Comneno Porfirogenito Gagliardi De Curtis di Bisanzio, best known by his stag
e name Totò[1] (Italian pronunciation: [toˈtɔ]; 15 February 1898 – 15 April 1967) or as Antonio De Curtis, and nicknamed il Principe della risata
 (\"the Prince of laughter\"), was an Italian comedian, film and theatre actor, writer, singer and songwriter. He is widely considered one of the grea
test Italian artists of the 20th century.[2]While he first gained his popularity as a comic actor, his dramatic roles, his poetry, and his songs are a
ll deemed to be outstanding; his style and a number of his recurring jokes and gestures have become universally knownmemes in Italy. Writer and philos
opher Umberto Eco has thus commented on the importance of Totò in Italian culture:In this globalized universe where it seems that everybody's watchin
g the same movies and eating the same food, there are still abysmal and overwhelming fractures separating one culture from another. How can two people
s [i.e. the Chinese and the Italian], one of which unknowing of Totò, truly understand each other?Diego Armando Maradona Franco (Spanish pronunciatio
n: [ˈdjeɣo maɾaˈðona], born 30 October 1960) is a retired Argentine professional footballer. He has served as a manager and coach at other clubs
as well as the national team of Argentina. Many experts, football critics, former players, current players and football fans regard Maradona as the gr
eatest football player of all time.[5][6][7][8] He was joint FIFA Player of the 20th Century with Pelé.[9][10] A precocious talent, Maradona was give
n the nickname \"El Pibe de Oro\" (\"The Golden Boy\"), a name that stuck with him throughout his career.[12] Maradona played in four FIFA World Cups,
 including the 1986 World Cup in Mexico where he captained Argentina and led them to victory over West Germany in the final, and won the Golden Ball a
s the tournament's best player.Abraham \"Bram\" Stoker (8 November 1847 – 20 April 1912) was an Irish author, best known today for his 1897 Gothic n
ovel, Dracula. During his lifetime, he was better known as the personal assistant of actor Henry Irving and business manager of the Lyceum Theatre in
London, which Irving owned.Stoker visited the English town of Whitby in 1890 and that visit is said to be part of the inspiration of his great novel D
racula. While manager for Henry Irving and secretary and director of London's Lyceum Theatre, he began writing novels, beginning with The Snake's Pass
 in 1890 and Dracula in 1897. During this period, Stoker was part of the literary staff of the The Daily Telegraph in London, and wrote other fiction,
 including the horror novels The Lady of the Shroud (1909) and The Lair of the White Worm (1911).[8] In 1906, after Irving's death, he published his P
ersonal Reminiscences of Henry Irving, which proved successful,[5] and managed productions at the Prince of Wales Theatre.Before writing Dracula Stoke
r met Ármin Vámbéry, a Hungarian writer and traveler. Dracula likely emerged from Vámbéry's dark stories of the Carpathian mountains.[9] Stoker t
hen spent several years researching European folklore and mythological stories of vampires.Quentin Jerome Tarantino[1] (/ˌtærənˈtiːnoʊ/; born Ma
rch 27, 1963) is an American filmmaker and actor. His films are characterized by non-linear storylines, satirical subject matter, an aestheticization
of violence, utilization of ensemble casts, references to pop culture, their soundtracks, and features of neo-noir film. Its popularity was boosted by
 his second film, Pulp Fiction (1994), a black-comedy crime film that was a major success both among critics and audiences. Judged the greatest film f
rom 1983–2008 by Entertainment Weekly,[2] many critics and scholars have named it one of the most significant works of modern cinema.[3] For his nex
t effort, Tarantino paid homage to the blaxploitation films of the 1970s with Jackie Brown (1997), an adaptation of the novel Rum Punch. He has receiv
ed many industry awards, including two Academy Awards, two Golden Globe Awards, two BAFTA Awards and the Palme d'Or, and has been nominated for an Emm
y and a Grammy. He was named one of the 100 Most Influential People in the World by Time in 2005.[4] Filmmaker and historian Peter Bogdanovich has cal
led him \"the single most influential director of his generation\".[5] In December 2015, Tarantino received a star on the Hollywood Walk of Fame for h
is contributions to the film industry.[6]"^^xsd:string .

<http://freme-project.eu/#char=7,101>
        a                     nif:Phrase , nif:Word , nif:RFC5147String , nif:String ;
        nif:anchorOf          "Antonio Griffo Focas Flavio Angelo Ducas Comneno Porfirogenito Gagliardi De Curtis di Bisanzio"^^xsd:string ;
        nif:beginIndex        "7"^^xsd:int ;
        nif:endIndex          "101"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,4478> ;
        itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Person> ;
        itsrdf:taConfidence   "0.9935898957094552"^^xsd:double .

nilesh-c commented 8 years ago

This is expected. Type filtering is currently done using DBpedia types. I have attached the current domains.csv here. It is a configuration file in which each row contains a domain code in the first column, followed by a comma-separated list of the corresponding DBpedia types.
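
To illustrate the layout described above, here is a minimal Python sketch that reads such a file into a mapping from domain code to DBpedia types. The sample rows, including the types listed for `TaaS-0200`, are invented for illustration and are not taken from the attached file; `load_domains` is a hypothetical helper, not part of FREME NER.

```python
import csv
import io

# Hypothetical sample in the described layout: domain code in the first
# column, then the DBpedia types for that domain. These rows are invented.
SAMPLE = """\
TaaS-0200,http://dbpedia.org/ontology/Organisation,http://dbpedia.org/ontology/Company
TaaS-0300,http://dbpedia.org/ontology/Place
"""


def load_domains(text):
    """Map each domain code to the set of DBpedia type URIs on its row."""
    domains = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        code = row[0].strip()
        # Everything after the first column is treated as a type URI.
        domains[code] = {t.strip() for t in row[1:] if t.strip()}
    return domains


domains = load_domains(SAMPLE)
print(sorted(domains["TaaS-0200"]))
```

With a mapping like this, a `domain=` request parameter can be resolved to a set of types before filtering, which is why domain filtering and type filtering behave the same way for the entities in this issue.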

The problem with the resources above is that they are not linked (disambiguated) to any knowledge base (KB) URI, so they lack the type information needed for filtering.
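
The effect can be sketched in Python. This is an assumption about the behaviour described in this thread, not FREME NER's actual code: a filter that compares DBpedia types has nothing to compare for an entity without a linked URI, so such entities pass through whatever types are requested. The `Entity` class, the `filter_by_types` function, and the two linked sample entities are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

COLOUR = "http://dbpedia.org/ontology/Colour"
PLACE = "http://dbpedia.org/ontology/Place"


@dataclass
class Entity:
    anchor: str
    uri: Optional[str] = None  # KB URI; None when the entity was not linked
    types: Set[str] = field(default_factory=set)  # DBpedia types of that URI


def filter_by_types(entities: List[Entity], allowed: Set[str]) -> List[Entity]:
    """Keep linked entities whose types overlap `allowed`.

    Assumed behaviour: unlinked entities carry no type information,
    so the filter cannot reject them and they are kept as-is.
    """
    kept = []
    for entity in entities:
        if entity.uri is None:
            kept.append(entity)  # nothing to filter on
        elif entity.types & allowed:
            kept.append(entity)
    return kept


entities = [
    Entity("Armando Maradona Franco"),  # unlinked, as in the responses above
    Entity("Red", "http://dbpedia.org/resource/Red", {COLOUR}),  # invented
    Entity("London", "http://dbpedia.org/resource/London", {PLACE}),  # invented
]

kept = filter_by_types(entities, {COLOUR})
print([e.anchor for e in kept])
```

In this sketch the linked `London` entity is dropped by the `Colour` filter, while the unlinked person mention survives, which matches the pattern reported in both responses.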

domains.csv.txt

m1ci commented 8 years ago

Nilesh's comment solves the issue.