dice-group / sask

Projectgroups Search and Extraction
GNU Affero General Public License v3.0

SPARQL data stored by database-ms has subject, predicate and object as URLs (Edit: Issue Description Updated) #60

Open prasanthhs opened 6 years ago

prasanthhs commented 6 years ago

Are there any extractors in the current code that index labels against URLs in the SPARQL DB? I have tried the Open IE MS, and it returns URLs for all three fields of each triple. I am not sure about the other extractors.

We had a discussion with Ricardo about this, and he mentioned that Auto Index will index URL vs. label and not URL vs. URL.

Some sample FOX answers (shared by Ricardo):

Query: Barack Obama is born in Hawaii (attachment: FOX_output_for__Barack_Obama_is_born_in_Hawaii__.txt)

Currently, the extractors store

foxr:1530176121568  a  oa:Annotation , rdf:Statement , foxo:Relation ;
        rdf:object     dbr:Hawaii ;
        rdf:predicate  foxo:stanford_livein ;
        rdf:subject    dbr:Barack_Obama ;

but should also create this: dbr:Barack_Obama rdfs:label "Barack Obama" .

Finally, Label vs URL has to be indexed in Auto Index and not URL vs URL.

Any suggestions on how to proceed further?

[Additional Information]: The following query must work against database-ms's SPARQL DB:

"SELECT DISTINCT ?key1 ?key2 WHERE{ \n ?key1 a owl:Thing . ?key1 rdfs:label ?key2 .}"

If it does, then isEntityCustomized and EntitySelectQuery can be omitted from the parameters passed, and the endpoint URL alone would suffice.

[EDIT 05.07.2018]: Current query, which works with Auto Index and is passed by database-ms to Auto Index:

SELECT DISTINCT ?key1 ?key2 WHERE{ ?key1 rdfs:label ?key2 . }

Currently tested extractor: FOX.

EXPECTED Query to be passed to Auto Index:

"SELECT DISTINCT ?key1 ?key2 WHERE{ \n ?key1 a owl:Thing . ?key1 rdfs:label ?key2 .}".

KHaack commented 6 years ago

Hmm, one possible solution is to use Jena to parse the data. The second is to parse the output of every extractor before we insert the data into the database, or to select another output format where possible (sometimes it is not possible).

RicardoUsbeck commented 6 years ago

Parse it into a Jena Model and then run a query; otherwise too much data ends up as garbage in the score. That should also concern @hjshah142.

sepidetari commented 6 years ago

OK, so you need something like this?
subject (as uri) -----predicate (as label)-----> object (as uri)

RicardoUsbeck commented 6 years ago

From "Barack Obama is born in Hawaii" the following should be sent to the triple store:

dbr:Barack_Obama foxo:stanford_livein dbr:Hawaii.
dbr:Barack_Obama rdfs:label "Barack Obama" .
dbr:Hawaii rdfs:label "Hawaii" .
foxo:stanford_livein rdfs:label "born in".
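
The four triples above can be sketched as follows. This is only an illustration in Python (the thread itself discusses Jena), not the extractors' actual code. Everything named here is an assumption: the helper names `label_from_uri` and `triples_with_labels` are hypothetical, the `foxo` namespace IRI is assumed, and deriving a resource label from the URI's local name is a heuristic that the thread does not prescribe. The predicate label is passed in explicitly because it cannot be derived from `stanford_livein`.

```python
from urllib.parse import unquote

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DBR = "http://dbpedia.org/resource/"
FOXO = "http://ns.aksw.org/fox/ontology#"  # assumed namespace IRI for foxo:


def label_from_uri(uri):
    """Heuristic (assumption): local name of the URI with underscores as spaces."""
    local = uri.rstrip("/#").replace("#", "/").rsplit("/", 1)[-1]
    return unquote(local).replace("_", " ")


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def triples_with_labels(subject, predicate, obj, predicate_label=None):
    """Return N-Triples lines: the relation itself plus one rdfs:label per term."""
    lines = [f"<{subject}> <{predicate}> <{obj}> ."]
    labels = {subject: label_from_uri(subject), obj: label_from_uri(obj)}
    if predicate_label is not None:
        labels[predicate] = predicate_label
    for uri, label in labels.items():
        lines.append(f'<{uri}> <{RDFS_LABEL}> "{escape_literal(label)}" .')
    return lines


if __name__ == "__main__":
    for line in triples_with_labels(
        DBR + "Barack_Obama", FOXO + "stanford_livein", DBR + "Hawaii", "born in"
    ):
        print(line)
```

With label triples like these in the store, a query on `?key1 rdfs:label ?key2` has something to bind to, which is what Auto Index needs to index label vs. URL.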

sepidetari commented 6 years ago

@RicardoUsbeck Does this mean that we just have to store the Turtle format in the DB?
We parsed the TTL into a Jena model and stored it in the DB; the following pictures show the result. Is this fine? [images: db 1, db 2]

Suganya31 commented 6 years ago

@RicardoUsbeck @prasanthhs If this looks fine, we can try making Open IE and Sorookin give their output as TTL.

RicardoUsbeck commented 6 years ago

That would be great. Is that also possible for FOX, so that Harsh has three annotators for the ensemble learning?

prasanthhs commented 6 years ago

@Suganya Does this work against the SPARQL query I posted? If it doesn't return anything, please let me know. We need to check how we can get the data in that case.

The query is below; add the prefixes for owl and rdfs before it. This is the same query that is executed on DBpedia and a few other SPARQL endpoints.

"SELECT DISTINCT ?key1 ?key2 WHERE{ \n ?key1 a owl:Thing . ?key1 rdfs:label ?key2 .}".
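
Adding the prefixes before the query could look roughly like the sketch below. It is written in Python for illustration only and is not the database-ms or Auto Index code; the helper name `with_prefixes` and the `PREFIXES` table are hypothetical. The two namespace IRIs are the standard W3C ones for owl and rdfs.

```python
import re

# Standard W3C namespace IRIs for the two prefixes the query uses.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def with_prefixes(query, prefixes=PREFIXES):
    """Prepend a PREFIX line for each known prefix the query uses but does not declare."""
    header = []
    for name, iri in prefixes.items():
        # "name:" not preceded by a word character or another colon counts as a use.
        used = re.search(rf"(?<![\w:]){re.escape(name)}:", query)
        declared = re.search(rf"PREFIX\s+{re.escape(name)}:", query, re.IGNORECASE)
        if used and not declared:
            header.append(f"PREFIX {name}: <{iri}>")
    return "\n".join(header + [query])


if __name__ == "__main__":
    print(with_prefixes(
        "SELECT DISTINCT ?key1 ?key2 WHERE{ ?key1 a owl:Thing . ?key1 rdfs:label ?key2 .}"
    ))
```

Prefixes that are already declared are left alone, so applying the helper twice yields the same query text.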

RicardoUsbeck commented 6 years ago

No, it does not, since there is no triple that can bind to ?key1 a owl:Thing .

SELECT DISTINCT ?key1 ?key2 
WHERE{
?key1 rdfs:label ?key2 .
FILTER(regex(str(?key1), "/ontology/[a-z]" ))}

This one should work and return labels for properties, for example.
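
To see which URIs the FILTER keeps, the same pattern can be tried outside the triple store. The sketch below uses Python's `re` module, which behaves like SPARQL's `regex` for a pattern this simple; the helper name `matches_filter` is hypothetical, and the `example.org` URI is made up for illustration.

```python
import re

# Same pattern as in the FILTER; matching is case-sensitive, as in SPARQL without the "i" flag.
PATTERN = re.compile(r"/ontology/[a-z]")


def matches_filter(uri):
    """True if the URI contains '/ontology/' followed by a lower-case letter."""
    return PATTERN.search(uri) is not None


if __name__ == "__main__":
    for uri in (
        "http://dbpedia.org/ontology/birthPlace",  # lower-case local name: kept
        "http://dbpedia.org/ontology/Person",      # upper-case local name: dropped
        "http://dbpedia.org/resource/Hawaii",      # not under /ontology/: dropped
        "http://example.org/ontology#livein",      # '#' namespace: dropped
    ):
        print(uri, matches_filter(uri))
```

Note that a namespace ending in `ontology#` rather than `ontology/` is not matched, so the pattern may need adjusting depending on which vocabularies the extractors emit.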

prasanthhs commented 6 years ago

@RicardoUsbeck Yes, I know. But the problem is that that query is a generic default query which runs for all remote endpoints. Should I make the query you've given the default for local SPARQL endpoints only? What about the queries for classes and properties, which are also executed against this local database, since it was decided that we keep generic behaviour for both remote and local endpoints?

Plus, there's the case of missing prefixes. The extractors' data contains prefixes that are not included by default in Auto Index. Isn't it better if the query is passed along with the necessary prefixes (the opposite of what we discussed)?

Suganya31 commented 6 years ago

@RicardoUsbeck I am not sure about FOX but I can try Sorookin and Open IE. @KHaack Is it possible for FOX also?

RicardoUsbeck commented 6 years ago

@prasanthhs The problem is that the SASK knowledge graph is not complete, but yes, you can also configure it so that the query is passed to Auto Index.

prasanthhs commented 6 years ago

@RicardoUsbeck Yes, I understand. Then it is probably better for database-ms to pass the new query posted here instead of configuring Auto Index, since this is SASK-related work in progress. Once the complete implementation is done, the custom queries can be removed.

Suganya31 commented 6 years ago

FOX now stores the proper TTL in the database.

prasanthhs commented 6 years ago

Snapshot of Elastic Search Repository for Data extracted by FOX extractor + Auto Index integration.

Input to the extractor : "Barack Obama is married to Michelle Obama."

Query received by Auto Index for the SPARQL endpoint: SELECT DISTINCT ?key1 ?key2 WHERE{ ?key1 rdfs:label ?key2 . }

[screenshot: 2018-07-03 at 13 03 56]

RicardoUsbeck commented 6 years ago

If we could get the label of the property in the future, it would be even better, but currently FOX does not return it. Maybe @Suganya31 can open an issue for that.