Hi, thanks for pen-testing our algorithm. First, I am missing an ontology file. AGDISTIS needs to construct a graph, i.e. you must have triples of the form (URI, predicate, URI), e.g. mappingbased_objects.ttl from DBpedia. Second, I will try out your code and see why it is returning a string and not a URI. Third, you should be able to extend the index, but we might need to adapt the code so that it can write to an existing index.
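To illustrate the difference between a graph edge of the form (URI, predicate, URI) and a literal triple such as a name or label, here is a minimal sketch that classifies simple N-Triples lines. It is not part of AGDISTIS; the class name TripleKindCheck, the regular expression and the example.com URIs are assumptions made for illustration, and the pattern only covers plain one-line triples.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative check only; this is not AGDISTIS code. */
class TripleKindCheck {
    // Matches a simple one-line triple: <subject> <predicate> object .
    private static final Pattern TRIPLE =
            Pattern.compile("^<([^>]+)>\\s+<([^>]+)>\\s+(.+?)\\s*\\.\\s*$");

    /** Returns true if the object is a URI, i.e. the triple can serve as a graph edge. */
    static boolean isEdge(String line) {
        Matcher m = TRIPLE.matcher(line.trim());
        if (!m.matches()) {
            return false; // not a triple this sketch understands
        }
        String object = m.group(3);
        return object.startsWith("<") && object.endsWith(">");
    }

    public static void main(String[] args) {
        // Placeholder triples: the first has a URI object, the second a literal.
        List<String> lines = List.of(
                "<http://example.com/resource/A> <http://example.com/ontology/field> <http://example.com/resource/B> .",
                "<http://example.com/resource/A> <http://xmlns.com/foaf/0.1/name> \"A\"@en .");
        for (String line : lines) {
            System.out.println((isEdge(line) ? "edge:    " : "literal: ") + line);
        }
    }
}
```

Running a check like this over a .ttl file gives a quick count of how many triples can actually contribute edges to the graph.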
Thanks for your reply, @RicardoUsbeck .
Could you kindly explain which ontology file you mean? What does it need to contain? In general, what are the minimum files and triples required to get a custom index up and running? I am completely new to the Semantic Web, so please forgive my lack of knowledge.
I created a mappingbased_properties_en.ttl file like the one in the DBpedia 2014 dataset with the following content:
...
<http://www.technologyreview.com/resource/602283> <http://xmlns.com/foaf/0.1/name> "QuantumComputer"@en .
<http://www.technologyreview.com/resource/602283> <http://www.technologyreview.com/ontology/field> <http://www.technologyreview.com/resource/computing> .
...
I also changed the edgeType property to edgeType=http://www.technologyreview.com/ontology/ and built a new index.
Unfortunately, the overall behaviour did not change. I still receive only the label string.
Edit: I tried to fully mimic the DBpedia 2014 index by creating all the files it contains (with far fewer entries, of course) and only replacing the DBpedia URIs inside. It still does not work. Is there perhaps a minimum number of triples required?
Regarding index expansion: I would be happy if you could give me a hint about where to adapt the code, so that I can try it on my own.
Just to put things into perspective: My use case is to have a small index with a domain-specific ontology. I want to disambiguate on that index. And if I find new entities with Stanford CoreNLP, I want to add those new entities and their properties to the index.
I have made a new observation.
I switched back to the default DBpedia 2016 index, but forgot to change my AGDISTIS properties file. So the entries for nodeType, edgeType and baseURI were still http://example.com/resource/ etc. instead of http://dbpedia.org/resource/ and so on.
With these wrong properties, I get the same error as before - the returned URI is just the label/name string of the entity, although I am using the default DBpedia index.
But I am quite sure that my previous properties were correct because, as I said in the edited paragraph of my previous post, I just replaced the "dbpedia.org" string with "example.com" in both the properties file and the .ttl files.
Does this observation help to solve the problem?
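The mismatch described above can be checked mechanically. The following is a minimal sketch, not AGDISTIS code: it loads properties text and tests whether a URI taken from the index starts with the configured prefix. The class and method names are hypothetical, the edgeType value is an assumed counterpart to the nodeType value mentioned in the post, and the Berlin URI is just an example of a DBpedia resource.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative consistency check; not part of AGDISTIS. */
class PrefixCheck {
    /** Parses properties text, e.g. the contents of the AGDISTIS properties file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True if the URI starts with the prefix configured under the given key. */
    static boolean matchesPrefix(Properties props, String key, String uri) {
        String prefix = props.getProperty(key);
        return prefix != null && uri.startsWith(prefix);
    }

    public static void main(String[] args) {
        // Stale settings left over from a custom index (edgeType value is assumed).
        Properties props = parse("nodeType=http://example.com/resource/\n"
                + "edgeType=http://example.com/ontology/\n");
        String indexedUri = "http://dbpedia.org/resource/Berlin"; // example URI from a DBpedia index
        if (!matchesPrefix(props, "nodeType", indexedUri)) {
            System.out.println("nodeType does not match indexed URI: " + indexedUri);
        }
    }
}
```

A check of this kind only rules out a prefix mismatch; it does not explain a failure when the prefixes already agree, as in the custom-index case reported here.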
I will take a look at your data today or Monday. Could you please upload your example files?
Sure, you will find them here: https://drive.google.com/drive/folders/0BycW_RxvAHdzZkhmS19vTGdFT1k?usp=sharing
The provided properties file is from my second test.
Thank you very much for your time!
Hi @RicardoUsbeck ,
I really do not want to rush you, but have you had a chance to look at the data or the problem yet?
@Phauly1, @RicardoUsbeck is busy with other AGDISTIS work, so I am here to help you. The problem is that AGDISTIS uses a whitelist of types to avoid bad URIs and to keep named entities rather than common entities, and you forgot to include your types in that whiteList. You have two options: either comment out the respective line in CandidateUtil.java (line 152), or add your types to the whiteList file. In the upcoming release of AGDISTIS, this will be a parameter rather than a list, which we hope will avoid this kind of problem in the future. In addition, I created a test class for your case: https://hastebin.com/onurozijen.vbs . Let me know if you need anything else; otherwise, I will close this issue.
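As a rough picture of the whitelist behaviour described above, the sketch below keeps a candidate only if at least one of its rdf:type values is on the whitelist. This is an assumption about the general idea, not the actual code in CandidateUtil.java; the class name, method name and the whitelist contents are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Illustrative type filter; the real AGDISTIS logic may differ. */
class TypeWhitelist {
    /** A candidate passes if any of its types appears on the whitelist. */
    static boolean isAllowed(List<String> candidateTypes, Set<String> whitelist) {
        return candidateTypes.stream().anyMatch(whitelist::contains);
    }

    public static void main(String[] args) {
        // Hypothetical whitelist that lacks the type used in the custom index.
        Set<String> whitelist = Set.of(
                "http://dbpedia.org/ontology/Person",
                "http://dbpedia.org/ontology/Place");
        List<String> types = List.of("http://dbpedia.org/ontology/InformationAppliance");
        System.out.println("candidate kept: " + isAllowed(types, whitelist));
    }
}
```

Under this reading, a custom index whose entities carry only types missing from the whitelist would have every candidate filtered out, which matches the fix of adding the instance types to the whitelist file.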
Hi @DiegoMoussallem ,
I have added the two new instance types to the whiteList.txt and your test method is working. That is awesome, thank you!
But I have a few other related questions and it would be great if you can answer them.
Do I have to strictly follow the DBpedia conventions when creating a new index? So, do I need all the .ttl files like disambiguations.ttl, redirects_transitive_en.ttl and so on, or are labels_en.ttl and instance_types_en.ttl enough? Can I even create my own .ttl files with custom properties/predicates?
@RicardoUsbeck said that it might be necessary to adapt the AGDISTIS code in order to extend an already existing index with new triples (at runtime). How could I do that? Otherwise, is it possible to run AGDISTIS with two separate indexes in parallel?
Thank you!
@Phauly1 nice that you tried it and that it worked for you!
1 - It is not exactly a DBpedia convention; it is about structured data and graphs. redirects_transitive_en.ttl and the disambiguation file are important. For instance, the disambiguation file allows AGDISTIS to deal with very ambiguous mentions like "German" http://dbpedia.org/page/German, so it can go through all candidate entities and decide which one is correct. Likewise, redirects_transitive_en.ttl lets AGDISTIS walk the graph more efficiently while avoiding incorrect entities. I would suggest you have a look at the Knowledge Graphs literature for a good understanding.
2 - If you wish to run two indexes in parallel, each coming from a different knowledge base, i.e. a different graph (for instance, YAGO and DBpedia), I would suggest you comment out that line (whitelist). You would also have to create another index parameter, e.g. index2=indexbyPhaulh1, along with the appropriate Java code for it, which is just a replication of the existing code.
I hope I have answered your questions.
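To make the role of redirects mentioned in point 1 more concrete, here is a minimal sketch that follows a chain of redirects to its final target, with a guard against cycles. It only illustrates the concept; it is not how AGDISTIS reads redirects_transitive_en.ttl, and the class name and example.com URIs are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative redirect resolution; not AGDISTIS code. */
class RedirectResolver {
    /** Follows redirects until a URI without a redirect is reached or a cycle is detected. */
    static String resolve(String uri, Map<String, String> redirects) {
        Set<String> seen = new HashSet<>();
        String current = uri;
        // seen.add returns false on a repeated URI, which stops a redirect loop.
        while (redirects.containsKey(current) && seen.add(current)) {
            current = redirects.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical redirect chain: QC -> Quantum_computing -> Quantum_computer
        Map<String, String> redirects = Map.of(
                "http://example.com/resource/QC", "http://example.com/resource/Quantum_computing",
                "http://example.com/resource/Quantum_computing", "http://example.com/resource/Quantum_computer");
        System.out.println(resolve("http://example.com/resource/QC", redirects));
    }
}
```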
Thanks for the solution Diego.
@Phauly1 For 1), the most important thing is that it is a well-structured graph and that you have enough surface forms, i.e. rdfs:label properties, for each entity. For 2), in addition to Diego's suggestion, you could implement and test a method to include new triples at runtime. However, I am not available for such implementations until March. Feel free to do so and come back with questions.
Thank you for your answers. That helps me a lot.
For now, I will try the following:
I am going to use the standard DBpedia index as it is. Then I will create a second custom index with my own ontology and try to use both in parallel. And if there are new entities found by CoreNLP, they will be added to the custom index. In that way, AGDISTIS should disambiguate on two different indexes and only return an entity from my custom index, if it is not found in DBpedia.
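The fallback order in this plan (DBpedia first, custom index only if nothing is found) could be sketched as follows. The two maps are stand-ins for the two indexes and a plain lookup stands in for disambiguation, which in AGDISTIS is graph-based; all names and the DBpedia URI for Berlin are assumptions used for illustration.

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative two-index fallback; maps stand in for real indexes. */
class FallbackLookup {
    /** Returns the primary index result if present, otherwise the secondary one. */
    static Optional<String> disambiguate(String label,
                                         Map<String, String> primary,
                                         Map<String, String> secondary) {
        String hit = primary.get(label);
        if (hit == null) {
            hit = secondary.get(label); // only consulted when the primary index has no match
        }
        return Optional.ofNullable(hit);
    }

    public static void main(String[] args) {
        Map<String, String> dbpedia = Map.of("Berlin", "http://dbpedia.org/resource/Berlin");
        Map<String, String> custom = Map.of("QuantumComputer", "http://www.technologyreview.com/s/602283");
        System.out.println(disambiguate("QuantumComputer", dbpedia, custom).orElse("no match"));
    }
}
```

New entities found by CoreNLP would then be added only to the structure playing the role of the custom index, leaving the DBpedia index untouched.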
(@DiegoMoussallem Could you explain in a little more detail what you mean by: "Also, you would have to create another index parameter e.g index2=indexbyPhaulh1 along with an appropriated java code for it just replicating."?)
I will use this github issue for further questions and really appreciate your support so far.
Hi @Phauly1, I meant that you have to add another line to the agdistis.property file pointing to the new index directory. You would also need to create another TripleIndex.java, unless you keep the same structure as the DBpedia index, in which case that is not necessary.
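A minimal sketch of reading such an extra line is shown below. It assumes the existing index directory is configured under a key named index and that additional indexes use numbered keys such as index2; the key name index, the directory name index_dbpedia and the class IndexConfig are assumptions, and only index2=indexbyPhaulh1 comes from the suggestion above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

/** Illustrative configuration reader; key names are assumptions. */
class IndexConfig {
    /** Parses properties text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Collects the values of index, index2, index3, ... in sorted key order. */
    static List<String> indexDirs(Properties props) {
        List<String> dirs = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (key.matches("index\\d*")) {
                dirs.add(props.getProperty(key));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        Properties props = parse("index=index_dbpedia\n"
                + "index2=indexbyPhaulh1\n"
                + "nodeType=http://dbpedia.org/resource/\n");
        System.out.println("index directories: " + indexDirs(props));
    }
}
```

Each directory returned this way would then need its own index instance in the Java code, which is the replication mentioned earlier in the thread.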
Hi everyone,
right now I am trying to get AGDISTIS to work with an index other than DBpedia. For test purposes, I have created a tiny custom index and run AGDISTIS on it. But it does not return a proper URI for the disambiguated entity; it just returns the text/label of the entity instead.
My approach so far:
Create labels_en.ttl, instance_types_en.ttl and en_surface_forms.tsv. I followed your DBpedia 2014 index example from the wiki. They look like this:
labels_en.ttl:
<http://www.technologyreview.com/s/602283> <http://www.w3.org/2000/01/rdf-schema#label> "QuantumComputer"@en .
instance_types_en.ttl:
<http://www.technologyreview.com/s/602283> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/InformationAppliance> .
en_surface_forms.tsv:
http://www.technologyreview.com/s/602283 Quantum Computer Computer
Run
mvn exec:java -Dexec.mainClass="org.aksw.agdistis.util.TripleIndexCreator"
to create the actual index.
Modify properties file:
Run AGDISTIS (mergeT branch) with the following code (it gets the entity labels and positions from the Stanford CoreNLP MentionsAnnotator):
Now instead of returning
QuantumComputer -> http://www.technologyreview.com/s/602283
it returns
QuantumComputer -> QuantumComputer
Is this an issue with my custom index? Because if I use the 2016 DBpedia standard index, my implementation is working.
I would be very happy if you could provide an explanation of how to use a custom index that is a little bit more detailed than in the GitHub wiki. :-)
Thank you in advance!
PS: Once the custom index is working, how can I add new triples to the already existing index? With the addDocumentToIndex method in TripleIndexCreator.java?