Closed RicardoUsbeck closed 6 years ago
@RicardoUsbeck will try to reproduce this today
@RicardoUsbeck ok, could reproduce it easily. As far as I see, docker container works fine and loads the index. And since english index works just fine - maybe german the index is broken?
Hi,
the point is that when you run docker pull aksw/agdistis
, the container considers the pre-defined agdistis.properties which means when you change the KB, you have to change the agdistis.properties file as well. In the respective fields below:
Pre-defined:
nodeType=http://dbpedia.org/resource/
edgeType=http://dbpedia.org/ontology/
baseURI =http://dbpedia.org
new KB:
nodeType=http://de.dbpedia.org/resource/
edgeType=http://de.dbpedia.org/ontology/
baseURI =http://de.dbpedia.org
It will also happen in case you change drastically the KB i.e., URI, e.g.,Wikidata. We can figure out a solution to deal with it.
@DiegoMoussallem can you change it so that those vars are read from environment variables? (and if those are not set - fall back to properties file)
@yamalight I intend to do it because it is also related to the challenge of dealing with more than one nodeType. But, I have to think how it will affect the performance of graph-based algorithms.
@DiegoMoussallem why would that affect performance in any way? it's not that different to reading those values from file 🤔
@yamalight Because we boost those algorithms using the node and edge types. Also, there are other variables, like context, acronyms, threshold, commonEntities, endpoint, coreference which can vary significantly. I'm not sure how feasible it would be if we put everything as variables in terms of running. If you have any idea just tell me :)
@yamalight I think I got what you meant, was it to insert the values while it is running?
@DiegoMoussallem yes, instead of reading from file just use e.g. System.getenv("AGDISTIS_NODE_TYPE")
I got that, sorry, I got confused with another term. Would it be better for running a docker?
@DiegoMoussallem yep. then you can basically provide those vars on container start, which'd make configuration much simpler
Perfect, I'm gonna provide it, but first I'll think about the usage of more than node and edge types. I then improve everything.
@RicardoUsbeck Could you create your own docker image for a while? -> https://github.com/dice-group/AGDISTIS/wiki/3-Running-the-webservice#create-your-own-docker-image
It is not for me, but for a colleague. That means, we should first fix it for one node type and close this issue. Than open another issue for multiple node types.
@yamalight @RicardoUsbeck I was wondering if there is a better solution than reading from environment variables. Because the user will have to insert the values all the time when he runs a new instance because sometimes the user just wants to analyze some behaviors and not keep a given instance running forever.
@DiegoMoussallem isn't it exactly the same with editing config?
The thing is, that the config file is compiled into the jar which we copy into the container. That is, we actually would need to have one docker image per language
@yamalight No, It is not. At least I don't think so because when a user wants to change one of the parameters using editing config, he just changes it and compiles again maintaining the other values. By environment variables, the user has to insert all values again.
@DiegoMoussallem @RicardoUsbeck I originally reported the issue directly to Ricardo. At Ricardo's suggestion I followed the instructions on the Wiki to run my own webservice (i.e. running the Java directly), and Diego's suggested changes to the agdistis.properties file (nodeType, edgeType, baseURI) and was able to retrieve results for German. Initially I missed the need to set those three properties - my mistake. Thank you both very much for your continues assistance and patience with this issue.
@DiegoMoussallem are you sure we're talking about the same env vars? if you use config - you need to recompile the whole thing. if you use env vars - you just need to restart the app, right? that's like 10x easier
edit: I am not seeing the issue you keep bringing up with using env vars. can you please explain your concerns in details?
@lguillou Cool, we hope you like what you see. We will update the webservices soon. @yamalight Can you please paste a command for running the docker with env vars so Diego sees that it is actually quite easy and we do not need 3 images but only one?
@RicardoUsbeck @DiegoMoussallem sure. Here's how you would configure AGDISTIS via env vars:
docker run -d \
-e AGDISTIS_NODE_TYPE=http://de.dbpedia.org/resource/ \
-e AGDISTIS_EDGE_TYPE=http://de.dbpedia.org/ontology/ \
-e AGDISTIS_BASE_URI=http://de.dbpedia.org \
-p 8080:8080 \
-v `pwd`/index:/usr/local/tomcat/index \
aksw/agdistis
AGDISTIS_NODE_TYPE=http://de.dbpedia.org/resource/ \
AGDISTIS_EDGE_TYPE=http://de.dbpedia.org/ontology/ \
AGDISTIS_BASE_URI=http://de.dbpedia.org \
mvn tomcat:run
No need to re-compile and change anything in the app.
@yamalight @RicardoUsbeck Guys, I know what you are saying and we are talking about the same thing for sure. I was just wondering if a person who does not use docker and is improving Agdistis if it would be good. There was not the necessity to past code here about env variables.
Today we have more than 10 "variables", that's why I said that I was wondering if there was a better solution, not because I don't know what env variables are.
index=indexdbpedia_en_2016
index2=index_bycontext_en
nodeType=http://dbpedia.org/resource/
edgeType=http://dbpedia.org/ontology/
baseURI =http://dbpedia.org
endpoint=https://query.wikidata.org/sparql
ngramDistance=3
maxDepth=2
threshholdTrigram=0.87
heuristicExpansionOn=true
whiteList=/config/whiteList.txt
corporationAffixes=/config/corporationAffixes.txt
popularity=true
algorithm=hits
context=true
acronym=true
commonEntities=false
folderWithTTLFiles=/Users/diegomoussallem/Desktop/AGDISTIS/data/2016-04/en/context/
surfaceFormTSV=/Users/diegomoussallem/Desktop/AGDISTIS/data/2016-04/en/surface/en_surface_forms.tsv
@DiegoMoussallem @RicardoUsbeck 1) 10 vars is really not that much. I've seen apps that have 30+ config options using env vars and it was ok, nobody died :) 2) it's also fine from UX perspective, they're not that hard to set. 3) if you want better UX - you can move config to an external file (e.g. yaml, toml, etc) that is re-read on changes
IMO ability to config using env vars should still be there, because using config files in docker is not too convenient (and can bring additional pains when deploying multiple AGDISTIS nodes to swarm)
Okay, so in summary:
Thus, I suggest to leave this issue open and open a new issue implementing properties as environment variables for both: running docker and running from mvn. Do you agree @DiegoMoussallem and @yamalight ?
@RicardoUsbeck sounds good to me
@RicardoUsbeck should I just do it? That'll probably take ~5 mins
@yamalight feel free.
@RicardoUsbeck @DiegoMoussallem PR created. Please check the code/style/etc and merge if everything's good.
Running that branch with following config:
docker run -d \
--name agdistis \
-v `pwd`/indexdbpedia_de_2016:/usr/local/tomcat/index \
-p 8080:8080 \
-e AGDISTIS_NODE_TYPE=http://de.dbpedia.org/resource/ \
-e AGDISTIS_EDGE_TYPE=http://de.dbpedia.org/ontology/ \
-e AGDISTIS_BASE_URI=http://de.dbpedia.org \
agdistis:env
And requesting query from @RicardoUsbeck with curl:
curl --data-urlencode "text='<entity>Angela Merkel</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS
Now works as expected.
I will check in the upcoming days. Thanks so much!
I did a new clone and tested it for English
mvn package -Dmaven.test.skip=true
docker build -t aksw/agdistis .
wget http://hobbitdata.informatik.uni-leipzig.de/agdistis/dbpedia_index_2016-04/en/indexdbpedia_en_2016.zip
unzip indexdbpedia_en_2016.zip
docker run -d \
--name agdistis \
-v `pwd`/indexdbpedia_en_2016:/usr/local/tomcat/index \
-p 8080:8080 \
-e AGDISTIS_NODE_TYPE=http://dbpedia.org/resource/ \
-e AGDISTIS_EDGE_TYPE=http://dbpedia.org/ontology/ \
-e AGDISTIS_BASE_URI=http://dbpedia.org \
aksw/agdistis:latest
curl --data-urlencode "text='<entity>Angela Merkel</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS
and it worked. Then I tried another language to see the environment variables working:
docker stop agdistis
wget http://hobbitdata.informatik.uni-leipzig.de/agdistis/dbpedia_index_2016-04/de/indexdbpedia_de_2016.zip
unzip indexdbpedia_de_2016.zip
docker run -d \
--name agdistis \
-v `pwd`/indexdbpedia_de_2016:/usr/local/tomcat/index \
-p 8080:8080 \
-e AGDISTIS_NODE_TYPE=http://de.dbpedia.org/resource/ \
-e AGDISTIS_EDGE_TYPE=http://de.dbpedia.org/ontology/ \
-e AGDISTIS_BASE_URI=http://de.dbpedia.org \
aksw/agdistis:latest
curl --data-urlencode "text='<entity>Angela Merkel</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS
I followed the "running with docker" instructions on the following wiki page: https://github.com/dice-group/AGDISTIS/wiki/3-Running-the-webservice
I downloaded and unpacked the following German DBPedia file: dbpedia_index_2016-04/de/indexdbpedia_de_2016.zip and renamed the folder to "index".
I then ran the following commands:
When I query the webservice using the following command:
curl --data-urlencode "text='<entity>Angela Merkel</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS
I receive what look to be dummy URLs, suggesting the entity cannot be found in the index:
[{"disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/AngelaMerkel","offset":13,"namedEntity":"Angela Merkel","start":1}]
I tried a number of entities that I would expect to return a URL, all without success. I receive the same dummy URL results if I point the docker container at the 2014 German index. However, I can return results for English (using the 2016 and 2014 indices):
[{"disambiguatedURL":"http:\/\/dbpedia.org\/resource\/Angela_Merkel","offset":13,"namedEntity":"Angela Merkel","start":1}]
Are you able to advise whether there is a problem/difference with the German index, or perhaps with my execution of the steps for German?
(assigned also to @mejohnee)