dice-group / AGDISTIS

AGDISTIS - Agnostic Named Entity Disambiguation
http://aksw.org/Projects/AGDISTIS.html
GNU Affero General Public License v3.0
141 stars 37 forks source link

Environment variables instead of properties #52

Closed RicardoUsbeck closed 6 years ago

RicardoUsbeck commented 7 years ago

I followed the "running with docker" instructions on the following wiki page: https://github.com/dice-group/AGDISTIS/wiki/3-Running-the-webservice

I downloaded and unpacked the following German DBPedia file: dbpedia_index_2016-04/de/indexdbpedia_de_2016.zip and renamed the folder to "index".

I then ran the following commands:

docker pull aksw/agdistis
docker run -d -p 8080:8080 -v `pwd`/index:/usr/local/tomcat/index aksw/agdistis
docker start <container_name>

When I query the webservice using the following command: curl --data-urlencode "text='<entity>Angela Merkel</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS

I receive what look to be dummy URLs, suggesting the entity cannot be found in the index: [{"disambiguatedURL":"http:\/\/aksw.org\/notInWiki\/AngelaMerkel","offset":13,"namedEntity":"Angela Merkel","start":1}]

I tried a number of entities that I would expect to return a URL, all without success. I receive the same dummy URL results if I point the docker container at the 2014 German index. However, I can return results for English (using the 2016 and 2014 indices): [{"disambiguatedURL":"http:\/\/dbpedia.org\/resource\/Angela_Merkel","offset":13,"namedEntity":"Angela Merkel","start":1}]

Are you able to advise whether there is a problem/difference with the German index, or perhaps with my execution of the steps for German?

(assigned also to @mejohnee)

yamalight commented 7 years ago

@RicardoUsbeck will try to reproduce this today

yamalight commented 7 years ago

@RicardoUsbeck ok, could reproduce it easily. As far as I see, docker container works fine and loads the index. And since english index works just fine - maybe german the index is broken?

DiegoMoussallem commented 7 years ago

Hi, the point is that when you run docker pull aksw/agdistis, the container considers the pre-defined agdistis.properties which means when you change the KB, you have to change the agdistis.properties file as well. In the respective fields below: Pre-defined: nodeType=http://dbpedia.org/resource/ edgeType=http://dbpedia.org/ontology/ baseURI =http://dbpedia.org new KB: nodeType=http://de.dbpedia.org/resource/ edgeType=http://de.dbpedia.org/ontology/ baseURI =http://de.dbpedia.org

It will also happen in case you change drastically the KB i.e., URI, e.g.,Wikidata. We can figure out a solution to deal with it.

yamalight commented 7 years ago

@DiegoMoussallem can you change it so that those vars are read from environment variables? (and if those are not set - fall back to properties file)

DiegoMoussallem commented 7 years ago

@yamalight I intend to do it because it is also related to the challenge of dealing with more than one nodeType. But, I have to think how it will affect the performance of graph-based algorithms.

yamalight commented 7 years ago

@DiegoMoussallem why would that affect performance in any way? it's not that different to reading those values from file 🤔

DiegoMoussallem commented 7 years ago

@yamalight Because we boost those algorithms using the node and edge types. Also, there are other variables, like context, acronyms, threshold, commonEntities, endpoint, coreference which can vary significantly. I'm not sure how feasible it would be if we put everything as variables in terms of running. If you have any idea just tell me :)

DiegoMoussallem commented 7 years ago

@yamalight I think I got what you meant, was it to insert the values while it is running?

yamalight commented 7 years ago

@DiegoMoussallem yes, instead of reading from file just use e.g. System.getenv("AGDISTIS_NODE_TYPE")

DiegoMoussallem commented 7 years ago

I got that, sorry, I got confused with another term. Would it be better for running a docker?

yamalight commented 7 years ago

@DiegoMoussallem yep. then you can basically provide those vars on container start, which'd make configuration much simpler

DiegoMoussallem commented 7 years ago

Perfect, I'm gonna provide it, but first I'll think about the usage of more than node and edge types. I then improve everything.

DiegoMoussallem commented 7 years ago

@RicardoUsbeck Could you create your own docker image for a while? -> https://github.com/dice-group/AGDISTIS/wiki/3-Running-the-webservice#create-your-own-docker-image

RicardoUsbeck commented 7 years ago

It is not for me, but for a colleague. That means, we should first fix it for one node type and close this issue. Than open another issue for multiple node types.

DiegoMoussallem commented 7 years ago

@yamalight @RicardoUsbeck I was wondering if there is a better solution than reading from environment variables. Because the user will have to insert the values all the time when he runs a new instance because sometimes the user just wants to analyze some behaviors and not keep a given instance running forever.

yamalight commented 7 years ago

@DiegoMoussallem isn't it exactly the same with editing config?

RicardoUsbeck commented 7 years ago

The thing is, that the config file is compiled into the jar which we copy into the container. That is, we actually would need to have one docker image per language

DiegoMoussallem commented 7 years ago

@yamalight No, It is not. At least I don't think so because when a user wants to change one of the parameters using editing config, he just changes it and compiles again maintaining the other values. By environment variables, the user has to insert all values again.

lguillou commented 7 years ago

@DiegoMoussallem @RicardoUsbeck I originally reported the issue directly to Ricardo. At Ricardo's suggestion I followed the instructions on the Wiki to run my own webservice (i.e. running the Java directly), and Diego's suggested changes to the agdistis.properties file (nodeType, edgeType, baseURI) and was able to retrieve results for German. Initially I missed the need to set those three properties - my mistake. Thank you both very much for your continues assistance and patience with this issue.

yamalight commented 7 years ago

@DiegoMoussallem are you sure we're talking about the same env vars? if you use config - you need to recompile the whole thing. if you use env vars - you just need to restart the app, right? that's like 10x easier

edit: I am not seeing the issue you keep bringing up with using env vars. can you please explain your concerns in details?

RicardoUsbeck commented 7 years ago

@lguillou Cool, we hope you like what you see. We will update the webservices soon. @yamalight Can you please paste a command for running the docker with env vars so Diego sees that it is actually quite easy and we do not need 3 images but only one?

yamalight commented 7 years ago

@RicardoUsbeck @DiegoMoussallem sure. Here's how you would configure AGDISTIS via env vars:

  1. In docker:
    docker run -d \
    -e AGDISTIS_NODE_TYPE=http://de.dbpedia.org/resource/  \
    -e AGDISTIS_EDGE_TYPE=http://de.dbpedia.org/ontology/ \
    -e AGDISTIS_BASE_URI=http://de.dbpedia.org \
    -p 8080:8080 \
    -v `pwd`/index:/usr/local/tomcat/index \
    aksw/agdistis
  2. Locally:
    AGDISTIS_NODE_TYPE=http://de.dbpedia.org/resource/ \ 
    AGDISTIS_EDGE_TYPE=http://de.dbpedia.org/ontology/ \
    AGDISTIS_BASE_URI=http://de.dbpedia.org \
    mvn tomcat:run

No need to re-compile and change anything in the app.

DiegoMoussallem commented 7 years ago

@yamalight @RicardoUsbeck Guys, I know what you are saying and we are talking about the same thing for sure. I was just wondering if a person who does not use docker and is improving Agdistis if it would be good. There was not the necessity to past code here about env variables.

Today we have more than 10 "variables", that's why I said that I was wondering if there was a better solution, not because I don't know what env variables are.

index=indexdbpedia_en_2016
index2=index_bycontext_en
nodeType=http://dbpedia.org/resource/
edgeType=http://dbpedia.org/ontology/
baseURI =http://dbpedia.org
endpoint=https://query.wikidata.org/sparql
ngramDistance=3
maxDepth=2
threshholdTrigram=0.87
heuristicExpansionOn=true
whiteList=/config/whiteList.txt
corporationAffixes=/config/corporationAffixes.txt
popularity=true
algorithm=hits
context=true
acronym=true
commonEntities=false
folderWithTTLFiles=/Users/diegomoussallem/Desktop/AGDISTIS/data/2016-04/en/context/
surfaceFormTSV=/Users/diegomoussallem/Desktop/AGDISTIS/data/2016-04/en/surface/en_surface_forms.tsv 
yamalight commented 7 years ago

@DiegoMoussallem @RicardoUsbeck 1) 10 vars is really not that much. I've seen apps that have 30+ config options using env vars and it was ok, nobody died :) 2) it's also fine from UX perspective, they're not that hard to set. 3) if you want better UX - you can move config to an external file (e.g. yaml, toml, etc) that is re-read on changes

IMO ability to config using env vars should still be there, because using config files in docker is not too convenient (and can bring additional pains when deploying multiple AGDISTIS nodes to swarm)

RicardoUsbeck commented 7 years ago

Okay, so in summary:

Thus, I suggest to leave this issue open and open a new issue implementing properties as environment variables for both: running docker and running from mvn. Do you agree @DiegoMoussallem and @yamalight ?

yamalight commented 7 years ago

@RicardoUsbeck sounds good to me

yamalight commented 7 years ago

@RicardoUsbeck should I just do it? That'll probably take ~5 mins

DiegoMoussallem commented 7 years ago

@yamalight feel free.

yamalight commented 7 years ago

@RicardoUsbeck @DiegoMoussallem PR created. Please check the code/style/etc and merge if everything's good.

Running that branch with following config:

docker run -d \
  --name agdistis \
  -v `pwd`/indexdbpedia_de_2016:/usr/local/tomcat/index \
  -p 8080:8080 \
  -e AGDISTIS_NODE_TYPE=http://de.dbpedia.org/resource/ \
  -e AGDISTIS_EDGE_TYPE=http://de.dbpedia.org/ontology/ \
  -e AGDISTIS_BASE_URI=http://de.dbpedia.org \
  agdistis:env

And requesting query from @RicardoUsbeck with curl:

curl --data-urlencode "text='<entity>Angela Merkel</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS

Now works as expected.

RicardoUsbeck commented 7 years ago

I will check in the upcoming days. Thanks so much!

RicardoUsbeck commented 6 years ago

I did a new clone and tested it for English

mvn package -Dmaven.test.skip=true
docker build -t aksw/agdistis .
wget http://hobbitdata.informatik.uni-leipzig.de/agdistis/dbpedia_index_2016-04/en/indexdbpedia_en_2016.zip
unzip indexdbpedia_en_2016.zip
docker run -d \
  --name agdistis \
  -v `pwd`/indexdbpedia_en_2016:/usr/local/tomcat/index \
  -p 8080:8080 \
  -e AGDISTIS_NODE_TYPE=http://dbpedia.org/resource/ \
  -e AGDISTIS_EDGE_TYPE=http://dbpedia.org/ontology/ \
  -e AGDISTIS_BASE_URI=http://dbpedia.org \
  aksw/agdistis:latest
curl --data-urlencode "text='<entity>Angela Merkel</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS

and it worked. Then I tried another language to see the environment variables working:

docker stop agdistis
wget http://hobbitdata.informatik.uni-leipzig.de/agdistis/dbpedia_index_2016-04/de/indexdbpedia_de_2016.zip
unzip indexdbpedia_de_2016.zip
docker run -d \
  --name agdistis \
  -v `pwd`/indexdbpedia_de_2016:/usr/local/tomcat/index \
  -p 8080:8080 \
  -e AGDISTIS_NODE_TYPE=http://de.dbpedia.org/resource/ \
  -e AGDISTIS_EDGE_TYPE=http://de.dbpedia.org/ontology/ \
  -e AGDISTIS_BASE_URI=http://de.dbpedia.org \
  aksw/agdistis:latest
curl --data-urlencode "text='<entity>Angela Merkel</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS