brmson / yodaqa

A Question Answering system built on top of the Apache UIMA framework.
http://ailao.eu/yodaqa

Trivial, but tedious: Swappable URLs? #46

Closed k0105 closed 8 years ago

k0105 commented 8 years ago

Hi Petr,

so I'm currently tidying up my contributions so I can hand them over to you (I forgot one of my HDDs, so I have to rebuild a couple of things). One ugly aspect that bugs me is the hardcoded URLs in cz.brmlab.yodaqa.provider.rdf.FreebaseLookup, cz.brmlab.yodaqa.provider.rdf.DBpediaTitles.java, cz.brmlab.yodaqa.provider.rdf.DBpediaLookup and cz.brmlab.yodaqa.pipeline.YodaQA (enwiki/Solr). Since you essentially start different heads from gradlew (Web, GS, Interactive), it is difficult to swap out URLs dynamically.

What I would like to do is define them in the conf subdirectory either via JSON like this:

{
  "default": {
    "dbpedia":  "http://dbpedia.ailao.eu:3030/dbpedia/query",
    "freebase": "http://freebase.ailao.eu:3030/freebase/query",
    "label1":   "http://dbp-labels.ailao.eu:5000",
    "label2":   "http://dbp-labels.ailao.eu:5001",
    "solr":     "http://enwiki.ailao.eu:8983/solr/"
  },
  "offline": {
    "dbpedia":  "http://localhost:3037/dbpedia/query",
    "freebase": "http://127.0.0.1:3030/freebase/query",
    "label1":   "http://127.0.0.1:5000",
    "label2":   "http://127.0.0.1:5001",
    "solr":     "http://127.0.0.1:8983/solr"
  },
  "server": {
    "dbpedia":  "http://sth.somecompany.com:3037/dbpedia/query",
    "freebase": "http://sth.somecompany.com:3030/freebase/query",
    "label1":   "http://sth.somecompany.com:5000",
    "label2":   "http://sth.somecompany.com:5001",
    "solr":     "http://sth.somecompany.com:8983/solr"
  }
}

or via XML like this:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<configuration>
  <default>
    <dbpedia>http://dbpedia.ailao.eu:3030/dbpedia/query</dbpedia>
    <freebase>http://freebase.ailao.eu:3030/freebase/query</freebase>
    <label1>http://dbp-labels.ailao.eu:5000</label1>
    <label2>http://dbp-labels.ailao.eu:5001</label2>
    <solr>http://enwiki.ailao.eu:8983/solr/</solr>
  </default>

  <offline>
    <dbpedia>http://127.0.0.1:3037/dbpedia/query</dbpedia>
    <freebase>http://127.0.0.1:3030/freebase/query</freebase>
    <label1>http://127.0.0.1:5000</label1>
    <label2>http://127.0.0.1:5001</label2>
    <solr>http://127.0.0.1:8983/solr/</solr>
  </offline>

  <server>
    <dbpedia>http://sth.somecompany.com:3037/dbpedia/query</dbpedia>
    <freebase>http://sth.somecompany.com:3030/freebase/query</freebase>
    <label1>http://sth.somecompany.com:5000</label1>
    <label2>http://sth.somecompany.com:5001</label2>
    <solr>http://sth.somecompany.com:8983/solr/</solr>
  </server>
</configuration>

and then use a solution like http://commons.apache.org/configuration/ to read the URLs + ports, so I can easily switch them with a signal. That might also be important for scaling up, e.g. by launching hundreds of Docker containers.

But since there is no common starting point that everything has to pass through, it is difficult to do this elegantly, even though it is a rather trivial matter. Is there a preferred way to implement this on your side?

I'm currently thinking of supporting only the REST interface, i.e. cz.brmlab.yodaqa.io.web.WebInterface. I'd add one function to swap URLs, which changes the state of a class that reads the configuration and returns either the offline, ailao or company URLs when the lookup classes query it for them. Does that make sense?

Best wishes, Joe

pasky commented 8 years ago

Hi! Looking into this makes total sense, thanks for bringing it up.

My first intuition would be to use plain Java properties for this, akin to what we do for bingapi (see git grep bingapi). Then, the property names could be

default.dbpedia=something.ailao.eu:1234

and you could pass -Dcz.brmlab.yodaqa.res_mode=offline, or some such on the java/gradle commandline to pick a particular family of URLs instead of the defaults.
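
A minimal sketch of how such a lookup could work, assuming keys of the form `<mode>.<backend>` with a fallback to `default.<backend>`. The class `BackendProperties` and the method `resolve` are hypothetical names, not part of YodaQA, and the inline property text stands in for a file under conf/.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of YodaQA: resolves "<mode>.<backend>" keys
// and falls back to "default.<backend>" when the mode has no entry.
final class BackendProperties {
    // Property name suggested in the thread.
    static final String MODE_PROPERTY = "cz.brmlab.yodaqa.res_mode";

    static String resolve(Properties props, String mode, String backend) {
        String fallback = props.getProperty("default." + backend);
        if (mode == null) {
            return fallback;
        }
        // getProperty(key, defaultValue) returns defaultValue when key is absent.
        return props.getProperty(mode + "." + backend, fallback);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Inline text stands in for a properties file; the key layout is an assumption.
        props.load(new StringReader(
                "default.dbpedia=http://dbpedia.ailao.eu:3030/dbpedia/query\n"
              + "offline.dbpedia=http://127.0.0.1:3037/dbpedia/query\n"));
        String mode = System.getProperty(MODE_PROPERTY); // null unless -D... was given
        System.out.println(resolve(props, mode, "dbpedia"));
    }
}
```

Running with `-Dcz.brmlab.yodaqa.res_mode=offline` would select the `offline.*` family; any backend missing from that family falls back to its `default.*` entry.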

(Maybe there's a way to bring the conf/ properties to the system property namespace; that'd be even nicer, you could just temporarily change one of them via the commandline.)
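
One way this could be done, sketched under the assumption that the conf/ values are already loaded into a `Properties` object: copy them into a new table and then copy the system properties on top, so anything passed with `-D` wins. `LayeredProperties` and `overlay` are hypothetical names used only for illustration.

```java
import java.util.Properties;

// Hypothetical sketch: values given with -D on the command line take
// precedence over values that came from a conf/ file.
final class LayeredProperties {
    static Properties overlay(Properties fromFile, Properties overrides) {
        Properties merged = new Properties();
        merged.putAll(fromFile);   // file values first
        merged.putAll(overrides);  // later entries replace earlier ones
        return merged;
    }

    public static void main(String[] args) {
        Properties fromFile = new Properties();
        fromFile.setProperty("default.dbpedia", "http://dbpedia.ailao.eu:3030/dbpedia/query");
        // e.g. java -Ddefault.dbpedia=http://127.0.0.1:3037/dbpedia/query ...
        Properties effective = overlay(fromFile, System.getProperties());
        System.out.println(effective.getProperty("default.dbpedia"));
    }
}
```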

But I'm no Java guru, and maybe your method has advantages I don't see. Feel free to pick whatever makes the most sense; but given a choice, please JSON rather than XML. ;-)

k0105 commented 8 years ago

Thanks for answering so quickly.

I'm still undecided, but my initial idea would look like this: Extend WebInterface with a switch method.

        // Switch data backend URLs, e.g. http://127.0.0.1:4567/switchBackend?id=2
        get(new Route("/switchBackend") {
            @Override
            public Object handle(Request request, Response response) {
                UrlManager urlManager = new UrlManager();

                response.type("application/json");
                response.header("Access-Control-Allow-Origin", "*");

                if (request.queryParams("id")!=null && urlManager.getConfigurationRaw()!=null &&
                        urlManager.getUrlLookUpTable() != null) {
                    try {
                        int id = Integer.parseInt(request.queryParams("id"));

                        //Debug Output
                        //Should be Arrays.stream(stringArray).collect(Collectors.joining(" ")); but ailao still requires Java 7
                        //Hence:
                        StringBuilder stringBuilder = new StringBuilder();
                        for (String currentString : urlManager.lookUpUrlSet(id)) {
                            if (stringBuilder.length() > 0) {
                                stringBuilder.append(" ");
                            }
                            stringBuilder.append(currentString);
                        }
                        return stringBuilder.toString();
                    } catch (NumberFormatException nfe) {
                        // Non-numeric id: fall through and return the empty JSON object below
                    }
                }

                return "{}";

            }
        });

Then have a UrlManager that reads and parses a JSON array:

package cz.brmlab.yodaqa.provider;

import com.google.gson.*;
import com.google.gson.reflect.TypeToken;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

public class UrlManager {

    public UrlManager() {
        urlLookUpTable = parseConfiguration(readConfigurationFile());
    }

    String configurationRaw = null;
    public String getConfigurationRaw() {
        return configurationRaw;
    }
    public void setConfigurationRaw(String configurationRaw) {
        this.configurationRaw = configurationRaw;
    }

    String[][] urlLookUpTable = null;
    public String[][] getUrlLookUpTable() {
        return urlLookUpTable;
    }
    public void setUrlLookUpTable(String[][] urlLookUpTable) {
        this.urlLookUpTable = urlLookUpTable;
    }

    public String readConfigurationFile() {
        StringBuilder configurationJson = new StringBuilder();

        try (BufferedReader bufferedReader = new BufferedReader(new FileReader("conf/backendURLs.json")))
        {
            String currentLine;

            while ((currentLine = bufferedReader.readLine()) != null) {
                configurationJson.append(currentLine);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

        configurationRaw = configurationJson.toString();
        return configurationRaw;
    }

    public String[][] parseConfiguration(String jsonString) {
        // Expected order is alphabetical:
        // DBpedia, Freebase, label service 1, label service 2, enwiki/Solr

        Gson gson = new GsonBuilder().create();

        // Could also parse to a JsonElement first and extract the sets manually:
        // List<String> backends = gson.fromJson(jsonObject.get("offline"), new TypeToken<List<String>>(){}.getType());
        // JsonArray  jsonArray = jsonElement.isJsonArray()?jsonElement.getAsJsonArray():null;

        String[][] backendUrls = gson.fromJson(jsonString, String[][].class);

        // Gson returns null for empty input, e.g. when the configuration file could not be read
        if (backendUrls == null) {
            return null;
        }

        //Debug Output
        for (int i=0; i<backendUrls.length; i++){
            for(int j=0; j<backendUrls[i].length; j++) {
                System.out.println(backendUrls[i][j]);
            }
        }

        return backendUrls;
    }

    public String[] lookUpUrlSet(int id) {
        // Reject negative ids as well as ids past the end of the table
        if(urlLookUpTable != null && id >= 0 && id<urlLookUpTable.length) {
            return urlLookUpTable[id];
        }
        return null;
    }

}

Both assuming a simplified JSON format like this:

[
  [
    "http://dbpedia.ailao.eu:3030/dbpedia/query",
    "http://freebase.ailao.eu:3030/freebase/query",
    "http://dbp-labels.ailao.eu:5000",
    "http://dbp-labels.ailao.eu:5001",
    "http://enwiki.ailao.eu:8983/solr/"
  ],
  [
    "http://127.0.0.1:3037/dbpedia/query",
    "http://127.0.0.1:3030/freebase/query",
    "http://127.0.0.1:5000",
    "http://127.0.0.1:5001",
    "http://127.0.0.1:8983/solr"
  ],
  [
    "http://sth.somecompany.com:3037/dbpedia/query",
    "http://sth.somecompany.com:3030/freebase/query",
    "http://sth.somecompany.com:5000",
    "http://sth.somecompany.com:5001",
    "http://sth.somecompany.com:8983/solr"
  ]
]
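
Because this format is purely positional, callers have to know which index holds which backend. A small sketch of how that could be made explicit, assuming the order stated in the code comment above (DBpedia, Freebase, label service 1, label service 2, enwiki/Solr). The `BackendTable` class and `Backend` enum are hypothetical and not part of the proposed UrlManager.

```java
// Hypothetical sketch: names the positions inside one URL set so that
// callers do not rely on bare array indices.
final class BackendTable {
    // Declaration order mirrors the expected order of URLs in each set.
    enum Backend { DBPEDIA, FREEBASE, LABEL1, LABEL2, SOLR }

    static String lookUp(String[][] table, int setId, Backend backend) {
        if (table == null || setId < 0 || setId >= table.length) {
            return null;
        }
        String[] urlSet = table[setId];
        int index = backend.ordinal();
        return (urlSet != null && index < urlSet.length) ? urlSet[index] : null;
    }

    public static void main(String[] args) {
        String[][] table = {
            { "http://127.0.0.1:3037/dbpedia/query",
              "http://127.0.0.1:3030/freebase/query",
              "http://127.0.0.1:5000",
              "http://127.0.0.1:5001",
              "http://127.0.0.1:8983/solr" }
        };
        System.out.println(lookUp(table, 0, Backend.SOLR));
    }
}
```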

The problem is how to distribute the information. If every data source class needs a reference to the UrlManager, we have a problem, since some of them (DBpedia, I think) are heavily inherited from, so you'd have to hand the reference to ten instantiations. Dependency injection is cool and possible in Java SE, but a lot of work for such a small thing. That's where I got stuck and opened this thread. Your Java properties idea might make sense; I'll look into it.

k0105 commented 8 years ago

OK, so I've taken a look. While I'm one of those guys who would write everything into verbose XML configurations that can be validated and edited via generated GUIs, this is fortunately not my project. For my purposes properties should work fine. What still worries me is that with cz.brmlab.yodaqa.res_mode, we'd have to start reading URLs from files in different packages of Yoda, which means code duplication and IO overhead. Also, this reading from file might get triggered several times, since classes like DBpediaTitles might get created several times and have multiple constructors.

Thus, I'm inclined to just have five properties for the backends. If a property is null, the corresponding ailao URL is used; if it is non-null, the specified URL is used. A bit verbose, but probably better than file access in x places for now. [Maybe Yoda should get one start routine instead of several main methods selected by Gradle at some point? Using the UrlManager from above would then make much sense, since it can load the URLs once and keep them in memory for easy and fast access.]

So I'd just say String i = System.getProperty("cz.brmlab.yodaqa.dbpediaurl"), and if i==null we use ailao, otherwise i, in cz.brmlab.yodaqa.provider.rdf.FreebaseLookup, cz.brmlab.yodaqa.provider.rdf.DBpediaTitles.java, cz.brmlab.yodaqa.provider.rdf.DBpediaLookup and cz.brmlab.yodaqa.pipeline.YodaQA. Does that make sense? Should I prepare a pull request for that?
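
A minimal sketch of that per-backend override, using the two-argument `System.getProperty(key, default)` so the null check is implicit. The property name comes from this thread; the `BackendUrl` class is a hypothetical name, and the default is the ailao DBpedia endpoint quoted earlier.

```java
// Hypothetical helper: one system property per backend, with the public
// ailao endpoint as the default when the property is not set.
final class BackendUrl {
    static final String DBPEDIA_PROPERTY = "cz.brmlab.yodaqa.dbpediaurl";
    static final String DBPEDIA_DEFAULT = "http://dbpedia.ailao.eu:3030/dbpedia/query";

    static String dbpedia() {
        // Returns DBPEDIA_DEFAULT when no -Dcz.brmlab.yodaqa.dbpediaurl=... was given.
        return System.getProperty(DBPEDIA_PROPERTY, DBPEDIA_DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println(dbpedia());
    }
}
```

The same pattern would be repeated for the other four backends, each with its own property name.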

pasky commented 8 years ago

:-)

I think you could just refer to a static singleton of UrlManager from everywhere; I'm not sure what the hurdles would be then. We do this e.g. with Wordnet, AnswerIDGenerator or QuestionDashboard.
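
For illustration, a static singleton along these lines could use the initialization-on-demand holder idiom, so the configuration is loaded once and no reference has to be passed through constructors. This is only a sketch: `SharedUrlManager` is a hypothetical name, the hard-coded map stands in for reading conf/, and it does not claim to match how Wordnet, AnswerIDGenerator or QuestionDashboard are implemented.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a lazily created, process-wide URL registry.
final class SharedUrlManager {
    private final Map<String, String> urls;

    private SharedUrlManager() {
        Map<String, String> loaded = new LinkedHashMap<>();
        // Placeholder values; a real implementation would read the configuration once here.
        loaded.put("dbpedia", "http://dbpedia.ailao.eu:3030/dbpedia/query");
        loaded.put("solr", "http://enwiki.ailao.eu:8983/solr/");
        urls = Collections.unmodifiableMap(loaded);
    }

    // The JVM initializes Holder on first access, which is thread-safe by specification.
    private static final class Holder {
        static final SharedUrlManager INSTANCE = new SharedUrlManager();
    }

    static SharedUrlManager getInstance() {
        return Holder.INSTANCE;
    }

    String lookUpUrl(String backend) {
        return urls.get(backend);
    }

    public static void main(String[] args) {
        // Any lookup class can call this without receiving a reference first.
        System.out.println(SharedUrlManager.getInstance().lookUpUrl("dbpedia"));
    }
}
```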

But your last proposal is totally fine!

k0105 commented 8 years ago

I think the functionality is done. I combined both: the UrlManager is now static and can manage backends in a RESTful way, but properties can override any URL. However, since Yoda's Lookup objects are instantiated only once and thus request the information only once, they remain stuck on the initial choice. I could change that, but then I'd have to modify tons of objects. This means that you can currently change backends between executions via properties. You can also change them permanently by changing one value in the UrlManager, i.e. by replacing this one file without touching any other part of Yoda. Finally, there is the (commented-out) REST interface, which can be used to change URLs during execution once someone finds the time to change the fact that components only ask for the information once, namely when they are created. Is that good? If so, I'll send a pull request later today.
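
The limitation described here can be shown in isolation. The sketch below uses entirely hypothetical classes (none of them exist in YodaQA): a component that copies the URL in its constructor never sees a later switch, while one that keeps a reference and asks on each call does.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for a URL that can be replaced at runtime.
final class SwitchableUrl {
    private final AtomicReference<String> current;

    SwitchableUrl(String initial) {
        current = new AtomicReference<>(initial);
    }

    String get() {
        return current.get();
    }

    void switchTo(String url) {
        current.set(url);
    }
}

// Copies the URL once at construction: later switches are not seen.
final class SnapshotLookup {
    private final String endpoint;

    SnapshotLookup(SwitchableUrl url) {
        this.endpoint = url.get();
    }

    String endpoint() {
        return endpoint;
    }
}

// Keeps the holder and asks on every call: later switches are seen.
final class LiveLookup {
    private final SwitchableUrl url;

    LiveLookup(SwitchableUrl url) {
        this.url = url;
    }

    String endpoint() {
        return url.get();
    }
}
```

Making on-the-fly switching work would mean moving every lookup class from the first pattern to the second, which is the "tons of objects" cost mentioned above.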

pasky commented 8 years ago

Well, personally I have no use cases for on-the-fly rather than startup-time switching, and I'm not sure how many use cases for the former there really are, so I'd be perfectly happy with a solution that does the latter.

k0105 commented 8 years ago

See pull request https://github.com/brmson/yodaqa/pull/47

k0105 commented 8 years ago

Pull request resubmitted.