EIS-Bonn / Squerall

An implementation of the so-called Semantic Data Lake, using Apache Spark and Presto.
https://eis-bonn.github.io/Squerall/
Apache License 2.0

Solr #5

Open MarcusSorealheis opened 4 years ago

MarcusSorealheis commented 4 years ago

Hello there,

I like this project and have read about it in a few papers. Could you kindly share any tips you might have around querying Solr?

I will work on it myself, but I am asking here in case this has already been discussed or my efforts can be streamlined in any way.

mnmami commented 4 years ago

Hi Marcus,

I'm very glad that you want to add support for Solr, and I'm ready to support you in doing so.

Luckily, there is a Solr connector for Spark that you can use off the shelf. To let Squerall connect through it, add a case for Solr in SparkExecutor.scala [1] that uses the Solr connector code [2].

Like so:

case "solr" => df = spark.read.format("solr").options(options).load
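For context, here is a minimal standalone sketch of how such a branch could look. It assumes the spark-solr connector is on the classpath and that a Solr collection is reachable through ZooKeeper. The object name `SolrReadSketch`, the helper `loadSource`, and the option values are hypothetical placeholders, not Squerall's actual code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SolrReadSketch {
  // Hypothetical helper: dispatches on the source type the way a
  // per-source `case` branch would. Not taken from Squerall itself.
  def loadSource(spark: SparkSession,
                 sourceType: String,
                 options: Map[String, String]): DataFrame =
    sourceType match {
      case "solr" =>
        // "solr" is the data source name provided by the spark-solr connector.
        spark.read.format("solr").options(options).load()
      case other =>
        throw new IllegalArgumentException(s"Unsupported source type: $other")
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("solr-read-sketch")
      .master("local[*]")
      .getOrCreate()

    // Placeholder values: replace with a real collection name and ZooKeeper host.
    val options = Map("collection" -> "abc", "zkhost" -> "xyz")

    val df = loadSource(spark, "solr", options)
    df.printSchema() // Inspect which Solr fields are exposed as columns.

    spark.stop()
  }
}
```

Printing the schema first is a quick way to see which field names are available for use as `rml:reference` values in the mappings.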

Then, in the config file [3], add a JSON object that specifies the Solr options, for example:

    {
        "type": "solr",
        "options": {
            "collection": "abc",
            "zkhost": "xyz"
        },
        "source": "//Entity",
        "entity": "Entity"
    }

Then, in the mappings file [4], map that Solr entity to an ontology class and its properties, for example:

<#EntityMapping>
    rml:logicalSource [
        rml:source "//Entity";
        nosql:store nosql:solr
    ];
    rr:subjectMap [
        rr:template "http://example.com/{nr}";
        rr:class bsbm:Producer
    ];

    rr:predicateObjectMap [
        rr:predicate edm:country;
        rr:objectMap [rml:reference "country"]
    ] .

Note that the source value (//Entity here) must be identical in the config file and the mappings file.
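A rough way to check this before running a query is sketched below. It is only a regex-based comparison, not a real JSON or Turtle parser, and it assumes the `"source": "..."` and `rml:source "..."` forms used in the examples above. The object name `SourceCheck` and its methods are hypothetical.

```scala
object SourceCheck {
  // Matches "source": "<value>" entries in the config text.
  private val ConfigSource = "\"source\"\\s*:\\s*\"([^\"]+)\"".r
  // Matches rml:source "<value>" entries in the mappings text.
  private val MappingSource = "rml:source\\s+\"([^\"]+)\"".r

  def configSources(config: String): Set[String] =
    ConfigSource.findAllMatchIn(config).map(_.group(1)).toSet

  def mappingSources(mappings: String): Set[String] =
    MappingSource.findAllMatchIn(mappings).map(_.group(1)).toSet

  /** Sources declared in the config that no mapping refers to. */
  def unmapped(config: String, mappings: String): Set[String] =
    configSources(config) -- mappingSources(mappings)
}
```

Calling `SourceCheck.unmapped(configText, mappingsText)` with the contents of both files returns an empty set when every configured source has a matching `rml:source`.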

Please try it and let me know. If it doesn't work for you, share your files with me and I will look into it with you.

Note: Clone the develop branch instead of master, as it has more advanced logging messages. The command for that is very simple [5].

[1] https://github.com/EIS-Bonn/Squerall/blob/master/src/main/scala/org/squerall/SparkExecutor.scala#L78
[2] https://github.com/LucidWorks/spark-solr#via-dataframe
[3] https://github.com/EIS-Bonn/Squerall/blob/master/evaluation/input_files/config
[4] https://github.com/EIS-Bonn/Squerall/blob/master/evaluation/input_files/mappings.ttl
[5] https://stackoverflow.com/a/1911126/1730115