Open MarcusSorealheis opened 4 years ago
Hi Marcus,
I'm very glad that you want to add support for Solr, and I'm ready to support you in doing so.
Luckily, there is an off-the-shelf Solr connector for Spark that you can use. To let Squerall connect to Solr, just add a case for it in SparkExecutor.scala [1] using the Solr connector code [2].
Like so:
case "solr" => df = spark.read.format("solr").options(options).load
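For orientation, the shape of that dispatch can be sketched in plain Scala. This is a hypothetical simplification, not the real SparkExecutor code: the real cases build a DataFrame via `spark.read`, which needs a live SparkSession, so the stand-in below only picks the format string passed to `spark.read.format(...)`:

```scala
// Hypothetical, simplified mirror of the source-type match in
// SparkExecutor.scala: each source type maps to the format name that
// would be handed to spark.read.format(...). csv/parquet stand in for
// the connectors Squerall already supports.
def formatFor(sourceType: String): Option[String] = sourceType match {
  case "csv"     => Some("csv")
  case "parquet" => Some("parquet")
  case "solr"    => Some("solr") // the new case, backed by the spark-solr connector
  case _         => None         // unrecognized source type
}
```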
Then in config file [3], add a JSON object to specify Solr options, for example:
{
  "type": "solr",
  "options": {
    "collection": "abc",
    "zkhost": "xyz"
  },
  "source": "//Entity",
  "entity": "Entity"
}
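To make the wiring concrete: the "options" object in that config entry becomes the key-value map handed to `spark.read.format("solr").options(...)`. A minimal sketch in plain Scala (no Spark needed; values mirror the example above, where "abc" and "xyz" are placeholders for your actual collection and ZooKeeper host):

```scala
// The "options" object from the config entry, as the Map[String, String]
// that the spark-solr connector receives via spark.read.options(...).
val solrOptions: Map[String, String] = Map(
  "collection" -> "abc", // the Solr collection to query
  "zkhost"     -> "xyz"  // ZooKeeper connect string the connector uses to locate Solr
)
```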
Then in mappings file [4], map that Solr entity to ontology class and properties, for example:
<#EntityMapping>
  rml:logicalSource [
    rml:source "//Entity";
    nosql:store nosql:solr
  ];
  rr:subjectMap [
    rr:template "http://example.com/{nr}";
    rr:class bsbm:Producer
  ];
  rr:predicateObjectMap [
    rr:predicate edm:country;
    rr:objectMap [ rml:reference "country" ]
  ].
Note that the //Entity value must be the same in the config file and the mappings file.
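Since Squerall joins the config entry and the mapping by that source string, a tiny sanity check can catch mismatches early. A hypothetical helper (not part of Squerall) that just looks for the config's "source" value in the mappings file text:

```scala
// Hypothetical helper: verify the config's "source" value also appears
// as an rml:source in the mappings file, since the two are joined by
// that exact string.
def sourceIsMapped(configSource: String, mappingsTtl: String): Boolean =
  mappingsTtl.contains("rml:source \"" + configSource + "\"")

// Fragment of the mapping from the example above.
val mappings = """rml:logicalSource [ rml:source "//Entity"; nosql:store nosql:solr ];"""
```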
Please try it and let me know. If it doesn't work for you, share your files with me and I'll look into them with you.
Note: clone the develop branch instead of master, as it has some more advanced logging messages. The command is very simple [5].
[1] https://github.com/EIS-Bonn/Squerall/blob/master/src/main/scala/org/squerall/SparkExecutor.scala#L78
[2] https://github.com/LucidWorks/spark-solr#via-dataframe
[3] https://github.com/EIS-Bonn/Squerall/blob/master/evaluation/input_files/config
[4] https://github.com/EIS-Bonn/Squerall/blob/master/evaluation/input_files/mappings.ttl
[5] https://stackoverflow.com/a/1911126/1730115
Hello there,
I like this project and have read about it in a few papers. Could you kindly share any tips you might have around querying Solr?
I will work on it, but I'm asking here in case this has already been discussed or my efforts can be streamlined in any way.