Closed: Kefaun2601 closed this issue 3 years ago
Hi @Kefaun2601 ! This is a great idea.
Currently, Solr is the default, and its settings are read from the conf file: https://github.com/USCDataScience/sparkler/blob/53f54746eb00d35a3f93fac0d3b8dbaa895d5755/sparkler-core/conf/sparkler-default.yaml#L18-L29
I suggest modifying the config to support Elasticsearch:
```yaml
crawldb.backend: elasticsearch  # "solr" is default until elastic becomes usable
# add any settings necessary for elasticsearch;
# if there are too many, create an `elasticsearch:` block of config
elasticsearch:
  uri: xyz
  arg1: val1
  arg2: val2
```
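A minimal sketch of how such a config key could drive backend selection. This is illustrative only (Sparkler itself is Scala); the `SolrClient`/`ElasticClient` classes, the `make_crawldb` factory, and the config key layout are assumptions standing in for the real connector code:

```python
# Hypothetical sketch: pick a crawldb backend from the proposed config key.
# Class names, keys, and URIs here are illustrative, not Sparkler's actual API.

DEFAULT_BACKEND = "solr"  # solr stays the default until elastic becomes usable

class SolrClient:
    """Stand-in for the existing Solr connector."""
    def __init__(self, uri):
        self.uri = uri

class ElasticClient:
    """Stand-in for the proposed Elasticsearch connector."""
    def __init__(self, uri, **extra):
        self.uri = uri
        self.extra = extra  # pass-through for arg1, arg2, ...

def make_crawldb(conf: dict):
    """Dispatch on `crawldb.backend`, defaulting to solr."""
    backend = conf.get("crawldb.backend", DEFAULT_BACKEND)
    if backend == "elasticsearch":
        es_conf = dict(conf.get("elasticsearch", {}))
        uri = es_conf.pop("uri")  # required for elasticsearch
        return ElasticClient(uri, **es_conf)
    elif backend == "solr":
        return SolrClient(conf["crawldb.uri"])
    raise ValueError(f"Unknown crawldb backend: {backend}")

# Example: a parsed config selecting the elasticsearch backend
conf = {
    "crawldb.backend": "elasticsearch",
    "elasticsearch": {"uri": "http://localhost:9200", "arg1": "val1"},
}
db = make_crawldb(conf)
```

The point of the single `crawldb.backend` key is that all other code paths stay untouched: the factory is the only place that knows more than one backend exists.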
I prefer adding this to the config rather than the CLI, for several reasons.
If you feel the config adds friction, let's consider minimizing it by creating startup scripts to automate it.
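One way such a startup script could reduce that friction is to accept a backend flag and emit the corresponding config keys. This is a hypothetical sketch; the flag names, defaults, and config layout are assumptions, not Sparkler's actual CLI:

```python
# Hypothetical startup helper: let users pick the backend on the command line
# while the generated config stays the single source of truth.
import argparse

def build_conf(argv=None):
    """Translate command-line flags into the proposed config dict."""
    parser = argparse.ArgumentParser(description="Sparkler startup (sketch)")
    parser.add_argument("--backend", choices=["solr", "elasticsearch"],
                        default="solr", help="crawldb backend to use")
    parser.add_argument("--es-uri", default="http://localhost:9200",
                        help="Elasticsearch URI (elasticsearch backend only)")
    args = parser.parse_args(argv)

    conf = {"crawldb.backend": args.backend}
    if args.backend == "elasticsearch":
        conf["elasticsearch"] = {"uri": args.es_uri}
    return conf

if __name__ == "__main__":
    print(build_conf())
```

A wrapper like this keeps the config-first design while still giving users a one-flag way to switch backends at startup.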
@thammegowda This is of great help! Our capstone team members will look into setting up Elasticsearch according to your advice above. Many thanks!
We are setting up an Elasticsearch backend for Sparkler. This will serve as another pipeline for data persistence parallel to the existing Apache Solr connector.
Sparkler committers, do you have any advice on how we should configure the command-line options to let the user specify Solr or Elasticsearch at startup? Ideally, we would like to minimize friction with the existing framework.
Thanks in advance!