This project is a part of the Jewish Bookshelf ecosystem in the Technion Data and Knowledge Lab.
It integrates several COTS projects to create a search engine for JBS data. The data is pulled from the `jbs-text` repository, which contains JSON objects that describe the JBS data.
The JSON objects in the `jbs-text` repository are converted to documents for Solr.

1. Run the `bin/solr start` command. This starts the Solr server in the background, listening for requests on port 8983.
2. Run the `bin/solr status` command to check that Solr started correctly.
3. Run the `bin/solr create -c <core-name>` command (for example: `bin/solr create -c jbs-ir`).

Solr supports multiple cores under one Solr instance.
Each core can be addressed by adding the core name to the Solr URL: `http://<machine-name>:<Solr-port>/solr/<core-name>`.
For example: http://tdk2.cs.technion.ac.il:8983/solr/jbs-ir.
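As a small illustration of the URL scheme above, the following shell sketch builds a core URL from its parts. The function name `solr_core_url` and its argument order are assumptions made for this example; they are not part of Solr or of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): build the base URL of a core.
# Usage: solr_core_url <machine-name> <core-name> [Solr-port]
solr_core_url() {
    machine="$1"
    core="$2"
    port="${3:-8983}"   # falls back to 8983, the port used earlier in this guide
    printf 'http://%s:%s/solr/%s\n' "$machine" "$port" "$core"
}

# Example: the core used throughout this guide
solr_core_url tdk2.cs.technion.ac.il jbs-ir
```

The same base URL is what the other tools in this guide expect when they ask for the address of your core.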
Before modifying Solr core files, we also advise you to read Documents, Fields, and Schema Design.
In this section you will replace two default core files with ones that we modified. To understand what changes were applied to the files, or to learn how to index additional fields in your documents, visit this Wiki page.

1. Clone this repository: `git clone https://github.com/TechnionTDK/jbs-ir.git`
2. In the `server/solr/<core-name>/conf` directory, replace the `managed-schema` file with the one in the repository under the `Solr configuration files` directory.
3. In the `server/solr/<core-name>/conf` directory, replace the `solrconfig.xml` file with the one in the repository under the `Solr configuration files` directory.
4. Run the `bin/solr restart` command from the solr directory.

We encourage you to read this wiki to better understand the changes you are going to perform: Understanding Analyzers, Tokenizers, and Filters.
We chose to use the HebMorph Hebrew analyzer: HebMorph GitHub repository.
To integrate HebMorph into your Solr core, follow `SOLR-README.md` in the HebMorph GitHub repository. The practical integration guidelines below should suffice:

1. You do not need to modify `managed-schema` and `solrconfig.xml` as explained in `SOLR-README.md`, because we prepared them in advance and you copied them in the previous section.
2. To use the `solrconfig.xml` from this repository, place the HebMorph .jar file under the `server` directory (in `SOLR-README.md`, note that `instanceDir` refers to the `server` directory in our case).
3. Run the `bin/solr restart` command from the solr directory.

Indexing is done according to the `managed-schema` file we discussed before.
In order to index the relevant documents with Solr, please follow the next steps.

1. Clone this repository: `git clone https://github.com/TechnionTDK/jbs-ir.git` (use `git pull` if you cloned the repository before).
2. Clone the data repository: `git clone https://github.com/TechnionTDK/jbs-text.git` (use `git pull` if you cloned the repository before).
3. Go to the `jbs-ir` directory and create the .jar for the JsonParser using the `mvn package` command.
   - `JsonParser-1.0-jar-with-dependencies.jar` will be located in `jbs-ir/JsonParser/target/`.
4. Copy the .jar to your working directory: `cp jbs-ir/JsonParser/target/JsonParser-1.0-jar-with-dependencies.jar .`
5. Run the parser: `java -jar JsonParser-1.0-jar-with-dependencies.jar <path-to-desired-data-directory> <path-to-output-documents-directory>`
   - The parser reads the files under `<path-to-desired-data-directory>` (recursively) and processes them into multiple .json files, where each .json file contains one JSON object. These new .json files are placed into `<path-to-output-documents-directory>`.
6. Run the `bin/solr restart` command from the solr directory.
7. Index the documents: `bin/post -c <core-name> <path-to-output-documents-directory>`

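
The parse, restart, and post steps above can be sketched as a single shell function. This is only an illustration under stated assumptions: the names `index_documents`, `run`, `DRY_RUN`, and `SOLR_DIR` are hypothetical, the .jar is assumed to sit in the current directory, and `jbs-text` and `output-docs` are placeholder paths. With `DRY_RUN=1` (the default in this sketch) the commands are printed instead of executed.

```shell
#!/bin/sh
# Hypothetical wrapper around the indexing steps; not part of this repository.
# With DRY_RUN=1 (default) each command is printed rather than executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: index_documents <core-name> <data-dir> <output-dir>
# Steps: split the data into one JSON object per file, restart Solr,
# then post the generated documents to the core.
index_documents() {
    core="$1"; data_dir="$2"; out_dir="$3"
    solr_dir="${SOLR_DIR:-.}"   # assumed location of the Solr installation
    run java -jar JsonParser-1.0-jar-with-dependencies.jar "$data_dir" "$out_dir" &&
    run "$solr_dir/bin/solr" restart &&
    run "$solr_dir/bin/post" -c "$core" "$out_dir"
}

# Dry run with placeholder paths: prints the three commands
index_documents jbs-ir jbs-text output-docs
```

Setting `DRY_RUN=0` and `SOLR_DIR` to your Solr directory would execute the commands for real; the chain stops at the first step that fails.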
You can use the Solr Admin UI for running queries, analysis and viewing core details. Please visit Overview of the Solr Admin UI for more information.
You can access the Admin UI at: `http://<machine-name>:<Solr-port>`.
To access a specific core: `http://<machine-name>:<Solr-port>/#/<core-name>`.
You can read about the most useful features we found in the Admin UI in Useful Admin UI Features.
There is a basic search UI, which you can access at: `http://<machine-name>:<Solr-port>/solr/<core-name>/browse`.
We made some adjustments to that UI so that it presents the data in a friendlier way.
Solr uses Velocity for its web UIs, so we worked on top of the example files under `<solr-home-dir>/example/files/conf/velocity`.
For more information about the files we changed to configure the UI, read the following wiki page: [Useful Velocity files](https://github.com/TechnionTDK/jbs-ir/wiki/Useful-Velocity-files).
To read more about Velocity, go to The Apache Velocity Project.

1. Copy `velocity` under `Solr configuration files` in this repository to the `server/solr/<core-name>/conf` directory on your machine.
2. Copy `browse-resources` under `Solr configuration files` to the `server` directory.
3. Run `bin/solr restart` from the Solr home directory for the changes to apply.
4. Open `http://<machine-name>:<Solr-port>/solr/<core-name>/browse`.

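
The two copy steps above can be sketched as a shell function. The function name `install_browse_ui` and its argument order are assumptions made for illustration. Note that the `Solr configuration files` directory name contains spaces, so the paths must stay quoted.

```shell
#!/bin/sh
# Hypothetical helper: copy the customised browse UI files into a Solr installation.
# Usage: install_browse_ui <jbs-ir-clone> <solr-dir> <core-name>
install_browse_ui() {
    repo="$1"; solr="$2"; core="$3"
    conf_src="$repo/Solr configuration files"   # directory name has spaces: keep the quotes
    # Velocity templates go into the core's conf directory,
    # browse-resources goes directly under the server directory.
    cp -r "$conf_src/velocity" "$solr/server/solr/$core/conf/" &&
    cp -r "$conf_src/browse-resources" "$solr/server/"
}

# Example call (paths are placeholders):
#   install_browse_ui ./jbs-ir /path/to/solr jbs-ir
```

After copying, Solr still has to be restarted for the changes to apply.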
We included an evaluation tool for the Solr search engine. The tool allows the user to automate the evaluation of the engine and extract any required data by:
To use the tool (after cloning this repository), take a look at the `execute()` method in the `JbsIrTestTool.java` class, which demonstrates how the tool can be used.
Note: the tool expects to receive the URL of your Solr core as an argument.
Note: because the terminal (Linux) and Command Prompt (Windows) may not display Hebrew text correctly, it is better to run the application from a development environment such as IntelliJ IDEA.
Before running the Main method in the IDE, you have to configure `Program Arguments` in Run/Debug configurations to contain your Solr core address in this format: `http://<machine-name>:<Solr-port>/solr/<core-name>`.
If you changed the code and want to create a .jar file, run the `mvn package` command and then do the following:

1. Locate the .jar at `jbs-ir/evaluation/target/evaluation-1.0-jar-with-dependencies.jar`.
2. Run it with your Solr core address: `java -jar evaluation-1.0-jar-with-dependencies.jar http://<machine-name>:<Solr-port>/solr/<core-name>`
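
Since the tool expects a core URL in a specific format, a small wrapper can check the argument before launching the .jar. This is a sketch only: `is_core_url` and `run_evaluation` are hypothetical helpers, the pattern is a loose shape check rather than full URL validation, and the .jar is assumed to be in the current directory.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the evaluation tool itself.
# Loose check for the shape http://<machine-name>:<Solr-port>/solr/<core-name>
is_core_url() {
    case "$1" in
        http://*:*/solr/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: run_evaluation <core-url>
run_evaluation() {
    if ! is_core_url "$1"; then
        echo "expected http://<machine-name>:<Solr-port>/solr/<core-name>, got: $1" >&2
        return 1
    fi
    java -jar evaluation-1.0-jar-with-dependencies.jar "$1"
}

# Example call:
#   run_evaluation http://tdk2.cs.technion.ac.il:8983/solr/jbs-ir
```

Rejecting a malformed address up front avoids starting the JVM only to fail on the first request.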