B3J4y / UniDisk

A Crawler to search for keywords and compare the score
MIT License
0 stars 1 forks source link
comparison crawler nlp solr-client

UniDisk

A Crawler to search for keywords and compare the score

Build Status

Installation

Prerequisites

Installation steps for primefaces

Docker

We use Docker to simplify development. Run docker compose up -d to create a development environment with a MySQL, Solr and Tomcat Server. The API is then available under localhost:8081/unidisk/rest. In order for code changes to take effect, the services need to be rebuilt (docker-compose up -d --build). This takes some time therefore it makes sense to instead kill the web service (docker rm -fv unidisk_api_1) and start the server via IntelliJ (or any other way). If the server is started via IntelliJ the API is accessible from _localhost:8080/unidiskwar/rest.

Admin Dashboards

Go to localhost:8082 to open Adminer (MySQL Dashboard) or http://localhost:8983 for the Solr dashboard. Enter the following crendetials for Adminer: Server: db Benutzer: user Passwort: secret Datenbank: unidisk

Authentication

We use Firebase for authentication purposes. Accounts must be created via the Firebase console or CLI. Self sign up is currently not possible.

Authentication Method

The authentication method can be changed by modifying the authentication property in unidisk.properties. If the specified value is not firebase all incoming requests are authenticated if any bearer token is set in the request header.

Database

The project contains configuration files for an in memory as well as a MySQL database. The configuration can be changed by replacing the content of the config variable in the HibernateUtil class.

Additional configurations can be added by placing a configuration file into the resource directory. Afterwards reference that file in the HibernateUtil class by changing the config variable to

public class HibernateUtil {
    private static final String someOtherConfig = "hibernate.foo.bar.cfg.xml";
    private static String config = someOtherConfig;
    ...
}

Mock setup

You can populate the database by changing the content of the TestSetupBean class. The init function stores predefined data from the ApplicationState in the currently configured database. This happens every time the server starts. It's therefore recommended to remove the content of the init function after the first start of the app (unless you use an in memory database).

In Memory Problems

If you are on pages that require a specific entity id (e.g. project page) and redeploy the server, the former id might not exist anymore and the page is unable to load.

MySQL Development Problems

org.hibernate.id.IdentifierGenerationException: could not read a hi value - you need to populate the table: hibernate_sequence

This error occurs if you truncate the hibernate_sequence table. Delete the database instance and restart the server to fix this problem.

Solr

Follow the installation guide from https://lucene.apache.org/solr/guide/7_0/index.html.

Setup

If solr isn't running execute solr start -p 8983. The further setup will use port 8983 as default. If you chose another one make sure to adapt the urls and commands.

Firebase

Ask one of the maintainers for the Firebase service account file and place it into crawler/src/main/resources as firebase-sa.json. This is only necessary if you want to use Firebase authentication.

Create Core

Run solr create -c unidisc. After the command finished you should see the message Created new core 'unidisc' in the console/terminal. You should now see the unidisc core section at http://localhost:8983/solr/#/unidisc/core-overview.

Verify Setup

The test case testFieldInputAndQuery in SimpleCrawlTest should now run successfully. You can also run shootTheMoon which crawls websites and posts the result to solr. The test doesn't terminate but the number of documents in the unidisc core should increase.