SDiFI / sdifi_rasa_akranes

An example project demonstrating a Rasa chatbot handling a municipality service center use case. The example data is based around the well-known municipality 'Andabær' and the chatbot is called 'Jóakim'.

Setup/Installation

Running via docker-compose

This is the recommended way to run Rasa in a production environment. Clone the repo and add a new file .env with the following variables set:

RASA_VERSION=3.6.2                        # Rasa version to use, you should also set an appropriate Rasa SDK version
                                          # when building the action_server via docker/sdk/Dockerfile
                                          # in case the major or minor number changes
RASA_TOKEN=<some_rasa_token>              # Access token for using the RESTful API of Rasa
RABBITMQ_PASSWORD=<some_rabbitmq_passwd>  # Password to use for RabbitMQ
DB_PASSWORD=<some_database_passwd>        # PostgreSQL password
RASA_TELEMETRY_ENABLED=false              # Set to true in case you want to send anonymous usage data to Rasa
DEBUG_MODE=true                           # Set to false, if you don't want lots of information from Rasa
SQLALCHEMY_SILENCE_UBER_WARNING=1         # Set to 0, if you want to see SQLAlchemy warnings
FUSEKI_VERSION=4.8.0                      # Version of Fuseki, the RDF knowledge base used in Rasa actions

You can also use the provided .env.template file as an example.

Currently, the docker-compose setup is only meant for running a Rasa instance with an already trained model from the directory ./models. To train a model, use the local development method (see further down) before starting Rasa in this way. If multiple model files exist inside models/, Rasa always uses the model with the latest timestamp. You should also create a cache/ folder inside your repo and change its permissions with:

mkdir -p cache
chmod 777 cache/

After training, run the following commands (docker-compose needs to be installed):

docker-compose build    # this builds the action_server, fuseki, etc.
# This (optional) step prepares the fuseki-data volume to import RDF data mounted as docker-volume. By default, the
# RDF database is already prepared with default contents of `src/municipal_info_api/offices_staff.rdf`. If you activate
# the commented-out Fuseki DB volume, use the following snippet to prepare the volume with dummy data. Please refer
# to [docker-compose.yml](docker-compose.yml) service definition if you want to use a different RDF file as
# initial DB
docker-compose run --rm --entrypoint="sh /fuseki/scripts/db_init.sh /fuseki/rdf/initial.rdf" fuseki
# This starts all services
docker-compose up

To test availability of the Fuseki server, try the following command that runs a query inside the fuseki container:

docker exec -it fuseki_server bin/rsparql --query ex.sparql --service=http://localhost:3030/ds/query

It should return something like:

--------------------
| role             |
====================
| "Skólamál"       |
| "Stjórnsýslumál" |
| "Launamál"       |
| "Velferðarmál"   |
--------------------
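
For orientation, a query of roughly the following shape would produce such a listing; the prefixes and predicates shown here are illustrative assumptions and need not match the actual contents of ex.sparql:

# Illustrative sketch only: list the labels of the organizational units.
PREFIX org:  <http://www.w3.org/ns/org#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?role WHERE {
  ?unit a org:OrganizationalUnit ;
        skos:prefLabel ?role .
}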

Replace the database

If you need to change or add information in the knowledge base, you can do so by editing the RDF file (or creating a new one), deleting the Fuseki database and replacing it with your new or updated data. Assuming there is a folder rdf/ in the project root folder containing a new RDF file updated_kb.rdf, the steps to replace the Fuseki database are as follows:

# Go into the Fuseki container and delete the database
docker exec -it fuseki_server rm -rf databases/DB2/Data-0001/
# Stop all containers so that we can perform the data loading
docker-compose stop
# Then, similar to above, but with the ./rdf directory mounted:
docker-compose run --rm -v $(pwd)/rdf:/fuseki/rdf/ --entrypoint="sh /fuseki/scripts/db_init.sh /fuseki/rdf/updated_kb.rdf" fuseki
# Start the containers again
docker-compose up

Trigger intent via endpoint

With all servers up and running, all intents can be triggered via the trigger_intent endpoint of your Rasa conversation. For instance, we include an intent for retrieving the 'MOTD' (message of the day) greeting; the greetings are stored in the motd.yml file. Different bot greetings are produced for different dates and languages.

This intent can be triggered, and English passed as the language entity, with a request like:

curl -X POST 'http://localhost:5005/conversations/conversation_id/trigger_intent?token=RASA_TOKEN' -H 'Content-Type: application/json' -d '{"name":"motd", "entities": {"language": "en-EN"}}'

where conversation_id is the ID of your current conversation, as known to the Rasa server, and RASA_TOKEN is the token defined in your .env file.

Local development with Rasa

Currently, this approach is necessary to train a model; it also corresponds to the setup described in the official Rasa documentation. However, this approach should not be used for a production setup.

Create a virtual environment (use Python 3.9, not 3.10) and install all dependencies via the following commands:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Note:

Currently, there is a bug when running rasa interactive in combination with the package uvloop. Therefore, you need to uninstall uvloop before running rasa interactive:

pip uninstall uvloop

Configuration files

Rasa needs certain configuration files to be present. For local development, these are:

./config.yml
./config/credentials.yml
./config/endpoints.yml
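
For orientation, a minimal config/endpoints.yml for local development simply points Rasa at the action server; this is a generic Rasa example and need not match the repository's actual file:

# Minimal sketch of config/endpoints.yml; the repository's file may define more endpoints.
action_endpoint:
  url: "http://localhost:5055/webhook"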

For Rasa to be able to train a model, the following files also need to be present:

domain.yml
./data/nlu.yml
./data/rules.yml
./data/stories/stories.yml
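
These use the standard Rasa 3.x training data format. As a rough illustration, an intent entry in data/nlu.yml looks like the following (placeholder intent and examples, not taken from the project data):

# Placeholder NLU entry in the standard Rasa 3.x format; the real intents live in data/nlu.yml.
version: "3.1"
nlu:
  - intent: greet
    examples: |
      - Hæ
      - Góðan daginn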

Training

Train and test a model with the commands below. The model file will be placed inside the directory ./models; as mentioned above, Rasa automatically uses the latest trained model inside models/ when started. You can also cross-validate your model, which gives you valuable information about your model and training data.

Optionally, you can disable sending of telemetry data home to Rasa.

rasa telemetry disable
rasa train --endpoints config/endpoints.yml --data data/ --config config.yml
rasa data validate --config config.yml --data data/
rasa test --endpoints config/endpoints.yml  --config config.yml -s tests/

To run cross-validation:

rasa test --endpoints config/endpoints.yml  --config config.yml -s tests/ --cross-validation

This takes a long time, as multiple models are trained and tested. All test and cross-validation results are placed into the subdirectory results/.

Running Rasa

To start a standalone Rasa server locally, you also need to start the action server in another terminal session. For the actions, we need to add the directory src/municipal_info_api to the variable PYTHONPATH, and for querying the Fuseki database, we need to set the variable FUSEKI_NETWORK_HOST to 'localhost':

export PYTHONPATH="`pwd`/src/municipal_info_api"
export FUSEKI_NETWORK_HOST="localhost"
python -m rasa_sdk --actions actions --port 5055

The Fuseki service needs to be up (and already built, see section on setup) and can be started on its own by running:

docker-compose up -d fuseki

Note that when running a standalone Rasa server, to be able to update the RDF file used to initialise the database, the file fuseki_override_template.yml needs to be renamed to docker-compose.override.yml and the following lines in docker-compose.yml need to be uncommented:

# Under the 'fuseki' service section, towards the end of the file:

#volumes:
#  - fuseki-data:/fuseki/databases/DB2

#depends_on:
#  fuseki-init:
#    condition: service_completed_successfully

Start the Rasa server:

rasa run -vv --credentials config/credentials.yml --endpoints config/endpoints.yml --port 8180 --cors "*" models/

Test actions / SPARQL queries

The following command runs tests for the SPARQL queries and Rasa actions:

# If not exported already, see 'Running Rasa' section:
export PYTHONPATH="`pwd`/src/municipal_info_api"
export FUSEKI_NETWORK_HOST="localhost"

pytest .

The SPARQL query tests and the action tests are both included in the repository.

Integration with Masdif

The Rasa server can be integrated with Masdif. The provided docker-compose snippet masdif_override_template.yml can be used to start the Rasa server and Masdif together. Please refer to the documentation of Masdif for further details.

Talk to Rasa via web widget

You can use the web widget with both approaches: starting Rasa via docker-compose up or running it from the Python virtual environment. Open the file ./webchat/index.html in a web browser; it will automatically connect to the running Rasa server at localhost:8180.

If you start Rasa via docker-compose up, you can simply navigate to http://localhost:8180 in your browser; the web widget is served directly by Nginx.

Implemented intents

Currently, the system can identify the following intents:

The intent 'motd' is also included but has no training data, as it is not meant to be triggered through a message from the user (see 'Trigger intent via endpoint').

BÍN Entity Extraction

We have implemented an experimental entity extractor as a Rasa custom component that uses the BÍN database to extract entities from the user input. This extractor is currently disabled by default, but can be enabled by placing the configuration snippet bin_config.yml below the DIETClassifier configuration in the file config.yml.

The idea is to use the BÍN database to extract entities that are not covered by the DIETClassifier, such as names of people, places, etc. The BÍN database is a large database of Icelandic words, their inflections and their grammatical properties. The BÍN entity extractor can be customized to use mappings of entities to certain BÍN properties. One should also carefully place stop words into the stop_words configuration parameter to avoid extracting entities that are irrelevant or ambiguous, e.g. á, dag, ...

It should be placed behind all other entity extractors in the Rasa pipeline and be configured to either ignore or replace already extracted entities; otherwise it will append the BÍN entities to the already extracted ones, which leads to duplicate entities. It can also be configured via the parameter match_training_data to extract only entities that are labelled and where the user text closely follows the training data.
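
As a rough illustration of this placement in config.yml (the component's actual module path and default values come from bin_config.yml; the name used below is only a placeholder):

# Sketch of the pipeline placement; the real component name is defined in bin_config.yml.
pipeline:
  # ... tokenizers, featurizers and other entity extractors ...
  - name: DIETClassifier
  # The BÍN extractor comes after all other entity extractors:
  - name: "bin_entity_extractor.BinEntityExtractor"   # placeholder module path
    stop_words: ["á", "dag"]
    match_training_data: true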

Please refer to the BÍN entity extractor configuration snippet for further details.

Knowledge base

The knowledge base is initialised via an RDF document (src/municipal_info_api/offices_staff.rdf) and uses the W3C Organization (ORG) and FOAF ontologies, as well as RDF, RDFS and SKOS, along with some custom defined entities.

Andabær is defined as an Organization, with OrganizationalUnits connected to it as the different divisions (building and construction matters, education, welfare, etc.). Each of the OrganizationalUnits has members, which are defined as FOAF entities with contact information.
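
To illustrate this structure, a minimal Turtle sketch could look like the following; the identifiers are invented and the exact properties used in offices_staff.rdf may differ:

# Minimal illustrative sketch, not the actual contents of offices_staff.rdf.
@prefix org:  <http://www.w3.org/ns/org#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/andabaer#> .

ex:Andabaer a org:Organization ;
    org:hasUnit ex:Skolamal .

ex:Skolamal a org:OrganizationalUnit ;
    skos:prefLabel "Skólamál"@is ;
    org:hasMember ex:JonJonsson .

ex:JonJonsson a foaf:Person ;
    foaf:name "Jón Jónsson" ;
    foaf:mbox <mailto:jon@andabaer.example> .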

All names inside the knowledge base are invented and do not refer to real persons.