KonradHoeffner / rickview

quick RDF viewer
MIT License

RickView

Easy to deploy low-resource stand-alone RDF knowledge graph browser written in Rust. No SPARQL endpoint needed! See also the unpublished paper draft. Layout copied from LodView.

Current deployments

Feel free to browse around!

Docker

Try it out with the example knowledge base:

docker run --rm -p 8080:8080 ghcr.io/konradhoeffner/rickview

Rootless Docker may have DNS issues when loading an ontology from a remote source such as GitHub. In those rare cases, you can use the larger ghcr.io/konradhoeffner/rickview:glibc image.

Docker Compose Example

services:
  rickview:
    image: ghcr.io/konradhoeffner/rickview
    environment:
      - RICKVIEW_KB_FILE=https://raw.githubusercontent.com/hitontology/ontology/dist/all.ttl
      - RICKVIEW_NAMESPACE=http://hitontology.eu/ontology/
      - RICKVIEW_BASE=/ontology
      - RICKVIEW_TITLE=HITO
      - RICKVIEW_SUBTITLE=Health IT Ontology
      - RICKVIEW_EXAMPLES=Study SoftwareProduct ApplicationSystemTypeCatalogue
      - RICKVIEW_HOMEPAGE=https://hitontology.eu
      - RICKVIEW_ENDPOINT=https://hitontology.eu/sparql
      - RICKVIEW_GITHUB=https://github.com/hitontology/ontology
      - RICKVIEW_DOC=https://hitontology.github.io/ontology/
    ports:
      - "127.0.0.1:8080:8080"
    restart: unless-stopped

Precompiled Binaries

Download the binary from the latest release and run rickview. If you need binaries for a different platform than Linux amd64, let me know.

Compile it yourself

Alternatively, you can compile it for your own platform with cargo install rickview. Or you can clone the repository and then cargo build. This requires you to install Rust including Cargo.

Configure

The default configuration is stored in data/default.toml, which you can override with a custom data/config.toml or with environment variables. Configuration keys are in lower_snake_case, while environment variables are prefixed with RICKVIEW_ and are in SCREAMING_SNAKE_CASE. For example, namespace = "http://hitontology.eu/ontology/" in config.toml is equivalent to RICKVIEW_NAMESPACE=http://hitontology.eu/ontology/ as an environment variable.

You need to provide a knowledge base in RDF Turtle or HDT format, whose default path is data/kb.ttl. If you don't, RickView will show a minimal example knowledge base. You can add custom HTML to the index page by adding a data/body.html file.

You can add embedded CSS using the css configuration key or the RICKVIEW_CSS environment variable. By default, RickView uses the Roboto font, which it hosts locally for robustness, speed and to prevent conflicts with European privacy laws. If this is not an issue for you and, for example, you want to display Chinese or Japanese characters, you could import a Google Font:

css = "@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@300&display=swap'); body {font-family: 'Noto Sans SC', sans-serif}"

Compile and run with cargo run and then open http://localhost:8080 in your browser.
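
The naming convention for environment variables described in this section can be sketched as a small shell helper. The function name rickview_env_name is hypothetical and not part of RickView. It only illustrates how a configuration key maps to its environment variable name, and kb_file is assumed to be the key behind RICKVIEW_KB_FILE from the Docker Compose example.

```shell
# Hypothetical helper, not part of RickView: derive the environment variable
# name for a configuration key by upper-casing it and adding the RICKVIEW_ prefix.
rickview_env_name() {
  printf 'RICKVIEW_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

rickview_env_name namespace   # prints RICKVIEW_NAMESPACE
rickview_env_name kb_file     # prints RICKVIEW_KB_FILE (assumed key name)
```
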

Supported File Formats

The recognized formats and extensions are Turtle (.ttl), N-Triples (.nt), HDT (.hdt) as created by hdt-cpp and zstd compressed HDT (.hdt.zst).
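
As a pre-flight check before deploying, the extension list above can be mirrored in a few lines of shell. This is only a sketch of the documented mapping, not RickView's own detection logic, and the function name kb_format is hypothetical.

```shell
# Hypothetical helper, not part of RickView: report which of the documented
# formats a file name corresponds to, judged by its extension alone.
kb_format() {
  case "$1" in
    *.hdt.zst) echo "zstd compressed HDT" ;;  # must come before *.hdt
    *.hdt)     echo "HDT" ;;
    *.ttl)     echo "Turtle" ;;
    *.nt)      echo "N-Triples" ;;
    *)         echo "unrecognized"; return 1 ;;
  esac
}

kb_format data/kb.ttl
kb_format data/kb.hdt.zst
```
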

Logging

The default log level is "info" for RickView and "error" for libraries. Change the log level of RickView with the log_level configuration key or the RICKVIEW_LOG_LEVEL environment variable. To also configure the log levels of dependencies, override this setting with the RUST_LOG environment variable as described in the env_logger documentation, for example:

RUST_LOG=rickview=debug cargo run

Motivation

Existing RDF browsers like LodView look great but use too many hardware resources because they are based on interpreted or garbage-collected languages. This leads to long wait times and out-of-memory errors on typical small-scale research VMs that run dozens of Docker containers for long-term archival of finished research projects, whose results should still be available to enable reproducible science.

Goals

Implement a basic RDF browser similar to LodView in Rust with the following properties:

Stats

All values are rounded and were measured on an old RickView version on an Intel i9-12900k (16 cores, 24 threads) with 32 GB of DDR5-5200 RAM and a Samsung SSD 980 Pro 1 TB on Arch Linux, standard kernel 5.18. The qbench2 test URI is http://www.linkedspending.aksw.org/instance/618ac3ec98384f44a9ef142356ce476d. Stats for HDT, which uses much less RAM, are not measured yet.

Throughput Single Resource, HTML

There is no page cache, but there could still be internal caching benefits, so this benchmark should be made more elaborate in the future.

$ wrk -t 24 -c 24 -d 30 http://localhost:8080/SoftwareProduct -H "Accept: text/html"
Running 30s test @ http://localhost:8080/SoftwareProduct
  24 threads and 24 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.79ms    3.25ms  26.92ms   56.40%
    Req/Sec   102.36     36.17   212.00     66.74%
  73590 requests in 30.02s, 1.04GB read
Requests/sec:   2451.31
Transfer/sec:     35.43MB

Throughput Single Resource, RDF Turtle

$ docker run --network=host -v $PWD/ontology/dist/hito.ttl:/app/data/kb.ttl  rickview
$ wrk -t 24 -c 24 -d 30 http://localhost:8080/SoftwareProduct
Running 30s test @ http://localhost:8080/SoftwareProduct
  24 threads and 24 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    13.96ms    4.74ms  37.20ms   55.04%
    Req/Sec    71.77     26.17   121.00     66.43%
  51572 requests in 30.02s, 567.72MB read
Requests/sec:   1717.72
Transfer/sec:     18.91MB

Stats of LodView

For comparison, here are the stats for the LodView RDF browser, written in Java and Spring.

Throughput Single Resource

As data is loaded after page load via JavaScript, real world performance may be worse.

$ wrk -t 24 -c 24 -d 30 http://localhost:8104/ontology/SoftwareProduct
Running 30s test @ http://localhost:8104/ontology/SoftwareProduct
  24 threads and 24 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.97ms    2.40ms  44.84ms   88.08%
    Req/Sec   713.62    353.46     1.24k    38.76%
  511567 requests in 30.03s, 1.61GB read
  Socket errors: connect 0, read 1, write 0, timeout 0
  Non-2xx or 3xx responses: 511567
Requests/sec:  17037.90
Transfer/sec:     55.07MB

LodView was not able to serve 24 threads and 24 connections, as every response had a non-2xx/3xx status, so here is the same test with only 1 thread and 1 connection:

$ wrk -t 1 -c 1 -d 30 http://localhost:8104/ontology/SoftwareProduct
Running 30s test @ http://localhost:8104/ontology/SoftwareProduct
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.90ms   13.30ms 250.66ms   97.34%
    Req/Sec   715.41    251.08     1.48k    69.46%
  21227 requests in 30.01s, 68.61MB read
  Non-2xx or 3xx responses: 21227
Requests/sec:    707.24
Transfer/sec:      2.29MB

Even a single thread and a single connection cause the container to report errors. This will be investigated in the future.
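
The share of failed responses in the LodView runs can be read off the wrk output directly. The following sketch is not part of RickView or wrk, and the helper name wrk_error_rate is hypothetical. It computes the percentage of non-2xx/3xx responses from the two relevant lines of a wrk report, fed here with the values of the single-connection run above.

```shell
# Hypothetical helper: read wrk output on stdin and print the share of
# non-2xx/3xx responses as a percentage of all requests.
wrk_error_rate() {
  awk '
    / requests in /             { total = $1 }   # e.g. "21227 requests in 30.01s, ..."
    /Non-2xx or 3xx responses:/ { errors = $NF } # line is absent if there were no errors
    END { if (total > 0) printf "%.1f\n", 100 * errors / total }
  '
}

printf '%s\n' \
  '  21227 requests in 30.01s, 68.61MB read' \
  '  Non-2xx or 3xx responses: 21227' | wrk_error_rate
```

For this input the helper prints 100.0, meaning that none of the measured LodView requests were answered successfully, so its requests per second are not comparable to the RickView numbers.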

FAQ

Why is RickView necessary? Performance doesn't matter and RAM costs almost nothing!

According to Hitzler 2021, mainstream adoption of the Semantic Web field has stagnated due to a lack of freely available performant, accessible, robust and adaptable tools. Instead, limited duration research grants motivate the proliferation of countless research prototypes, which are not optimized for any of those criteria, are not maintained after the project ends and finally compete for resources on crowded servers if they do not break down completely.

Can you implement feature X?

I am very interested in hearing from you if you use it for your knowledge bases and am happy to assist you in setting it up. Feature and pull requests are welcome. However, the goal of RickView is to stay minimalistic and not serve every use case. Please also consider filling out the survey so I can see which features are most requested.

Why no .env support?

I think this would be overkill, as there is already a default configuration file, a custom configuration file and environment variables, and Docker Compose supports .env out of the box as well. So my assumption is that you use the configuration file for local development and .env with Docker Compose. However, if you need .env support outside of Docker Compose, just create an issue with a motivation and I may implement it.

How can I use it with large knowledge bases?

  1. Convert your data to the default HDT format using hdt-cpp.
  2. Deactivate the title and type indexes by setting large = true in data/config.toml or setting the environment variable RICKVIEW_LARGE=true.

Without the indexes, RickView's memory usage is only a few MB above the underlying HDT Sophia adapter in-memory graph, see benchmarks. For example, RickView on http://linkedspending.aksw.org/ uses ~ 2.6 GB RAM and contains LinkedSpending 2015, which is 30 GB as uncompressed N-Triples and 413 MB as zstd compressed HDT.
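
A minimal sketch of step 2, assuming that your converted knowledge base is stored at data/kb.hdt and that kb_file is the configuration key matching RICKVIEW_KB_FILE. Both are assumptions, so adjust them to your setup. Note that this overwrites any existing data/config.toml.

```shell
# Write a custom configuration that points to an HDT file and disables the
# title and type indexes. The path data/kb.hdt and the kb_file key are assumptions.
mkdir -p data
cat > data/config.toml <<'EOF'
kb_file = "data/kb.hdt"
large = true
EOF

cat data/config.toml
```

The equivalent setting with environment variables is RICKVIEW_LARGE=true, as described in step 2.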

When to use compression and why not support other compression formats?

HDT is a compressed binary format that still supports fast querying. It can be further compressed but then RickView needs to uncompress it before loading, which in a test with a large knowledge base increased loading time from ~15s to ~17s. Because decompression is done in streaming mode, this restricts the available compressors and may even result in faster loading if you use a slow drive such as an HDD and a fast CPU. zstd was chosen because it compresses and decompresses quickly with a high ratio, supports streaming, and adds little overhead to the RickView binary. Brotli compresses extremely slowly on high compression settings while GZip results in much larger file sizes. If you need support for another streaming compressor, please create an issue.

Why does it look exactly like LodView?

  1. LodView looks beautiful and works well. The only problems are performance and, to a lesser degree, simple containerized deployment.
  2. LodView is licensed under MIT, so that is allowed. LodView is Copyright (c) 2014 Diego Valerio Camarda and Alessandro Antonuccio.
  3. I can focus my limited time on programming instead of design decisions. Other designs may follow later.
  4. Users of LodView can switch without training.
  5. Performance comparisons are easier when the interface is very similar.

Community Guidelines

Issues and Support

If you have a problem with the software, want to report a bug or have a feature request, please use the issue tracker. If you have a different type of request, feel free to send an email to Konrad.

Citation

There is no publication about RickView yet, so please cite our Zenodo archive for now.

BibTeX entry

@software{rickview,
  author       = {Konrad H{\"o}ffner},
  title        = {{R}ick{V}iew: {L}ightweight Standalone Knowledge Graph Browsing Powered by {R}ust},
  year         = 2023,
  publisher    = {Zenodo},
  version      = {x.y.z},
  doi          = {10.5281/zenodo.8290117},
  url          = {https://doi.org/10.5281/zenodo.8290117}
}

Citation string

Konrad Höffner (2023). RickView: Lightweight Standalone Knowledge Graph Browsing Powered by Rust. https://doi.org/10.5281/zenodo.8290117

Contribute

We are happy to receive pull requests. Please use cargo +nightly fmt before committing and make sure that the code compiles on the newest stable and nightly toolchain with the default features. Browse the default knowledge base after cargo run and verify that nothing is broken. cargo clippy should not report any warnings. You can also contribute by recommending RickView and by sharing your RickView deployments.