OpenPaperView

An OpenPaper.work mobile companion.

<img src="https://fdroid.gitlab.io/artwork/badge/get-it-on.png" alt="Get it on F-Droid" height="80"> <img src="https://play.google.com/intl/en_us/badges/images/generic/en-play-badge.png" alt="Get it on Google Play" height="80">

Or download the latest APK from the Releases Section.


Disclaimer 1: This Android application only works with OpenPaper.work. It also requires a lot of setup. If you don't want to spend hours preparing a server (only to be disappointed because the application is not what you expected), you can use the demo mode of the application (and be disappointed right away).

Disclaimer 2: This is a very niche project. You may well be the first to try to understand the following instructions. Please report any errors, omissions or inaccuracies.

The whole system consists of 4 parts: an OpenPaper.work installation, the Python script, an HTTP server and the OpenPaperView viewer.

The basic idea is to build an SQLite database from the data collected by Paperwork and serve that database (and the actual scans) to the viewer over HTTPS.

Features

Limitations

Installation

An OpenPaper.work installation

This is probably the easiest part. You need to locate :

The Python script

The tools/create_viewer_cb.py script must be executed periodically. It scans the papers directory, adds the OCRed text from the Paperwork database and creates an SQLite database. It would be nice to integrate it into OpenPaper.work. If you have the required skills, please help with this feature request.

The only dependency I remember is PyPDF2 1.x (Fedora package python3-PyPDF2).

Parameters, such as the input and output paths, are defined in create_viewer_cb.config.

By default, the full text of the documents is indexed and stored. The index is used for full-text search; the text itself is used to show search result snippets.
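The schema produced by the script is not described here, so the following is only a sketch of the general technique, assuming an SQLite FTS5 table (the table name `docs`, its columns and the document ids are hypothetical). It shows why both the index and the stored text matter: the `MATCH` query uses the index, while `snippet()` needs the stored text to build the excerpt.

```python
import sqlite3

# Hypothetical schema for illustration only; the real script's tables may differ.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(doc_id UNINDEXED, body)")
con.executemany(
    "INSERT INTO docs (doc_id, body) VALUES (?, ?)",
    [
        ("20240301_1200_00", "Electricity invoice for March 2024"),
        ("20240315_0930_00", "Lease agreement for the apartment"),
    ],
)


def search(term: str):
    """Return (doc_id, snippet) pairs for the documents matching term."""
    return con.execute(
        # Column 1 is `body`; matches are wrapped in [ ] in the excerpt.
        "SELECT doc_id, snippet(docs, 1, '[', ']', '...', 8) "
        "FROM docs WHERE docs MATCH ? ORDER BY rank",
        (term,),
    ).fetchall()


for doc_id, excerpt in search("invoice"):
    print(doc_id, excerpt)
```

This requires a Python build whose SQLite library includes FTS5, which is common but not guaranteed.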

In my case, each document increases the size of the database by about 10 kB.

To keep the DB small, it is possible to omit documents, partially or completely. See the labels section of the config file for more details.

An HTTP server

The server sends the document data and the SQLite file to the viewer.

It should support :

It needs access to the papers content and to the database built by the script. You may need to adjust the access rights of the files generated by Paperwork.

Certificate creation

Server authentication is quite standard and is not covered here.

Client authentication is less common; I used OpenSSL to create the necessary files.

I am far from being an OpenSSL expert. Please report mistakes, inaccuracies or bad practices.

The basic idea is to create an authority and use it to sign certificates. The authority's certificate will then be installed on the server, while a signed certificate (with corresponding private key) will be imported into the viewer.

  1. Create the CA's private key. Keep it in a secure place.

    openssl genrsa -out ca_private.key 4096

  2. Create the CA's (self-signed) certificate. This is the file to install on the server.

    openssl req -new -x509 -days 3660 -key ca_private.key -out ca.crt

Then, for each client:

  1. Create the private key:

    openssl genrsa -out client_private.key 4096

  2. Create a certificate request. You will be asked for a Common Name; it can be anything as long as it is not empty:

    openssl req -new -key client_private.key -out client_request.csr

  3. Sign the client's request with the CA's key, creating a certificate valid for 10 years. The serial number should be different for each certificate:

    openssl x509 -req -days 3650 -in client_request.csr -CA ca.crt -CAkey ca_private.key -set_serial 1 -out client.crt

  4. Create the PEM file to be imported into the viewer app:

    cat client.crt client_private.key >client_full.pem

Configuration

A sample configuration for NGINX:

server {
    # compress the sqlite DB file
    gzip on;
    gzip_types application/octet-stream;

    # SSL configuration
    listen 443 ssl http2 default_server;
    listen [::]:443 ssl http2 default_server;

    # Server authentication :
    # The server's certificate and private key
    ssl_certificate certs/server.crt;
    ssl_certificate_key private/server.key;

    # Client authentication :
    # The CA signing the client's certificate
    ssl_client_certificate certs/ca.crt;

    # make verification optional, so we can display a 403 message to those
    # who fail authentication
    ssl_verify_client optional;

    root /var/www/;

    index index.html index.htm;

    server_name _;

    location / {
        deny all;
    }

    # this is the viewer's base URL
    # where it expects to find :
    # papers.sqlite
    # papers/
    location /papers_base_dir/ {
        # if the client-side certificate failed to authenticate, show a 403
        # message to the client
        if ($ssl_client_verify != SUCCESS) {
            return 403;
        }

        try_files $uri =404;
    }
}

OpenPaperView settings

Base URL

The viewer downloads the SQLite DB, the document images and the PDFs from this URL. For example, if the base URL is https://example.com/paperwork/base, the viewer will query:

The database :

The documents thumbnail, images and pdf :
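As a rough illustration, the locations can be derived from the base URL as follows. The names `papers.sqlite` and `papers/` come from the comments in the NGINX sample configuration; the helper `viewer_urls` is hypothetical, and the exact file paths below `papers/` are not spelled out here.

```python
def viewer_urls(base_url: str) -> dict:
    """Build the locations the viewer is expected to query under base_url."""
    base = base_url.rstrip("/")  # tolerate a trailing slash in the setting
    return {
        "database": f"{base}/papers.sqlite",
        "documents": f"{base}/papers/",
    }


urls = viewer_urls("https://example.com/paperwork/base")
print(urls["database"])   # https://example.com/paperwork/base/papers.sqlite
print(urls["documents"])  # https://example.com/paperwork/base/papers/
```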

Auto download labels

Every time the database is updated, the documents that have one of these labels are downloaded. If they are deleted manually, they will be downloaded again with the next update.

Authentication

Authentication is done through HTTPS with mutual authentication.

To authenticate itself on the server, the viewer needs a certificate and the corresponding private key. It expects a PEM file containing exactly one certificate and one private key. This typically looks like:

some optional description
-----BEGIN CERTIFICATE-----
Base64 encoded content
-----END CERTIFICATE-----

-----BEGIN PRIVATE KEY-----
more Base64 content
-----END PRIVATE KEY-----
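Before importing a file, it can be handy to check that it has this shape. The sketch below only counts PEM blocks; it does not validate their content, and the helper names are hypothetical. The label of the key block may vary with the tool that generated it (for example `RSA PRIVATE KEY`), so any label ending in `PRIVATE KEY` is counted as a key; whether the viewer accepts every variant is not verified here.

```python
import re

# A PEM block: BEGIN line, anything, then the END line with the same label.
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----.*?-----END \1-----", re.DOTALL)


def pem_labels(text: str) -> list:
    """Return the labels of all PEM blocks found in text, in order."""
    return [m.group(1) for m in PEM_BLOCK.finditer(text)]


def looks_importable(text: str) -> bool:
    """True if text holds exactly one certificate and one private key."""
    labels = pem_labels(text)
    certs = [label for label in labels if label == "CERTIFICATE"]
    keys = [label for label in labels if label.endswith("PRIVATE KEY")]
    return len(certs) == 1 and len(keys) == 1


SAMPLE = """some optional description
-----BEGIN CERTIFICATE-----
AAAA
-----END CERTIFICATE-----

-----BEGIN PRIVATE KEY-----
BBBB
-----END PRIVATE KEY-----
"""

print(pem_labels(SAMPLE), looks_importable(SAMPLE))
```

For a real file, pass its content instead of `SAMPLE`, for example the text read from `client_full.pem`.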

You can optionally add a certificate for a custom certification authority. It is used to authenticate the server and is only necessary if the server's certificate is not signed by a well-known CA.
If provided, it will be the only CA trusted by the viewer. If not, Android's system (built-in) CAs will be trusted.
In any case, Android's user CAs (i.e. those manually imported on the device) are not trusted.
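To check the server side of this setup from a desktop before configuring the app, a small client can present the same PEM file. This is a minimal sketch using Python's standard `ssl` and `urllib` modules, not what the viewer does internally; the function names and the `server_ca_file` parameter are hypothetical. When a CA file is given, Python trusts only that CA, which resembles the viewer's behaviour described above; when it is omitted, Python falls back to its default trust store rather than Android's.

```python
import ssl
import urllib.request
from typing import Optional


def make_context(client_pem: str, server_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that presents a client certificate.

    client_pem: file holding the certificate and its private key.
    server_ca_file: optional custom CA used to verify the server.
    """
    ctx = ssl.create_default_context(cafile=server_ca_file)
    # A single file may contain both the certificate and the private key.
    ctx.load_cert_chain(certfile=client_pem)
    return ctx


def fetch(url: str, client_pem: str, server_ca_file: Optional[str] = None) -> bytes:
    """Download url using mutual TLS authentication."""
    ctx = make_context(client_pem, server_ca_file)
    with urllib.request.urlopen(url, context=ctx) as response:
        return response.read()


# Example call (needs a reachable server, so it is left commented out):
# data = fetch("https://example.com/paperwork/base/papers.sqlite", "client_full.pem")
```

With the sample NGINX configuration, a request without a valid client certificate should receive a 403 response, which `urlopen` reports as an `HTTPError`.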

Extension

I found that with small screens it is not very practical to identify documents based on the thumbnail. Having a title is much more comfortable.

If the first line of Paperwork's extra keywords starts with a #, that line is used as the title of the document.
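The rule can be sketched as follows. The function name is hypothetical, and removing the leading `#` and surrounding whitespace from the displayed title is an assumption; the project may handle this differently.

```python
from typing import Optional


def title_from_keywords(extra_keywords: str) -> Optional[str]:
    """Return the document title encoded in the extra keywords, if any."""
    lines = extra_keywords.splitlines()
    # Only the first line is considered, and only if it starts with '#'.
    if not lines or not lines[0].startswith("#"):
        return None
    # Assumption: the marker itself is not part of the displayed title.
    return lines[0].lstrip("#").strip()


print(title_from_keywords("# Electricity bill March\ninvoice energy"))
print(title_from_keywords("invoice\n# not on the first line"))
```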

License

Copyright (C) 2024 Philippe Banwarth

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.