Microservice that takes an XHTML document and produces a PDF document at various conformance levels. The main driver is to provide a generic PDF generator that can also create PDF/A-1a compliant documents for DRS. It has recently been extended to support the PDF/UA accessibility standard, and it uses another in-house service that abstracts the PDF build activities (https://github.com/dwp/html-to-pdf).
MIT License

ms-html-to-pdfa

RESTful service that receives JSON and constructs a PDF document at various conformance levels.

build & run

Standard Maven build.

NOTE: this application accepts environment variables that are picked up at runtime and substituted into the config.yml shown below (this file is bundled into the container). If HTTPS configuration is needed, a modified config.yml must be mounted into the container with the appropriate keystore/truststore locations (see the Dropwizard documentation).

server:
  applicationContextPath: ${SERVER_CONTEXT_PATH:-/}
  applicationConnectors:
  - type: ${SERVER_APP_CONNECTOR:-http}
    port: ${SERVER_APP_PORT:-6677}
  adminConnectors:
  - type: ${SERVER_ADMIN_CONNECTOR:-http}
    port: ${SERVER_ADMIN_PORT:-0}
  requestLog:
    type: ${SERVER_REQUEST_LOG_TYPE:-external}
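The `${VAR:-default}` placeholders in the configuration use the same notation as shell parameter expansion, so the fallback behaviour can be illustrated with a short shell sketch. The function name `effective_config` is hypothetical and is not part of the service; it only echoes what each placeholder would resolve to. It relies on the shell's own expansion rules, which may differ from the service's substitution for variables that are set but empty.

```shell
#!/bin/sh
# Hypothetical helper: print the values the bundled config.yml placeholders
# would resolve to, using the variable names and defaults from the file above.
effective_config() {
  echo "context_path=${SERVER_CONTEXT_PATH:-/}"
  echo "app_connector=${SERVER_APP_CONNECTOR:-http}"
  echo "app_port=${SERVER_APP_PORT:-6677}"
  echo "admin_connector=${SERVER_ADMIN_CONNECTOR:-http}"
  echo "admin_port=${SERVER_ADMIN_PORT:-0}"
  echo "request_log_type=${SERVER_REQUEST_LOG_TYPE:-external}"
}

# With nothing set, every value falls back to its default.
effective_config

# Overriding one variable (here inside a subshell) changes only that value.
( SERVER_APP_PORT=8080; effective_config )
```

When running the container, the same variables would typically be supplied with `docker run -e SERVER_APP_PORT=8080 ...`.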

performance testing

A k6 script is included for basic load testing. By default, it targets the application running on localhost, via the docker hostname host.docker.internal. This can be changed by passing an optional TARGET_HOST environment variable.

Ensure you have the service running, and execute the test as follows:

# Default target: host.docker.internal
docker run --rm -i --name loadtest \
  -v $PWD:/k6 \
  loadimpact/k6 run - < ./load-test/test.js

# Custom target (must be accessible from within the k6 container)
docker run --rm -i --name loadtest \
  -e TARGET_HOST=some-target:8080 \
  -v $PWD:/k6 \
  loadimpact/k6 run - < ./load-test/test.js

# Change the number of virtual users and the duration
docker run --rm -i --name loadtest \
  -v $PWD:/k6 \
  loadimpact/k6 run --vus 20 --duration 5m - < ./load-test/test.js

Default configuration and criteria for satisfying performance thresholds are bundled in the test scripts themselves.

For configuring the tests in the CI pipeline, refer to the official GitLab documentation or underlying template source.

/generatePdf

POST endpoint that receives the information needed to build the PDF file.

{
    "colour_profile": "base64-encoded-file",
    "font_map": {
        "tahoma": "base64-encoded-file",
        "arial": "base64-encoded-file"
    },
    "page_html": "base64-encoded-html",
    "conformance_level": "PDFA_1_A"
}

Pdf conformance levels are detailed here with acceptable values for this service as:-

The only mandatory parameter is the base64-encoded HTML (page_html). If only the HTML is passed, a standard colour profile is used, arial (standard) and courier (monospace) are embedded in the PDF, and the conformance level of the PDF is PDF/UA.
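As a minimal sketch of assembling the request body described above, the following shell functions build the JSON from an HTML file. The helper names `b64` and `build_payload` are hypothetical. Only `page_html` and, optionally, `conformance_level` are included; `colour_profile` and `font_map` entries could presumably be encoded the same way.

```shell
#!/bin/sh
# Base64-encode a file onto a single line. GNU base64 wraps its output at
# 76 columns by default, so newlines are stripped to keep the JSON string valid.
b64() {
  base64 < "$1" | tr -d '\n'
}

# Build a /generatePdf request body.
#   $1 = path to the HTML file (mandatory)
#   $2 = conformance level, e.g. PDFA_1_A (optional; left out when empty)
build_payload() {
  html_b64=$(b64 "$1")
  if [ -n "${2:-}" ]; then
    printf '{"page_html":"%s","conformance_level":"%s"}\n' "$html_b64" "$2"
  else
    printf '{"page_html":"%s"}\n' "$html_b64"
  fi
}

# Demonstration with a throwaway document.
tmp=$(mktemp)
printf '<html><body><h1>hello world</h1></body></html>' > "$tmp"
build_payload "$tmp" PDFA_1_A
rm -f "$tmp"
```

The output can be piped straight to the service, for example `build_payload page.html | curl -m 10 -X POST --data @- http://localhost:6677/generatePdf`, where `page.html` is a placeholder file name.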

Returns:-

Usage notes

For the incoming html there are 2 things to consider.

eg.

<html>
    <head>
        <style>
            pre, code, var {
                font-family: 'courier', serif;
            }
            body {
                font-family: 'arial', serif;
            }
        </style>
    </head>
    <body>
        <h1>hello world</h1>
        <img
            width="250px" height="250px"
            src="https://github.com/dwp/ms-html-to-pdfa/raw/master/"
            alt="base64 encoded embedded image"
        />
    </body>
</html>
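The alt text in the example suggests the image is embedded as base64 rather than fetched. Assuming the service accepts standard data URIs in `src` (an assumption, not confirmed by this document), a value for that attribute could be produced with a small helper. The function name `to_data_uri` and the file name `logo.png` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit a data URI for a file.
#   $1 = path to the image file
#   $2 = MIME type, e.g. image/png
to_data_uri() {
  # Strip newlines because GNU base64 wraps long output by default.
  printf 'data:%s;base64,%s' "$2" "$(base64 < "$1" | tr -d '\n')"
}

# Example: print an <img> tag for logo.png if it exists in the current directory.
if [ -f logo.png ]; then
  printf '<img width="250px" height="250px" src="%s" alt="logo" />\n' \
    "$(to_data_uri logo.png image/png)"
fi
```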

Common faults

/version-info

Endpoint to return a standard JSON document with build information.

example output is:-

{
  "app": {
    "name": "ms-html-to-pdfa",
    "version": "1.6.0",
    "build": "133",
    "build_time": "2019-09-09T09:58:17Z"
  }
}
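For scripting against this endpoint, the version string can be pulled out of the response. The sketch below uses sed so that it has no dependencies beyond a POSIX shell; the function name `app_version` is hypothetical, and a proper JSON tool such as jq would be more robust if it is available.

```shell
#!/bin/sh
# Hypothetical helper: read a /version-info response on stdin and
# print the value of the "version" field.
app_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Demonstration using the sample output shown above; prints 1.6.0
app_version <<'EOF'
{
  "app": {
    "name": "ms-html-to-pdfa",
    "version": "1.6.0",
    "build": "133",
    "build_time": "2019-09-09T09:58:17Z"
  }
}
EOF
```

Against a running instance this would be used as `curl -s http://localhost:6677/version-info | app_version`.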

Examples

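The following base64-encodes the HTML file contents, calls the service, decodes the response and writes it to a file on *nix-based operating systems. Newlines are stripped from the encoded HTML because GNU base64 wraps its output, which would break the JSON string, and --decode is used because GNU base64 does not accept the -D short flag.

curl -m 10 -X POST --data '{"page_html":"'"$(base64 < src/test/resources/successfulHtml.html | tr -d '\n')"'"}' http://localhost:6677/generatePdf | base64 --decode > test.pdf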
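If the service returns an error body instead of an encoded document, the pipeline above still writes a file. A quick sanity check, sketched here with a hypothetical `is_pdf` helper, is to confirm that the output starts with the `%PDF-` signature that every PDF file begins with. This checks the header only and says nothing about conformance.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the PDF signature.
is_pdf() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Check the file written by the curl example.
if is_pdf test.pdf; then
  echo "test.pdf looks like a PDF"
else
  echo "test.pdf is missing or is not a PDF" >&2
fi
```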

This example will return the current build information

curl http://localhost:6677/version-info

Continuous Integration (CI) Pipeline

For general information about the CI pipeline on this repository please see documentation at: https://confluence.service.dwpcloud.uk/x/_65dCg

Pipeline Invocation

This CI pipeline replaces the Jenkins CI build process for ms-html-to-pdfa.

GitLab CI automatically invokes a pipeline run when you push to a feature branch (include [skip ci] in your commit message to prevent this when a run is not required).

When a feature branch is merged into develop, a develop pipeline starts automatically and builds the required artifacts.

For production releases please see the release process documented at: https://confluence.service.dwpcloud.uk/pages/viewpage.action?spaceKey=DHWA&title=SRE. A production release requires a manual pipeline, invoked by an SRE. This is a release function only, and production credentials are required.

localdev Usage

There is no change to the usage of localdev. The GitLab CI build process creates artifacts using the same naming convention as the old (no longer used) Jenkins CI build process.

Therefore, please continue to use branch-develop or branch-f-* (depending on the branch name) when proving feature changes.

Access

While this repository is open internally for read access, no one has write access by default. To obtain access, please contact #ask-health-platform on Slack and a member will grant the appropriate level of access.