ebmdatalab / openpath-dash

Experimental Dash version of openpathology browser

Deployment

Deployment is via dokku. Add a git remote called dokku that points to dokku@dokku.ebmdatalab.net:openpath-flask.

Then you can do git push dokku master to deploy, assuming your ssh public key is installed for the dokku user on that server.

Deployment requires a username and password to be set in the environment. You can do this on the server thus:

dokku config:set openpath-flask BASIC_AUTH_CREDENTIALS="username: password"

Persistent storage

The raw data is currently not checked into the repo, pending a decision on doing so.

Until that decision is made, data can be made available to the app via dokku's Persistent Storage:

dokku storage:mount openpath-flask /var/lib/dokku/data/storage/openpath-dash/data_csvs:/var/data_csvs
dokku config:set openpath-flask DATA_CSVS_PATH=/var/data_csvs

To update the data, you'll want to update it in /var/lib/dokku/data/storage/openpath-dash/data_csvs. This should contain a copy of everything in data_csvs/ from the repo, plus any newer all_processed.csv.zip file.
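As a rough illustration of how the DATA_CSVS_PATH setting could be consumed, here is a minimal sketch. It assumes the app reads the variable from the environment and falls back to the data_csvs/ directory in the repo. The helper names resolve_data_dir and find_processed_file are hypothetical and are not taken from the codebase.

```python
import os
from pathlib import Path


def resolve_data_dir(env=None, default="data_csvs"):
    """Return the directory to read CSVs from.

    Hypothetical helper: prefers DATA_CSVS_PATH if it is set and non-empty,
    otherwise falls back to `default` (assumed to be the repo's data_csvs/).
    """
    env = os.environ if env is None else env
    return Path(env.get("DATA_CSVS_PATH") or default)


def find_processed_file(data_dir, name="all_processed.csv.zip"):
    """Return the path to the processed data file, or None if it is absent."""
    path = Path(data_dir) / name
    return path if path.exists() else None
```

With the dokku config above, resolve_data_dir() would return /var/data_csvs inside the container; locally, with the variable unset, it would return data_csvs.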

You must redeploy (restart) the app for a mount or unmount to take effect on its existing container.

Navigating the code

To run the app, run python index.py. This sets up a Flask app (app.py) and imports the modules in apps/, each of which provides a particular chart for the single-page app.

Per-client state is stored as stringified JSON in a hidden div. Bookmarkable state is mirrored in the location bar of the browser. This is all handled by apps/stateful_routing.py.
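The idea can be sketched with the standard library alone. This is not the code in apps/stateful_routing.py. The key names in BOOKMARKABLE_KEYS and all function names are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode

# Hypothetical: the subset of state keys worth mirroring in the URL.
BOOKMARKABLE_KEYS = ("page", "numerators")


def serialise_state(state):
    """Stringify state so it can be stored as the text of a hidden div."""
    return json.dumps(state, sort_keys=True)


def deserialise_state(text):
    """Parse the hidden div's text back into a dict (empty if unset)."""
    return json.loads(text) if text else {}


def state_to_query(state):
    """Build a query string from the bookmarkable part of the state."""
    subset = {k: state[k] for k in BOOKMARKABLE_KEYS if k in state}
    return "?" + urlencode(subset, doseq=True) if subset else ""


def query_to_state(query):
    """Recover bookmarkable state from a query string.

    parse_qs returns a list for every key, so single values are unwrapped;
    a one-element list therefore comes back as a plain string.
    """
    parsed = parse_qs(query.lstrip("?"))
    return {
        k: v[0] if len(v) == 1 else v
        for k, v in parsed.items()
        if k in BOOKMARKABLE_KEYS
    }
```

Keys outside BOOKMARKABLE_KEYS (transient UI state, for example) stay in the hidden div only and never reach the location bar.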

When a chart element (defined in layouts.py) changes, its state flows to the stateful_routing module and the location bar; the various charts are wired to changes in the per-client state and update accordingly. Charts that are not currently being viewed are hidden (see apps/base.py), as Dash requires everything wired up for callbacks to be present on the page.
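One common way to keep every chart on the page while showing only one is to toggle each container's CSS display property. The sketch below shows that approach in plain Python. It is an assumption about the general technique, not a copy of apps/base.py, and the chart ids are hypothetical.

```python
# Hypothetical chart ids; the real ones would come from the modules in apps/.
CHART_IDS = ("heatmap", "deciles", "counts")


def chart_styles(current_page, chart_ids=CHART_IDS):
    """Return a style dict per chart: visible for the current page, hidden otherwise.

    Every chart stays in the layout, so callbacks wired to it remain valid.
    """
    return {
        chart_id: {"display": "block" if chart_id == current_page else "none"}
        for chart_id in chart_ids
    }
```

In a Dash app, dicts like these would typically be returned from a callback as the style property of each chart's wrapping element.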

Is Dash a good choice?

Probably; enough to give it a proper chance.

Benefits:

Costs:

Performance

Displaying 86 charts on a fast laptop takes around 15s. This time is halved if you remove all interactivity from the charts.

There may be further performance improvements from selectively removing interactivity.

There also seems to be an opportunity to cut down the time spent inside plotly-py; 25% of the time is spent in a string validator (look at this as HTML in a browser).
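Figures like the one above can be obtained with the standard library profiler. This is a minimal sketch, assuming the chart-building code can be called as a plain function. The helper name profile_top is hypothetical, and this is not necessarily how the original numbers were measured.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=10):
    """Run func() under cProfile and return a report of the most expensive calls.

    The report is sorted by cumulative time, so a hot helper such as a
    validator shows up together with the time spent in everything it calls.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

Calling print(profile_top(build_charts)), where build_charts is whatever function renders the figures, prints a table whose cumtime column can be compared against the total run time.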

Currently the same thing takes about 45s on OpenPrescribing, though probably 40s of this is network time; the main concern with Dash here is that it chews a lot of CPU. We would probably want to implement a smooth-scroll handler for this, per these notes.

Pipeline

Run flask get_practice_codes to get practice codes (crucially, including CCG membership).

Then process new data files with flask process_file <lab_code> <filename> - this

Finally, run flask postprocess_files <filenames> to anonymise (replace practice IDs) and report outlier data.
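The three steps above could be scripted roughly as follows. This is a sketch under assumptions: the command names come from this README, but the lab code, the file names, and the idea that postprocess_files takes a separate list of already-processed files are illustrative guesses. The functions pipeline_commands and run_pipeline are hypothetical.

```python
import subprocess


def pipeline_commands(raw_files, processed_files):
    """Build the flask commands for one pipeline run, in order.

    raw_files: iterable of (lab_code, filename) pairs for process_file.
    processed_files: filenames to pass to postprocess_files (assumed input).
    """
    commands = [["flask", "get_practice_codes"]]
    commands += [
        ["flask", "process_file", lab_code, filename]
        for lab_code, filename in raw_files
    ]
    commands.append(["flask", "postprocess_files", *processed_files])
    return commands


def run_pipeline(raw_files, processed_files, dry_run=True):
    """Print each command, or execute it and stop on the first failure."""
    for command in pipeline_commands(raw_files, processed_files):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Running with dry_run=True first makes it easy to check the exact commands before anything touches the data.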