ebmdatalab / openprescribing

A Django app providing a REST API and dashboards for the HSCIC's GP prescribing data
https://openprescribing.net
MIT License

Do a tech stack review #74

Closed: annapowellsmith closed this issue 1 year ago

annapowellsmith commented 8 years ago

Has anyone built a generic, web-based visualisation tool for large datasets, with affordable licensing for public use?

If so then we can throw away our bespoke code!

Last time I reviewed this I concluded that no-one had. Tableau is the closest, but they are licensed per-user, and most of their upcoming competitors seem to have adopted the same per-user model.

A couple of them seemed open to discussing an arrangement in which GP dashboards are free and CCGs are charged.

It would be worth spending some time looking into this.

annapowellsmith commented 8 years ago

We talked about this today, as a consequence of Seb's work on #101 and #73.

The options for replacing the back-end (assuming no out of the box visualisation platform exists) seem to be as follows:

  1. Improve our Postgres performance by an order of magnitude
  2. Move the back-end to a big data product that hosts data and offers an API out of the box (Anna feels like this must exist), or move the whole thing to a visualisation product as above (probably doesn't exist)
  3. Move the back-end to BigTable with Google App Engine (or the Amazon equivalent, though we are biased towards Google because we already use BQ and Django)
  4. Move the back-end to Cassandra / CouchDB
  5. Move the back-end to BQ

(1) is out because it would require dedicated hardware (we are running out of space on our current setup) and dedicated sysadmin skills. (4) is out because it would need too many resources.

We think the next steps are, in order:

annapowellsmith commented 8 years ago

Some research this morning.

The Google data ecosystem has a variety of products: as well as BigTable, there are Cloud Datastore and Cloud SQL. Google strongly recommends BigQuery for analytics, but other products may suit our needs better than BigQuery.

Our requirements are:

Although it's not a strict requirement, any solution that supports on-the-fly percentile calculations would be extremely useful.
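To illustrate what "on-the-fly percentile calculation" involves, here is a minimal sketch. It computes a continuous percentile by linear interpolation between neighbouring ranks, which is the definition PostgreSQL's `percentile_cont` uses. The function names and the sample per-practice values are hypothetical and for illustration only. Nothing here describes how any particular back-end does this internally.

```python
from typing import List, Sequence


def percentile_cont(values: Sequence[float], fraction: float) -> float:
    """Continuous percentile: sort the values, find position fraction * (n - 1)
    and interpolate linearly between the two neighbouring values."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)  # floor, since position >= 0
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def deciles(values: Sequence[float]) -> List[float]:
    """The 10th, 20th, ..., 90th percentiles, e.g. for drawing decile lines on a chart."""
    return [percentile_cont(values, d / 10) for d in range(1, 10)]


if __name__ == "__main__":
    # Hypothetical per-practice values for one measure in one month
    practice_values = [0.12, 0.40, 0.33, 0.08, 0.51, 0.27, 0.19]
    print(percentile_cont(practice_values, 0.5))  # median
    print(deciles(practice_values))
```

Doing this in application code means sorting every practice's value for each request. That cost is why it matters whether a candidate back-end can compute percentiles itself, close to the data.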

Quick notes:

I will investigate Cloud SQL and BigTable further. Cloud SQL is particularly interesting, as the friction/cost of moving might be quite low.

annapowellsmith commented 8 years ago

I've written up options here: https://docs.google.com/document/d/1cbO0KiagXPOLR32NAylqs7tkGMA2A5mPERrTxNMMZT4/edit

Additions, corrections, commiserations welcome.

I think the next steps are:

annapowellsmith commented 8 years ago

I am also looking at using BigQuery as a back-end - will spend 1 more day on this.

sebbacon commented 8 years ago

I've started a test rig for various queries here and written up a brief for a postgres expert here.

annapowellsmith commented 8 years ago

Russ says he has some free time (yay) to look at Postgres once he's finished his post-EMF work. I'll nudge him in the middle of next week if necessary.