debrief / pepys-import

Support library for Pepys maritime data analysis environment
https://pepys-import.readthedocs.io/
Apache License 2.0

Proof of Concept for Pepys in Jupyter #1078

Open IanMayo opened 2 years ago

IanMayo commented 2 years ago

🐞 Overview

Produce a proof-of-concept for viewing Pepys data in a Jupyter notebook.

This will de-risk the future use of Jupyter notebooks both in Pepys and in general usage by analysts, offering lessons learned in data connectivity, data processing, and visualisation.

Time-permitting, to include:

🔗 Feature

This represents an alternate solution for #859

🔢 Acceptance criteria

Machine Learning

scikit-learn provides capable clustering algorithms, but we need to identify an application of this method to Pepys data.

Offline mapping

Pepys will frequently be used without an Internet connection, so it will be unable to provide an OpenStreetMap backdrop. It would be useful to consider how a similar capability could be provided offline, with coverage of these areas in descending order of importance:

I guess some options are:

Sample analysis task

Extended analysis task, considering bulk data

Prioritised subsequent tasks

robintw commented 2 years ago

@IanMayo Here are some initial demos of a very simple notebook interface:

[image: Jupyter_1]

[image: Jupyter_2]

There are loads of problems with this interface, but it's just an idea of what is possible with just a few lines of code. I'll put up a PR shortly so you can see the actual notebook code, and then I'll move on to some of the other stuff we wanted to demo.

robintw commented 2 years ago

See #1096 for a PR including this notebook code. I've also included some static and interactive plots of other variables - see:

[image]

and

[image]

Notably, at the moment we have to work around pandas' incompatibility with SQLAlchemy 2.0. The SQLAlchemy 'engine' that we create in the Pepys DataStore won't work with pandas, because we create it using future=True (to opt in to the new features and deprecations of SQLAlchemy 1.4, ready for 2.0). There is a pandas issue for adding support for SQLAlchemy 2.0 (see https://github.com/pandas-dev/pandas/issues/40460), which seems to be stalled for lack of volunteers with the relevant experience - that might be something we could contribute to if you were interested.
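For illustration, here is a minimal sketch of one possible workaround. It is not necessarily the approach taken in the PR. The idea is to run the query through SQLAlchemy itself and build the DataFrame from the returned rows, so pandas never touches the engine. The helper name `query_to_dataframe`, the `states` table, and the sample values are all made up for this example, and an in-memory SQLite engine stands in for the Pepys DataStore engine.

```python
import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.pool import StaticPool


def query_to_dataframe(engine, sql, params=None):
    """Run a SQL query via SQLAlchemy and return the rows as a DataFrame.

    Hypothetical helper: pandas only ever sees plain tuples and column
    names, so it does not matter that the engine was created with
    future=True.
    """
    with engine.connect() as conn:
        result = conn.execute(text(sql), params)
        columns = list(result.keys())
        # Convert each Row to a plain tuple before the connection closes.
        rows = [tuple(row) for row in result]
    return pd.DataFrame(rows, columns=columns)


# Assumption: an in-memory SQLite engine stands in for the DataStore engine.
# StaticPool keeps a single connection so the in-memory data persists.
engine = create_engine(
    "sqlite://",
    future=True,
    poolclass=StaticPool,
    connect_args={"check_same_thread": False},
)

# Made-up table and values, purely so the example runs end to end.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE states (platform TEXT, lat REAL, lon REAL)"))
    conn.execute(
        text("INSERT INTO states VALUES (:platform, :lat, :lon)"),
        [
            {"platform": "PLATFORM_A", "lat": 50.1, "lon": -1.2},
            {"platform": "PLATFORM_A", "lat": 50.2, "lon": -1.3},
            {"platform": "PLATFORM_B", "lat": 51.0, "lon": -2.0},
        ],
    )

df = query_to_dataframe(
    engine,
    "SELECT lat, lon FROM states WHERE platform = :p ORDER BY lat",
    {"p": "PLATFORM_A"},
)
```

The trade-off is that this bypasses conveniences of `pd.read_sql` such as date parsing and chunked reads, so it is probably best treated as a stopgap until pandas supports SQLAlchemy 2.0-style engines.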

robintw commented 2 years ago

Ah yes, one more thing:

Do you have any really good, realistic (ideally actually real - but not sensitive) data that I could use for playing around with developing analysis capabilities in Jupyter? Part of the reason I built the UI for selecting a platform and plotting the points was so that I could see if I could find a realistic looking track - a lot of the data on TracStor is obviously test data. The best I found was this HIPP platform, but it hasn't got a massive amount of data (only ~350 data points). If I were to start running scikit-learn models on data I'd ideally like something fairly realistic and reasonably large. Any ideas?

IanMayo commented 2 years ago

Aah, @robintw - from the depths of my memory I remembered where I'd seen a sample dataset, it's in the CSV files here: https://www.gov.uk/government/news/dstl-shares-new-open-source-framework-initiative

Some tracks appeared to have up to 3k points.

Obvs you'll either have to produce a parser to get the data into Pepys, or do some Excel column fiddling to make it look like an existing format which we parse. The "unknown platform" handling will be great for this data :-D

IanMayo commented 2 years ago

Here's another source of AIS data, @Robin. It's a huge dataset; hopefully it contains long tracks rather than just lots of small ones. https://marinecadastre.gov/ais/

IanMayo commented 2 years ago

@robintw - the analysts have come up with a useful analysis task (above) to "drive" the technical demonstrator. I'm happy to either expand the terms or rephrase the description as necessary for you to understand/implement it.

robintw commented 2 years ago

Thanks @IanMayo. That's an interesting task, and slightly different to what I was expecting. I'll have a ponder and do some experimentation and get back to you.