AT&T Vault Tech Scenario

This repository contains the draft code used to explore and analyze the data in the 12/2020 "Technical Scenario" document for VAULT. It is organized as a set of Jupyter notebooks runnable on any Linux or Mac system. For notebooks without interactive plots, the notebook is provided with its output embedded directly in it, so that the results can be seen without having to set up and execute the code. Notebooks without output included are meant to be viewed "live", with a running Python server, so that the data can be explored fully interactively. PDF copies of all notebooks are provided for quick skimming, or in case the notebook code or data is not available for running. Where appropriate, you can also visit a deployed version of the code.

To understand our algorithm and approach, please see our write-up at High Performance Hit Finder.

To get started with this codebase, see the Quickstart.

You can access deployed versions of the notebooks and dashboard at http://bit.ly/attvault, though these will be taken down at some point after the demo presentation.

Data

See Downloading Data

Notebooks

The notebooks fall into the following categories:

EDA

These notebooks start from raw data where possible, with the goal of presenting it as it is, with as little cleanup as possible, so that the same process can be applied to new data. They are primarily self-contained, relying only on packages installed in the Python environment rather than on other scripts or modules in this repository.
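
As a rough illustration of what "self-contained, starting from raw data" means here, a first cell in one of these notebooks might look like the sketch below; the file path and column name are placeholders rather than the actual data layout.

```python
# Minimal sketch of a self-contained EDA loading step.
# The path and timestamp column are assumptions, not the real data layout.
import pandas as pd

# Read the raw file as-is: no type coercion beyond parsing the timestamp,
# so the same cell can be pointed at a fresh data drop.
ais = pd.read_csv(
    "raw/ais_sample.csv",            # hypothetical path
    parse_dates=["BaseDateTime"],    # hypothetical timestamp column
    low_memory=False,
)

# First look: shape, dtypes, and obvious gaps, with no cleanup applied.
print(ais.shape)
print(ais.dtypes)
print(ais.isna().mean().sort_values(ascending=False).head())
```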

Data exploration

These notebooks also focus on data, but on derived or computed values.

Prototypes

These files start with processed/prepared data, and approximate an end-user task (e.g. hit detection).
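
For orientation, the sketch below shows a toy version of the kind of end-user task such a prototype approximates: flagging "hits" where a satellite ground-track point falls within some radius of a vessel report at roughly the same time. The column names, radius, and time tolerance are assumptions for illustration only; the actual approach is described in the High Performance Hit Finder write-up.

```python
# Toy "hit" check: does a satellite ground-track point pass within a given
# radius of a vessel position at (approximately) the same time?
# Column names, radius, and tolerance are illustrative assumptions.
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between lat/lon points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def find_hits(track, vessels, radius_km=100.0, tolerance="5min"):
    """Pair ground-track points with nearby-in-time vessel reports, keep close pairs.

    Both inputs are assumed to have 'time' (datetime), 'lat', and 'lon' columns.
    """
    pairs = pd.merge_asof(
        track.sort_values("time"), vessels.sort_values("time"),
        on="time", direction="nearest",
        tolerance=pd.Timedelta(tolerance),
        suffixes=("_sat", "_ship"),
    ).dropna(subset=["lat_ship"])
    dist = haversine_km(pairs["lat_sat"], pairs["lon_sat"],
                        pairs["lat_ship"], pairs["lon_ship"])
    return pairs.assign(distance_km=dist).query("distance_km <= @radius_km")
```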

Machine learning / Analysis use cases

Data preparation

These files start with raw data and create cleaned/consolidated/computed data for use in the other categories. Many of these rely on scripts in scripts/, where you can see the detailed computations involved.
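
A schematic example of the kind of step these files perform is sketched below; the paths and column name are placeholders, and the real logic lives in scripts/.

```python
# Illustrative cleaning/consolidation step (paths and column name are
# assumptions; see scripts/ for the actual computations).
import glob
import pandas as pd

def consolidate_ais(pattern="raw/ais_*.csv", out_path="prepared/ais_clean.parquet"):
    """Concatenate raw AIS CSV snapshots, drop exact duplicates, sort, and save."""
    frames = [pd.read_csv(p, parse_dates=["BaseDateTime"]) for p in glob.glob(pattern)]
    clean = (pd.concat(frames, ignore_index=True)
               .drop_duplicates()
               .sort_values("BaseDateTime")
               .reset_index(drop=True))
    clean.to_parquet(out_path, index=False)  # requires pyarrow or fastparquet
    return clean
```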

Python Scripts

These are all in the scripts/ subdirectory. Most print useful help when given the --help option, or document their usage in their file docstrings.
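
Schematically, the scripts follow a conventional argparse pattern along the lines of the sketch below; the arguments shown are placeholders, not those of any particular script in scripts/.

```python
"""One-line description of what this script computes (shown by --help)."""
import argparse

def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("input_path", help="Raw data file or directory to process")
    parser.add_argument("-o", "--output", default="out.parquet",
                        help="Where to write the prepared data")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Print progress information")
    args = parser.parse_args()
    if args.verbose:
        print(f"Processing {args.input_path} -> {args.output}")
    # ... processing would go here ...

if __name__ == "__main__":
    main()
```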

Deliverables Checklist:

This page serves as the main instruction index. From here, you can navigate to the resources, deliverables, and documentation for each item below.

  1. Public GitHub – All code/doc/Instructions
  2. Public vault-data-corpus on S3: http://vault-data-corpus.s3-website.us-east-2.amazonaws.com/ (a subset of which is provided at vault-data-minimal, sufficient for running the code); a scripted download sketch follows this list
    • Satellite Data - Contains all TLE-related data snapshots from the various EDA/Curation processes
    • Vessel Data - Contains all AIS-related data snapshots from the various EDA/Curation processes
    • Docker Images - Contains the latest Docker images for the API and interactive UI app; you can also use our Jenkins pipeline to build and deploy new Docker images
  3. Deployed apps at http://bit.ly/attvault, though these will be taken down at some point after the demo presentation.
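
If you prefer to script the download of item 2 rather than browse the S3 website listing, a sketch using boto3 with anonymous (unsigned) requests might look like the following; the bucket name is taken from the URL above, while the prefix and destination directory are placeholders. See Downloading Data for the supported procedure.

```python
# Sketch: anonymously download a subset of the public data corpus with boto3.
# The bucket name comes from the URL above; the prefix and local directory
# are placeholders.
import os
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", region_name="us-east-2",
                  config=Config(signature_version=UNSIGNED))

bucket, prefix, dest = "vault-data-corpus", "satellite/", "data/"
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        target = os.path.join(dest, obj["Key"])
        os.makedirs(os.path.dirname(target), exist_ok=True)
        s3.download_file(bucket, obj["Key"], target)
```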

Background reading

DoD/government documents:

Data files:

General background: