AT&T Vault Tech Scenario

This repository contains the draft code used to explore and analyze the data in the 12/2020 "Technical Scenario" document for VAULT. It is organized as a set of Jupyter notebooks runnable on any Linux or Mac system. For notebooks without interactive plots, the notebook is provided with its output embedded directly in it, so that the results can be seen without having to set up and execute the code. Notebooks without output included are meant to be viewed "live", with a running Python server, so that the data can be explored fully interactively. PDF copies of all notebooks are provided for quick skimming, or in case the notebook code or data is not available for running. Where appropriate, you can also visit a deployed version of the code.

To understand our algorithm and approach, please see our write-up at High Performance Hit Finder.

To get started with this codebase, see the Quickstart.

You can access deployed versions of the notebooks and dashboard at http://bit.ly/attvault, though these will be taken down at some point after the demo presentation.

Data

See Downloading Data

Notebooks

The notebooks fall into the following categories:

EDA

These notebooks start from raw data where possible, with the goal of presenting it as it is, with as little cleanup as possible, so that the same process can be applied to new data. They are primarily self-contained, relying only on packages installed in the Python environment rather than on other scripts or modules in this repository.
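
As a rough illustration of what "self-contained, starting from raw data" means here, a first cell in one of these notebooks might look like the sketch below; the file path and column name are placeholders rather than the actual data layout.

```python
# Minimal sketch of a self-contained EDA loading step.
# The path and timestamp column are assumptions, not the real data layout.
import pandas as pd

# Read the raw file as-is: no type coercion beyond parsing the timestamp,
# so the same cell can be pointed at a fresh data drop.
ais = pd.read_csv(
    "raw/ais_sample.csv",            # hypothetical path
    parse_dates=["BaseDateTime"],    # hypothetical timestamp column
    low_memory=False,
)

# First look: shape, dtypes, and obvious gaps, with no cleanup applied.
print(ais.shape)
print(ais.dtypes)
print(ais.isna().mean().sort_values(ascending=False).head())
```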

Data exploration

These notebooks also focus on data, but on derived or computed values.

Prototypes

These files start with processed/prepared data, and approximate an end-user task (e.g. hit detection).
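
For orientation, the sketch below shows a toy version of the kind of end-user task such a prototype approximates: flagging "hits" where a satellite ground-track point falls within some radius of a vessel report at roughly the same time. The column names, radius, and time tolerance are assumptions for illustration only; the actual approach is described in the High Performance Hit Finder write-up.

```python
# Toy "hit" check: does a satellite ground-track point pass within a given
# radius of a vessel position at (approximately) the same time?
# Column names, radius, and tolerance are illustrative assumptions.
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between lat/lon points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def find_hits(track, vessels, radius_km=100.0, tolerance="5min"):
    """Pair ground-track points with nearby-in-time vessel reports, keep close pairs.

    Both inputs are assumed to have 'time' (datetime), 'lat', and 'lon' columns.
    """
    pairs = pd.merge_asof(
        track.sort_values("time"), vessels.sort_values("time"),
        on="time", direction="nearest",
        tolerance=pd.Timedelta(tolerance),
        suffixes=("_sat", "_ship"),
    ).dropna(subset=["lat_ship"])
    dist = haversine_km(pairs["lat_sat"], pairs["lon_sat"],
                        pairs["lat_ship"], pairs["lon_ship"])
    return pairs.assign(distance_km=dist).query("distance_km <= @radius_km")
```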

Machine learning / Analysis use cases

Data preparation

These files start with raw data and create cleaned/consolidated/computed data for use in the other categories. Many of these rely on scripts in scripts/, where you can see the detailed computations involved.
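
A schematic example of the kind of step these files perform is sketched below; the paths and column name are placeholders, and the real logic lives in scripts/.

```python
# Illustrative cleaning/consolidation step (paths and column name are
# assumptions; see scripts/ for the actual computations).
import glob
import pandas as pd

def consolidate_ais(pattern="raw/ais_*.csv", out_path="prepared/ais_clean.parquet"):
    """Concatenate raw AIS CSV snapshots, drop exact duplicates, sort, and save."""
    frames = [pd.read_csv(p, parse_dates=["BaseDateTime"]) for p in glob.glob(pattern)]
    clean = (pd.concat(frames, ignore_index=True)
               .drop_duplicates()
               .sort_values("BaseDateTime")
               .reset_index(drop=True))
    clean.to_parquet(out_path, index=False)  # requires pyarrow or fastparquet
    return clean
```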

Python Scripts

These are all in the scripts/ subdirectory. Most print useful help when given the --help option, or document their usage in their file docstrings.
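
Schematically, the scripts follow a conventional argparse pattern along the lines of the sketch below; the arguments shown are placeholders, not those of any particular script in scripts/.

```python
"""One-line description of what this script computes (shown by --help)."""
import argparse

def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("input_path", help="Raw data file or directory to process")
    parser.add_argument("-o", "--output", default="out.parquet",
                        help="Where to write the prepared data")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Print progress information")
    args = parser.parse_args()
    if args.verbose:
        print(f"Processing {args.input_path} -> {args.output}")
    # ... processing would go here ...

if __name__ == "__main__":
    main()
```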

Deliverables Checklist:

This page serves as the main instruction index. From here, you can navigate to the resources, deliverables, and documentation for each item below.

  1. Public GitHub – All code/doc/Instructions
  2. Public vault-data-corpus on S3: http://vault-data-corpus.s3-website.us-east-2.amazonaws.com/ (a subset of which is provided at vault-data-minimal, sufficient for running the code); a scripted download sketch follows this list
    • Satellite Data - Contains all TLE-related data snapshots from the various EDA/Curation processes
    • Vessel Data - Contains all AIS-related data snapshots from the various EDA/Curation processes
    • Docker Images - Contains the latest Docker images for the API and interactive UI app; you can also use our Jenkins pipeline to build and deploy new Docker images
  3. Deployed apps at http://bit.ly/attvault, though these will be taken down at some point after the demo presentation.
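
If you prefer to script the download of item 2 rather than browse the S3 website listing, a sketch using boto3 with anonymous (unsigned) requests might look like the following; the bucket name is taken from the URL above, while the prefix and destination directory are placeholders. See Downloading Data for the supported procedure.

```python
# Sketch: anonymously download a subset of the public data corpus with boto3.
# The bucket name comes from the URL above; the prefix and local directory
# are placeholders.
import os
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", region_name="us-east-2",
                  config=Config(signature_version=UNSIGNED))

bucket, prefix, dest = "vault-data-corpus", "satellite/", "data/"
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        target = os.path.join(dest, obj["Key"])
        os.makedirs(os.path.dirname(target), exist_ok=True)
        s3.download_file(bucket, obj["Key"], target)
```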

Background reading

DoD/government documents:

Data files:

General background: