[Flight tracks feature] Pipeline for processing data in the cloud

NASA-IMPACT / admg-casei

ADMG Inventory

https://impact.earthdata.nasa.gov/casei/

Apache License 2.0

[Flight tracks feature] Pipeline for processing data in the cloud #596

Closed heidimok closed 3 months ago

heidimok commented 6 months ago

Context

For the flight paths feature, we are working through a workflow for adding new flight visualizations to CASEI. Many parts of this work are still manual, and we need to automate as much of it as possible. In particular, we want to make the developer experience as clear and seamless as possible.

Goal

Build a pipeline that processes the data for a new campaign every time the repository changes, or at least provide an easy way to manually run the data processing in the cloud.

smwingo commented 5 months ago

perhaps share a demo on this at Sprint #2 review meeting? revisit at 12 Feb check in

smwingo commented 4 months ago

revisit at Mon 25 Mar -- demo on 29th?

heidimok commented 4 months ago

March 25 Sprint 5 Update

Adding a task that is not represented as a standalone issue but relates to the campaigns MACPEX, MC3E, VIRGAS, and SEAC4RS. @praveenphatate and @als0076 have collected the data for these campaigns, but it is not yet visualized because of inconsistencies in data formats and coordinates across the data.

To extend the processing capabilities to support these campaigns, @willemarcel is continuing to work on parsing the different data formats and coordinates.

heidimok commented 3 months ago

Hi @willemarcel, I know this issue was meant to be a place to represent the pipeline work you're doing, which has changed throughout this PI as the data for each campaign has changed.

To avoid carrying issues across PIs: to what extent do we now have an established pipeline, so that we can create future issues as needed to update it? My impression is that this is done but may need adaptation as new data comes in.

heidimok commented 3 months ago

After talking with @willemarcel, we will close this issue and start a new, updated version for the next PI.

This is the current conclusion:

Technically speaking, we were not able to create a pipeline for processing nav data in the cloud, because the data formats have proven too inconsistent to reliably go straight from a YAML file to a visualization on the web without manual intervention.
Throughout this PI, Wille has still had to adjust the visuals locally and re-save them as GeoJSON to the S3 bucket.
We likely need to continue adding campaigns and learning about more edge cases before we attempt to create an automated visualization pipeline.
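
For anyone picking this up in the next PI, here is a rough sketch of the kind of conversion step discussed above: turning already-parsed nav points into a GeoJSON track while refusing to guess on bad coordinates. This is not the code used in this repo. The function names, the `(lat, lon)` input shape, the `campaign` property, and the choice to accept 0 to 360 longitudes are all assumptions for illustration, and the sample values are made up.

```python
import json


def normalize_lon(lon):
    """Map a longitude given in [-180, 360] into [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def track_to_geojson(points, campaign):
    """Build a GeoJSON Feature (LineString) from (lat, lon) pairs.

    Hypothetical helper: raises ValueError on out-of-range values so that
    a person can inspect the file instead of publishing a broken track.
    """
    coords = []
    for i, (lat, lon) in enumerate(points):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"point {i}: latitude {lat} out of range")
        if not -180.0 <= lon <= 360.0:
            raise ValueError(f"point {i}: longitude {lon} out of range")
        # GeoJSON positions are ordered [longitude, latitude].
        coords.append([normalize_lon(lon), lat])
    if len(coords) < 2:
        raise ValueError("a LineString needs at least two positions")
    return {
        "type": "Feature",
        "properties": {"campaign": campaign},
        "geometry": {"type": "LineString", "coordinates": coords},
    }


if __name__ == "__main__":
    # Made-up sample points using the 0 to 360 longitude convention.
    feature = track_to_geojson([(36.6, 262.5), (36.7, 262.75)], "EXAMPLE")
    print(json.dumps(feature, indent=2))
```

Failing loudly on unexpected values, rather than silently correcting them, matches the current situation where manual checks are still needed. Each new edge case we learn about could become an explicit rule in a step like this.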