heidimok commented 10 months ago

Context

For the epic https://github.com/NASA-IMPACT/admg-casei/issues/576, we are directly accessing and downloading the data for 5 campaigns via url and processing it in order to reduce the file size.

Technical Discovery

A way to access the data via API instead

praveenphatate commented 10 months ago

https://colab.research.google.com/drive/1h-axDa69rKxbB-o-OZPnmrfVU5aXauMx?authuser=1#scrollTo=Bbay8-Ov4le1

Jeaton1021 commented 10 months ago

Have this discussion during deepdive on Thursday @heidimok @smwingo

praveenphatate commented 10 months ago

We can fetch the deployments, which contain the date range, by using the campaign short_name:

activate = Campaign.objects.get(short_name = "ACTIVATE")
deployments = Deployment.objects.all().filter(campaign = activate.uuid)

Here is an example output

for deployment in deployments:
  print(deployment.start_date, deployment.end_date, deployment.short_name)

 2021-05-13 2021-06-30 ACTIVATE_dep_2021b
 2021-11-30 2022-03-29 ACTIVATE_dep_2021c
 2020-08-13 2020-09-30 ACTIVATE_dep_2020b
 2022-05-03 2022-06-18 ACTIVATE_dep_2022
 2020-02-14 2020-03-12 ACTIVATE_dep_2020a
 2021-01-27 2021-04-02 ACTIVATE_dep_2021a
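
The date ranges above are what the CMR queries later in this thread pass as the temporal parameter. As a minimal sketch, assuming each deployment should be covered from the start of its first day to the end of its last day in UTC, the conversion could look like this (the helper name `cmr_temporal` is hypothetical, not part of the CASEI backend):

```python
from datetime import date


def cmr_temporal(start_date: date, end_date: date) -> str:
    """Format a deployment date range as 'start,end' covering both days in full (UTC)."""
    return f"{start_date.isoformat()}T00:00:00Z,{end_date.isoformat()}T23:59:59Z"


# Dates taken from the ACTIVATE_dep_2020a row in the output above.
temporal = cmr_temporal(date(2020, 2, 14), date(2020, 3, 12))
print(temporal)  # 2020-02-14T00:00:00Z,2020-03-12T23:59:59Z
```

With Django, `deployment.start_date` and `deployment.end_date` are already `date` objects, so they could be passed in directly.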

Also, FYI: the meta fields of Deployment include flight_tracks, which is currently empty

 <django.db.models.fields.UUIDField: uuid>,
 <django.db.models.fields.CharField: short_name>,
 <django.db.models.fields.CharField: long_name>,
 <django.db.models.fields.TextField: notes_internal>,
 <django.db.models.fields.TextField: notes_public>,
 <django.db.models.fields.related.ForeignKey: campaign>,
 <django.db.models.fields.DateField: start_date>,
 <django.db.models.fields.DateField: end_date>,
 <django.contrib.gis.db.models.fields.PolygonField: spatial_bounds>,
 <django.db.models.fields.TextField: study_region_map>,
 <django.db.models.fields.TextField: ground_sites_map>,
 <django.db.models.fields.TextField: flight_tracks>

heidimok commented 8 months ago

To close out the last PI, I'm going to be closing the epic. But since this issue feeds into the next PI, I'll link it to a new epic that relates to visualizing the rest of the available flight tracks in CASEI beyond just the 5 we prototyped here.

heidimok commented 8 months ago

Update Jan 18 - @praveenphatate to add documentation and close out before new PI

praveenphatate commented 8 months ago

Efficiently download deployment data

Problem Statement

The CASEI website needs to visualize flight tracks, i.e., Meteorological and Navigational Data, for all deployments.
In order to download the data we need the concept_id, the deployment start date, the deployment end date, and specific keywords like SUMMARY or METNAV.
How can we create and store all the data required for this task efficiently?

Solution

Since the CASEI backend already has a database where the required information can be found, we can create a simple CSV file to hold all that data.

Steps:

- Get the CAMPAIGN of interest
- Get all the DEPLOYMENTS associated with it
- Get the COLLECTION PERIODS for each deployment
- Get the DOIs associated with each COLLECTION PERIOD
- Get all the DRAFTS associated with each DOI and keep only those that are published and whose title contains "Meteorological and Navigational Data"

Here is a simple snippet showing how to find the required information.
```
# Run in the CASEI backend Django shell, with Campaign, Deployment, DOI and Change imported.
camp = Campaign.objects.get(short_name='CAMP2Ex')

deployments = Deployment.objects.all().filter(campaign=camp.uuid)

col_prd = deployments[0].collection_periods.all()

dois = DOI.objects.filter(collection_periods=col_prd[0])

concept_ids = [doi.concept_id for doi in dois]

# Filter Change objects based on concept IDs
drafts = Change.objects.filter(
    content_type__model='doi',
    action__in=[Change.Actions.CREATE, Change.Actions.UPDATE],
    update__concept_id__in=concept_ids,
)

for drf in drafts:
    if drf.status == 6 and 'Meteorological and Navigational Data' in drf.update.get('cmr_entry_title', ''):
        print(drf.status, drf.action, drf.updated_at, drf.update['concept_id'], drf.update['cmr_entry_title'])

# OUT:
# 6 Create 2021-06-21 23:29:24.087000+00:00 C1954736081-LARC_ASDC CAMP2Ex P-3 In-Situ Meteorological and Navigational Data
```


This can be expanded to include the Campaigns and Deployments by using the concept_id or collection_id and deployment start_date and end_date.
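
As a rough sketch of the CSV idea from the Solution section, the collected values could be written out with the standard `csv` module. The column names and the helper `rows_to_csv` are assumptions for illustration; the two sample rows pair ACTIVATE deployment dates with the collection IDs used in the example queries in this thread.

```python
import csv
import io

# Hypothetical column layout; adjust to whatever the download script needs.
FIELDNAMES = ["campaign", "deployment", "start_date", "end_date", "concept_id"]


def rows_to_csv(rows):
    """Serialize a list of dicts (one per deployment/collection pair) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"campaign": "ACTIVATE", "deployment": "ACTIVATE_dep_2020a",
     "start_date": "2020-02-14", "end_date": "2020-03-12",
     "concept_id": "C1994460996-LARC_ASDC"},
    {"campaign": "ACTIVATE", "deployment": "ACTIVATE_dep_2021c",
     "start_date": "2021-11-30", "end_date": "2022-03-29",
     "concept_id": "C1994460739-LARC_ASDC"},
]

csv_text = rows_to_csv(rows)
print(csv_text)
```

In the backend, the `rows` list would be built from the querysets in the snippet above instead of being hard-coded.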

## Option 1
Use a CMR query to fetch the location URLs for each deployment, then either download all the .ict files or create a .yaml file containing all the .ict locations.

    import json
    import xml.etree.ElementTree as ET

    import requests

    # URL for ACTIVATE B-200 (King Air) for the time period
    # 2020-02-14 00:00:00 to 2020-03-12 23:59:59
    url = "https://cmr.earthdata.nasa.gov/search/granules?collection_concept_id=C1994460996-LARC_ASDC&platform=King%20Air&temporal[]=2020-02-14T00:00:00Z,2020-03-12T23:59:59Z&page_size=200"
    response = requests.get(url)

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Parse the XML content
        root = ET.fromstring(response.content)

        # Extract location from XML data
        locations = [reference.find("location").text for reference in root.findall(".//reference")]

        # Iterate through each location URL
        for location_url in locations:
            # Make a request to the location URL
            location_response = requests.get(location_url)

            # Check if the location request was successful (status code 200)
            if location_response.status_code == 200:
                # Parse the JSON content
                location_data = json.loads(location_response.text)

                # Extract relevant ict from the JSON dict response
                download_url = location_data.get("RelatedUrls", [{}])[0].get("URL", "")

                print(f"Download URL: {download_url}")

Sample Output

The download URLs can then be filtered by a keyword like METNAV or SUMMARY
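
A minimal sketch of that keyword filter, assuming the keyword appears in the file name at the end of each download URL. The helper `filter_by_keyword` and the sample URLs are hypothetical and only illustrate the matching logic:

```python
def filter_by_keyword(urls, keywords=("METNAV", "SUMMARY")):
    """Keep URLs whose file name contains any of the keywords (case-insensitive)."""
    wanted = [keyword.upper() for keyword in keywords]
    selected = []
    for url in urls:
        # Only look at the last path segment, i.e. the file name.
        file_name = url.rsplit("/", 1)[-1].upper()
        if any(keyword in file_name for keyword in wanted):
            selected.append(url)
    return selected


# Hypothetical URLs, not real CMR output.
urls = [
    "https://example.com/data/ACTIVATE-METNAV_demo_1.ict",
    "https://example.com/data/ACTIVATE-SUMMARY_demo_1.ict",
    "https://example.com/data/ACTIVATE-OTHER_demo_1.ict",
]

for url in filter_by_keyword(urls):
    print(url)
```

Note that this still requires fetching every location first, which is the cost Option 2 tries to avoid.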

## Option 2
Use a POST request to fetch and download all the .ict files, or to create a .yaml file containing all the .ict locations.

This POST request takes a "readable_granule_name" parameter, which accepts a keyword like "ACTIVATE-SUMMARY" or "ACTIVATE-METNAV" (the example below uses the wildcard pattern "ACTIVATE-SUMMARY*")

    import json

    import requests

    body = {
        'params': {
            'concept_id': [],
            'echo_collection_id': 'C1994460739-LARC_ASDC',
            'exclude': {},
            'options': {'readable_granule_name': {'pattern': 'true'}},
            'page_num': 1,
            'page_size': 20,
            'readable_granule_name': ['ACTIVATE-SUMMARY*'],
            'sort_key': 'start_date',
            'temporal': '2021-11-30T00:00:00.000Z,2022-03-29T23:59:59.999Z',
            'two_d_coordinate_system': {},
        }
    }

    res = requests.post(
        "https://d53njncz5taqi.cloudfront.net/granules",
        headers={"Authorization": "Bearer YOUR EARTH DATA TOKEN"},
        data=json.dumps(body),
    )
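
Since only the collection ID, keyword, and date range change between deployments, the request body could be generated per deployment. This is a sketch under that assumption; the helper `build_granule_body` is hypothetical and simply mirrors the fields of the body shown above without sending anything:

```python
import json


def build_granule_body(collection_id, keyword, temporal, page_num=1, page_size=20):
    """Build the POST body for one collection, keyword prefix, and date range."""
    return {
        "params": {
            "concept_id": [],
            "echo_collection_id": collection_id,
            "exclude": {},
            # 'pattern': 'true' lets the trailing '*' act as a wildcard.
            "options": {"readable_granule_name": {"pattern": "true"}},
            "page_num": page_num,
            "page_size": page_size,
            "readable_granule_name": [f"{keyword}*"],
            "sort_key": "start_date",
            "temporal": temporal,
            "two_d_coordinate_system": {},
        }
    }


body = build_granule_body(
    "C1994460739-LARC_ASDC",
    "ACTIVATE-SUMMARY",
    "2021-11-30T00:00:00.000Z,2022-03-29T23:59:59.999Z",
)
payload = json.dumps(body)
print(payload)
```

The resulting `payload` is what would be passed as `data=` in the `requests.post` call. If a deployment has more than `page_size` matching granules, `page_num` would presumably need to be incremented across requests.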



## Challenges/Issues
- Fetching and downloading the .ict files requires keywords like METNAV or SUMMARY, but these keywords are not available in the CASEI database. How can we get these keywords efficiently so that the .ict files can be filtered?

## Preferred Solution 
- Option 2 is preferred because it is much more efficient than Option 1: we don't need to make JSON requests for each deployment to get the list of files and then filter through them.

NASA-IMPACT / admg-casei

How to efficiently download deployment data via API #581