creare-com / podpac

Pipeline for Observational Data Processing Analysis and Collaboration
https://podpac.org
Apache License 2.0

Test podpac-example notebooks for 2.0 release #398

Closed: mpu-creare closed this issue 4 years ago

mpu-creare commented 4 years ago

Description

NOTE: Please start testing on the develop branch until the release/2.0.0 branch is created.

We need to ensure that the example notebooks are up to date and run without error on the 2.0 release.

Please check a notebook (or two, or three, etc.) and check it off once you have confirmed it runs through without errors. Also make any modifications needed.

WARNING:

Check every link. Many of the links are BROKEN.

Progress

mpu-creare commented 4 years ago

I only scratched around the scratch folder -- gleaning a few things there that were useful.

dsully-dev commented 4 years ago

When testing 3-processing/running-on-aws-lambda.ipynb, I ran into some issues. I tried evaluating the node locally and ran into more issues. In general, I've had a hard time getting PODPAC installed with all of its dependencies. Here is my attempt at evaluating that node:

%matplotlib inline

# Get the PODPAC logger
import logging
logger = logging.getLogger("podpac")

Setup PODPAC for AWS (1/2)

Configure AWS credentials. These can be specified at runtime, or in the podpac settings file.

In [8]:
import podpac
from podpac import settings

# need to allow unsafe evaluation to use `podpac.algorithm.Arithmetic` Node (below) 
settings.set_unsafe_eval(True)

# # Credentials
# settings["AWS_ACCESS_KEY_ID"] = "access key id"
# settings["AWS_SECRET_ACCESS_KEY"] = "secrect access key"
# settings["AWS_REGION_NAME"] = "region name"

# # General Settings
# settings["AWS_TAGS"] = {} # tags to assign to AWS resources created through PODPAC

# # S3
# settings["S3_BUCKET_NAME"] = "S3 bucket for Lambda functions or PODPAC cache"

# # Lambda
# settings["FUNCTION_NAME"] = "name of lambda function to eval"
# settings["FUNCTION_ROLE_NAME"] = "role name for lambda function"
# settings["FUNCTION_DEPENDENCIES_KEY"] = "path on S3 bucket where function depedencies live"
# settings["FUNCTION_S3_INPUT"] = "path on S3 bucket for input pipelines. Objects put in this directory will trigger lambda function",
# settings["FUNCTION_S3_OUTPUT"] = "path on S3 bucket for pipeline outputs. Objects put in this directory will be returned to lambda function",

# # Paths - overwrite paths for Lambda caching
# this will be fixed in future releases of PODPAC
settings["ROOT_PATH"] = "/tmp/"
settings["LOG_FILE_PATH"] = "/tmp/"
Provide Earth Data Login Credentials

If you do not have an Earth Data login, or have not activated OpenDAP access, follow the instructions here.

In [3]:
import getpass
username = password = None
username = input("Username:");   password = getpass.getpass('Password:')

# EarthData Credentials need to get saved in `settings` in order to get passed
# into the Lambda function on AWS. This will be automated in the future.
settings["username@urs.earthdata.nasa.gov"] = username
settings["password@urs.earthdata.nasa.gov"] = password
Setup (2/2)

Create the PODPAC Pipeline

We'll use the same pipeline from the 100-analyzing-SMAP-data.ipynb notebook. This example computes the difference between the current soil moisture for a region and that of the previous year.

In [17]:
import podpac.datalib

# Create the pipeline
product = 'SPL4SMAU'
smap = podpac.datalib.smap_egi.SMAP(product=product, username=username, password=password)

# Expand the requested time coordinates to the same date one year earlier,
# then average over the expanded time dimension
smap_time1_offset = podpac.algorithm.ExpandCoordinates(source=smap, time=['-1,Y', '-1,Y', '1,Y'])
smap_offset = podpac.algorithm.Mean(source=smap_time1_offset, dims=['time'])

# Separate SMAP node (not used by the pipeline below)
s = podpac.datalib.smap.SMAP()
s.set_credentials(username=username, password=password)

# This is the output Node of the Pipeline
diff = podpac.algorithm.Arithmetic(eqn='B-A', A=smap, B=smap_offset)
smap
Out[17]:
<SMAP()>
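As an aside, the serialized pipeline definition, which is what a remote evaluation would ship to AWS, can be inspected from the output node; a minimal sketch, assuming the node's JSON definition property:

# Print the serialized pipeline definition for the output node
print(diff.json)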
Create PODPAC Coordinates

This specifies the region and date where the pipeline will be evaluated.

In [19]:
# Specify region of interest on a uniform grid
lat = podpac.crange(  60,  10, -2.0)  # (start, stop, step)
lon = podpac.crange(-130, -60,  2.0)  # (start, stop, step)

# Specify date and time
time = '2018-05-19T12:00:00'

# Create the PODPAC Coordinates
coords = podpac.Coordinates([lat, lon, time], dims=['lat', 'lon', 'time'])
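Note that although the next section is titled "Evaluating node on AWS cloud", the cell below evaluates the node locally. For an actual cloud run, the output node would presumably be wrapped in a Lambda manager node first; a minimal sketch, assuming the podpac.managers.aws.Lambda wrapper and that the function, role, and S3 bucket named in settings already exist:

# Wrap the output node so that eval() runs on AWS Lambda instead of locally
# (assumes the Lambda function and S3 bucket configured above exist)
lambda_node = podpac.managers.aws.Lambda(source=diff)
# output = lambda_node.eval(coords)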
Evaluating node on AWS cloud

In [20]:
# node = podpac.datalib.smap.SMAP()
# node.set_credentials(username="<username>", password="<password>")
# podpac.settings["username@EGI"] = username
o = diff.eval(coords)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-c29bf5968522> in <module>
      2 # node.set_credentials(username="<username>", password="<password>")
      3 # podpac.settings["username@EGI"] = username
----> 4 o = diff.eval(coords)

~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/core/node.py in wrapper(self, coordinates, output)
    982             self._from_cache = True
    983         else:
--> 984             data = fn(self, coordinates, output=output)
    985             if self.cache_output:
    986                 self.put_cache(data, key, cache_coordinates)

~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/core/algorithm/algorithm.py in eval(self, coordinates, output)
    125             # Evaluate nodes in serial
    126             for key, node in self.inputs.items():
--> 127                 inputs[key] = node.eval(coordinates)
    128             self._multi_threaded = False
    129 

~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/core/node.py in wrapper(self, coordinates, output)
    982             self._from_cache = True
    983         else:
--> 984             data = fn(self, coordinates, output=output)
    985             if self.cache_output:
    986                 self.put_cache(data, key, cache_coordinates)

~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/datalib/egi.py in eval(self, coordinates, output)
    219         zip_files = self._download(coordinates)
    220         try:
--> 221             self.data = self._read_zips(zip_files)  # reads each file in zip archive and creates single dataarray
    222         except KeyError as e:
    223             print("This following error may occur if data_key, lat_key, or lon_key is not correct.")

~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/datalib/egi.py in _read_zips(self, zip_files)
    455                         all_data = uda.isel(lon=np.isfinite(uda.lon), lat=np.isfinite(uda.lat))
    456                     else:
--> 457                         all_data = self.append_file(all_data, uda)
    458                 else:
    459                     _log.warning("No data returned from file: {}".format(name))

~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/datalib/smap_egi.py in append_file(self, all_data, data)
    243         """
    244         if all_data.shape[1:] == data.shape[1:]:
--> 245             data.lat.data = all_data.lat.data
    246             data.lon.data = all_data.lon.data
    247         else:

~/.local/lib/python3.7/site-packages/xarray/core/common.py in __setattr__(self, name, value)
    260         """
    261         try:
--> 262             object.__setattr__(self, name, value)
    263         except AttributeError as e:
    264             # Don't accidentally shadow custom AttributeErrors, e.g.

~/.local/lib/python3.7/site-packages/xarray/core/dataarray.py in data(self, value)
    551     @data.setter
    552     def data(self, value: Any) -> None:
--> 553         self.variable.data = value
    554 
    555     @property

~/.local/lib/python3.7/site-packages/xarray/core/variable.py in data(self, data)
   2106     def data(self, data):
   2107         raise ValueError(
-> 2108             f"Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable {self.name!r}. "
   2109             f"Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate."
   2110         )

ValueError: Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable 'lat'. Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate.
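For what it's worth, the failing code in smap_egi.append_file assigns directly to the .data attribute of the lat/lon dimension coordinates, which newer xarray versions forbid. Following the error message's suggestion, a minimal sketch of a fix using assign_coords (variable names taken from the traceback):

# Rather than mutating the index coordinates in place:
#     data.lat.data = all_data.lat.data
#     data.lon.data = all_data.lon.data
# rebuild them with assign_coords, which returns a new DataArray
data = data.assign_coords(lat=all_data.lat.data, lon=all_data.lon.data)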
dsully-dev commented 4 years ago

The issue above was resolved by a recent commit to this release branch. We also found an issue with the Lambda deployment zips, which were missing numexpr; I fixed and tested that.

mpu-creare commented 4 years ago

Thanks for the help @dsullivan-creare. This is good to go.