Closed mpu-creare closed 4 years ago
I only scratched around the scratch folder -- gleaning a few things there that were useful.
When testing 3-processing/running-on-aws-lambda.ipynb, I ran into some issues. I tried evaluating the node locally and ran into more issues. In general, I've had a hard time getting PODPAC installed with all of its dependencies. Here is my attempt at evaluating that node:
%matplotlib inline
# Get the PODPAC logger
import logging
logger = logging.getLogger("podpac")
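The cell above only fetches the logger; for the log messages to actually show up in the notebook output, a handler must be attached. A minimal sketch using only the standard library (this handler setup is not part of the original notebook):

```python
import logging

logger = logging.getLogger("podpac")
logger.setLevel(logging.DEBUG)

# Send log records to the notebook's output stream with a simple format.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.debug("podpac logging configured")
```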
Setup PODPAC for AWS (1/2)
Configure AWS credentials
Can be specified at runtime, or in the podpac settings file
In [8]:
import podpac
from podpac import settings
# need to allow unsafe evaluation to use `podpac.algorithm.Arithmetic` Node (below)
settings.set_unsafe_eval(True)
# # Credentials
# settings["AWS_ACCESS_KEY_ID"] = "access key id"
# settings["AWS_SECRET_ACCESS_KEY"] = "secret access key"
# settings["AWS_REGION_NAME"] = "region name"
# # General Settings
# settings["AWS_TAGS"] = {} # tags to assign to AWS resources created through PODPAC
# # S3
# settings["S3_BUCKET_NAME"] = "S3 bucket for Lambda functions or PODPAC cache"
# # Lambda
# settings["FUNCTION_NAME"] = "name of lambda function to eval"
# settings["FUNCTION_ROLE_NAME"] = "role name for lambda function"
# settings["FUNCTION_DEPENDENCIES_KEY"] = "path on S3 bucket where function dependencies live"
# settings["FUNCTION_S3_INPUT"] = "path on S3 bucket for input pipelines. Objects put in this directory will trigger the lambda function"
# settings["FUNCTION_S3_OUTPUT"] = "path on S3 bucket for pipeline outputs. Objects put in this directory will be returned to the lambda function"
# # Paths - overwrite paths for Lambda caching
# this will be fixed in future releases of PODPAC
settings["ROOT_PATH"] = "/tmp/"
settings["LOG_FILE_PATH"] = "/tmp/"
Provide Earth Data Login Credentials
If you do not have an Earth Data login, or have not activated OpenDAP access, follow the instructions here.
In [3]:
import getpass
username = password = None
username = input("Username:"); password = getpass.getpass('Password:')
# EarthData Credentials need to get saved in `settings` in order to get passed
# into the Lambda function on AWS. This will be automated in the future.
settings["username@urs.earthdata.nasa.gov"] = username
settings["password@urs.earthdata.nasa.gov"] = password
Setup (2/2)
Create the PODPAC Pipeline
We'll use the same pipeline from the 100-analyzing-SMAP-data.ipynb notebook
This example computes the difference between the current soil moisture for a region and that of the previous year.
In [17]:
import podpac.datalib
# Create the Pipeline
product = 'SPL4SMAU'
smap = podpac.datalib.smap_egi.SMAP(product=product, username=username, password=password)
smap_time1_offset = podpac.algorithm.ExpandCoordinates(source=smap, time=['-1,Y', '-1,Y', '1,Y'])
smap_offset = podpac.algorithm.Mean(source=smap_time1_offset, dims=['time'])
s = podpac.datalib.smap.SMAP()
s.set_credentials(username=username, password=password)
# This is the output Node of the Pipeline
diff = podpac.algorithm.Arithmetic(eqn='B-A', A=smap, B=smap_offset)
smap
Out[17]:
<SMAP()>
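The Arithmetic node evaluates its eqn string over the named inputs A and B, which is why unsafe evaluation had to be enabled earlier; the numexpr dependency mentioned at the end of this thread suggests PODPAC delegates the string math to numexpr. A rough sketch of the same 'B-A' computation with numexpr directly (the arrays here are made-up stand-ins, not SMAP data):

```python
import numexpr as ne
import numpy as np

# Illustrative stand-ins for the A (current) and B (offset-mean) inputs.
A = np.array([0.30, 0.25, 0.40])
B = np.array([0.28, 0.31, 0.35])

# numexpr compiles the string expression and pulls A and B from the
# local namespace, much like Arithmetic(eqn='B-A', A=..., B=...).
diff = ne.evaluate("B-A")
print(diff)  # elementwise difference B - A
```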
Create PODPAC Coordinates
This specifies the region and date where the pipeline will be evaluated
In [19]:
# Specify region of interest on a uniform grid
lat = podpac.crange( 60, 10, -2.0) # (start, stop, step)
lon = podpac.crange(-130, -60, 2.0) # (start, stop, step)
# Specify date and time
time = '2018-05-19T12:00:00'
# Create the PODPAC Coordinates
coords = podpac.Coordinates([lat, lon, time], dims=['lat', 'lon', 'time'])
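As a rough sanity check on the size of this request, the uniform grids above can be approximated with numpy ranges (an illustration only; podpac.crange's endpoint handling may differ from numpy.arange):

```python
import numpy as np

# Approximate the lat/lon grids defined above.
lat = np.arange(60, 10, -2.0)    # 60, 58, ..., 12
lon = np.arange(-130, -60, 2.0)  # -130, -128, ..., -62

print(lat.size, lon.size, lat.size * lon.size)  # points per dim, grid cells per time step
```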
Evaluating the node on the AWS cloud
In [20]:
# node = podpac.datalib.smap.SMAP()
# node.set_credentials(username="<username>", password="<password>")
# podpac.settings["username@EGI"] = username
o = diff.eval(coords)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-c29bf5968522> in <module>
2 # node.set_credentials(username="<username>", password="<password>")
3 # podpac.settings["username@EGI"] = username
----> 4 o = diff.eval(coords)
~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/core/node.py in wrapper(self, coordinates, output)
982 self._from_cache = True
983 else:
--> 984 data = fn(self, coordinates, output=output)
985 if self.cache_output:
986 self.put_cache(data, key, cache_coordinates)
~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/core/algorithm/algorithm.py in eval(self, coordinates, output)
125 # Evaluate nodes in serial
126 for key, node in self.inputs.items():
--> 127 inputs[key] = node.eval(coordinates)
128 self._multi_threaded = False
129
~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/core/node.py in wrapper(self, coordinates, output)
982 self._from_cache = True
983 else:
--> 984 data = fn(self, coordinates, output=output)
985 if self.cache_output:
986 self.put_cache(data, key, cache_coordinates)
~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/datalib/egi.py in eval(self, coordinates, output)
219 zip_files = self._download(coordinates)
220 try:
--> 221 self.data = self._read_zips(zip_files) # reads each file in zip archive and creates single dataarray
222 except KeyError as e:
223 print("This following error may occur if data_key, lat_key, or lon_key is not correct.")
~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/datalib/egi.py in _read_zips(self, zip_files)
455 all_data = uda.isel(lon=np.isfinite(uda.lon), lat=np.isfinite(uda.lat))
456 else:
--> 457 all_data = self.append_file(all_data, uda)
458 else:
459 _log.warning("No data returned from file: {}".format(name))
~/miniconda3/envs/podpac/lib/python3.7/site-packages/podpac/datalib/smap_egi.py in append_file(self, all_data, data)
243 """
244 if all_data.shape[1:] == data.shape[1:]:
--> 245 data.lat.data = all_data.lat.data
246 data.lon.data = all_data.lon.data
247 else:
~/.local/lib/python3.7/site-packages/xarray/core/common.py in __setattr__(self, name, value)
260 """
261 try:
--> 262 object.__setattr__(self, name, value)
263 except AttributeError as e:
264 # Don't accidentally shadow custom AttributeErrors, e.g.
~/.local/lib/python3.7/site-packages/xarray/core/dataarray.py in data(self, value)
551 @data.setter
552 def data(self, value: Any) -> None:
--> 553 self.variable.data = value
554
555 @property
~/.local/lib/python3.7/site-packages/xarray/core/variable.py in data(self, data)
2106 def data(self, data):
2107 raise ValueError(
-> 2108 f"Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable {self.name!r}. "
2109 f"Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate."
2110 )
ValueError: Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable 'lat'. Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate.
The issue above was resolved in a recent commit to this release branch. We also found an issue with the Lambda zips, which needed numexpr. I fixed and tested that.
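For reference, the ValueError in the traceback comes from xarray refusing in-place mutation of a dimension coordinate. A minimal reproduction and the assign_coords replacement that the error message suggests (a sketch with made-up coordinates, not the actual smap_egi code):

```python
import numpy as np
import xarray as xr

# A small DataArray with 'lat' as a dimension coordinate (an IndexVariable).
da = xr.DataArray(
    np.zeros((2, 3)),
    coords={"lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]},
    dims=["lat", "lon"],
)

# This mirrors the failing line in smap_egi.append_file: assigning to the
# .data attribute of a dimension coordinate raises ValueError.
try:
    da.lat.data = np.array([11.0, 21.0])
except ValueError as e:
    print(e)

# The fix xarray suggests: build a new object via assign_coords instead of
# mutating the index in place.
fixed = da.assign_coords(lat=[11.0, 21.0])
print(fixed.lat.values)
```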
Thanks for the help @dsullivan-creare . This is good to go.
Description
NOTE: Please start testing on the develop branch until the release/2.0.0 branch is created.
We need to ensure that the example notebooks are up to date and run without error on the 2.0 release.
Please check a notebook (or two or three... etc.) and check it off once you have confirmed it runs through without errors. Also make any modifications needed.
WARNING:
Check every link. Many of the links are BROKEN.
Progress
0-concepts/node.ipynb WORKS BUT NEEDS UPDATES
5-datalib/netcdf.ipynb