ContinuumIO / elm

Phase I & part of Phase II of NASA SBIR - Parallel Machine Learning on Satellite Data
http://ensemble-learning-models.readthedocs.io

Add notebook for NLDAS data exploration #183

Closed: gbrener closed this 7 years ago

gbrener commented 7 years ago

To aid with feature engineering in applying ML to NLDAS, I've created a notebook with visualizations of one or more publicly available GRIB files.

jbednar commented 7 years ago

I'm not sure what the underlying intent is here, but if you're just generating all these notebooks in order to run a similar set of steps on different data files, @jlstevens can show you how to do that using a single reference .ipynb file with a filename parameter widget, set up so that a continuous integration system can run that notebook for every file in a directory or every URL in a list.
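
A minimal sketch of the "run one reference notebook for every file in a directory" idea, for illustration only. It is not the setup @jlstevens uses. The notebook name `explore.ipynb`, the `*.grb` pattern, and the `NLDAS_FILE` environment variable are all hypothetical, and the notebook is assumed to read its input path from that variable.

```python
import os
import subprocess
from pathlib import Path


def build_jobs(data_dir, notebook="explore.ipynb", pattern="*.grb",
               env_var="NLDAS_FILE"):
    """Return one (command, environment) pair per matching data file.

    All default names here are hypothetical placeholders.
    """
    jobs = []
    for path in sorted(Path(data_dir).glob(pattern)):
        # Execute the reference notebook and save a separate copy per file.
        cmd = [
            "jupyter", "nbconvert", "--to", "notebook", "--execute",
            notebook, "--output", f"{path.stem}_executed.ipynb",
        ]
        # Assumption: the notebook reads os.environ[env_var] to pick its input.
        env = dict(os.environ)
        env[env_var] = str(path)
        jobs.append((cmd, env))
    return jobs


def run_all(data_dir):
    """Execute the notebook once per data file; raises if any run fails."""
    for cmd, env in build_jobs(data_dir):
        subprocess.run(cmd, env=env, check=True)
```

A CI job could then call `run_all("data/")` and fail the build whenever any notebook execution raises.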

gbrener commented 7 years ago

@jbednar The primary intent is data exploration for the purposes of feature engineering. I'll update the PR description to reflect this. I haven't analyzed the data enough to determine whether we should be treating every file type the same way - this was just a naive starting point. I'll get more information from @jlstevens about the parameter widget to see how it can help us here.

PeterDSteinberg commented 7 years ago

@gbrener I'll push up some changes in about an hour, then modify as needed. I'll just edit the one main notebook that gets copied currently.

gbrener commented 7 years ago

Just pushed up an improvement to the notebooks which makes visualizing a bit easier, and solves the problem of not knowing how many attributes the data files have ahead of time. Here is what the first visualizations now look like:

[Screenshot: 2017-07-20 at 12:33:08 PM]

gbrener commented 7 years ago

Adjusted the undefined data values, so the visualizations look more accurate now:

[Screenshot: 2017-07-21 at 4:41:52 PM]

These values should be ignored during analysis (see https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/NLDAS_FOR0125_H.001/doc/README.NLDAS1.pdf).
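
A rough sketch of how undefined values could be excluded, using a NumPy masked array. The sentinel `9.999e20` is an assumption for illustration; the README linked above is the authority on the actual undefined value, and this is not necessarily how the notebook handles it.

```python
import numpy as np

# Assumed sentinel for "undefined"; check the NLDAS README for the real value.
UNDEFINED = 9.999e20


def mask_undefined(values, undefined=UNDEFINED):
    """Return a masked array in which cells equal to the sentinel are hidden."""
    arr = np.asarray(values, dtype=float)
    # isclose tolerates small floating-point differences in the stored sentinel.
    return np.ma.masked_where(np.isclose(arr, undefined), arr)
```

Statistics such as `mask_undefined(grid).mean()` then skip the masked cells, and plotting libraries that understand masked arrays leave them blank.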

Pushing up changes in the next few minutes.

gbrener commented 7 years ago

Just pushed up the one-notebook approach, using ipywidgets via the functions in example_utils.py. Updated this PR's description to reflect the new approach. Here is a screencast of what the data file selection looks like:

[Screencast: data_browser_screencap]
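
For readers unfamiliar with ipywidgets, a file-selection widget of this kind can be sketched roughly as follows. This is not the code in example_utils.py. The names `list_data_files`, `make_browser`, and `show_file`, and the `*.grb` pattern, are hypothetical.

```python
from pathlib import Path


def list_data_files(data_dir, pattern="*.grb"):
    """Return the sorted paths of data files found in data_dir."""
    return sorted(str(p) for p in Path(data_dir).glob(pattern))


def make_browser(data_dir, show_file):
    """Build a dropdown that calls show_file(filename=...) on each selection.

    show_file is a hypothetical plotting callback supplied by the caller.
    """
    # Imported lazily so that listing files works outside Jupyter as well.
    import ipywidgets as widgets

    picker = widgets.Dropdown(options=list_data_files(data_dir),
                              description="File:")
    return widgets.interactive(show_file, filename=picker)
```

In a notebook cell, ending the cell with `make_browser("data", show_file)` displays the dropdown and re-runs `show_file` whenever a different file is chosen.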