ESIPFed / gsoc

Project ideas and mentor guidance for ESIP members to participate in Google Summer of Code.

Developing methods and libraries to ingest in situ data into Google Earth Engine #15

Closed: esproles closed this issue 5 years ago

esproles commented 5 years ago

Montana State University

Mentors: Dr. Eric Sproles, Department of Earth Sciences, Montana State University; Dr. Sean Yaw, Department of Computer Science, Montana State University

Information for Students: Please see ESIP’s general guidelines.

Project Ideas: Developing methods and libraries to ingest in situ data into Google Earth Engine

Abstract: SnowCloud is an end-to-end cloud-computing framework comprising (i) SnowCloudMetrics, cloud-based tools that transition RSS products into actionable snow metrics; and (ii) SnowCloudHydro, a simple hydrologic model for snow-dominated watersheds that relies solely on monthly Snow Cover Frequency (SCF) and previous streamflow to forecast monthly streamflow with a one-month lead time. The ability of the current iteration of SnowCloudHydro to effectively predict streamflow is limited to watersheds that are snow dependent and receive limited liquid precipitation. Advancing the predictive capacity of SnowCloudHydro will require new spaceborne and in situ (field) data. Google Earth Engine (GEE) readily serves spaceborne data in a cloud-based environment. However, linking these satellite data in GEE with in situ measurements remains an unsolved challenge.
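Schematically (the SnowCloudHydro paper referenced under First steps gives the exact formulation), the forecast can be thought of as a calibrated function of the prior month's SCF and streamflow. The sketch below is illustrative only; the function name and coefficients are placeholders, not the model's actual form:

```python
# Schematic only: the linked paper gives SnowCloudHydro's actual model form.
# Next month's streamflow is forecast from this month's snow cover frequency
# (SCF) and streamflow; b0, b1, b2 are placeholders fit during calibration.
def forecast_streamflow(scf_t, q_t, b0=0.0, b1=1.0, b2=1.0):
    return b0 + b1 * scf_t + b2 * q_t
```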

Technical Details: We propose for the Google Summer of Code student to develop a library or routine that can ingest point-based measurements into Google Earth Engine. The data sets of most value are stream gage data from the United States Geological Survey (USGS), though we are equally interested in stream gages from other regions of the world.

A successful project would be characterized by:
A. Code that is readily deployable in GEE and reads USGS (or equivalent agency) data into the SnowCloudHydro framework.
B. Code whose output can automatically be incorporated into SnowCloudHydro's streamflow predictions.

Helpful Experience: The student should have a solid understanding of hydroclimatic and spatial processes, from both an applied and a computational perspective. Applied skills can be satisfied by coursework, though hands-on field work is preferred, as is experience in snowy climates. Helpful skillsets in the spatial sciences include raster analysis and visualization. Experience using Google Earth Engine is preferred, but not required. The most valuable computational skills are data assimilation, model calibration and validation, and the ability to code in Python and/or JavaScript.

First steps: Read the paper that describes the SnowCloudHydro modelling framework (https://www.mdpi.com/2072-4292/10/8/1276/htm) to familiarize yourself with the model structure. You should also familiarize yourself with the SMAP, TRMM, and GPM datasets available on Google Earth Engine (https://developers.google.com/earth-engine/datasets/).
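For instance, a first look at one of these collections through the GEE Python API might be as follows (a sketch; the dataset ID and band name are taken from the public GEE catalog and may have changed, so verify them on the catalog pages):

```python
# Sketch: load GPM precipitation through the Earth Engine Python API
# (pip install earthengine-api; authenticate once with `earthengine authenticate`).
# The dataset ID and band name come from the public GEE catalog and may change.
import ee

ee.Initialize()

gpm = (ee.ImageCollection('NASA/GPM_L3/IMERG_V06')
       .filterDate('2019-01-01', '2019-02-01')
       .select('precipitationCal'))
print('GPM images in January 2019:', gpm.size().getInfo())
```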

You should then gain access to Google Earth Engine (https://signup.earthengine.google.com/) and complete any tutorials needed to bring you up to speed. Once you are familiar with Google Earth Engine, download and experiment with the SnowHydro code that drives the snow cover analysis (https://github.com/MountainHydroClimate/SCF_ESIP/, Version 6). Play with the code, get creative, and have fun.
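As one experiment to try while you play, snow cover frequency can be sketched from MODIS daily snow cover as the fraction of days per month a pixel is flagged as snow covered. This is an illustrative sketch, not the SCF_ESIP code; the dataset, band, and threshold are assumptions to adjust:

```python
# Illustrative SCF sketch (not the SCF_ESIP repo code): per pixel, the
# fraction of days in one month that MODIS flags snow cover.
import ee

ee.Initialize()

snow = (ee.ImageCollection('MODIS/006/MOD10A1')
        .filterDate('2019-01-01', '2019-02-01')
        .select('NDSI_Snow_Cover'))

# Treat a pixel as snow covered where NDSI snow cover exceeds 10
# (an assumed, adjustable threshold).
snow_days = snow.map(lambda img: img.gt(10)).sum()
scf = snow_days.divide(snow.count()).rename('SCF')
```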

Next, please familiarize yourself with the United States Geological Survey’s (USGS) National Water Information System (NWIS), and then learn how to read in USGS data (https://waterdata.usgs.gov/nwis).
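As a concrete example of reading in USGS data, the NWIS Instantaneous Values web service returns JSON over HTTP. A minimal sketch (the site number and parameter code below are examples; 00060 is discharge in cubic feet per second):

```python
# Minimal pull from the USGS NWIS Instantaneous Values web service.
# Site 06730200 and parameter 00060 (discharge, cfs) are just examples.
import requests

resp = requests.get(
    'https://waterservices.usgs.gov/nwis/iv/',
    params={'format': 'json', 'sites': '06730200',
            'parameterCd': '00060', 'period': 'P1D'},  # last 24 hours
)
resp.raise_for_status()
series = resp.json()['value']['timeSeries'][0]
for v in series['values'][0]['value'][-3:]:   # three most recent readings
    print(v['dateTime'], v['value'])
```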

Finally, get creative and experiment with getting real-time USGS data into Google Earth Engine.
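One possible shape for that experiment, using the GEE Python API: wrap each gage reading in an ee.Feature so GEE's raster collections can be sampled or joined against it (the coordinates and values below are placeholders):

```python
# Sketch: turn point gage readings into an ee.FeatureCollection that GEE
# rasters can be joined with or sampled at. All values are placeholders.
import ee

ee.Initialize()

readings = [{'site': '06730200', 'lon': -105.18, 'lat': 40.05,
             'discharge_cfs': 42.0}]

gages = ee.FeatureCollection([
    ee.Feature(ee.Geometry.Point([r['lon'], r['lat']]),
               {'site': r['site'], 'discharge_cfs': r['discharge_cfs']})
    for r in readings
])
print(gages.first().getInfo())
```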

ritwikagarwal commented 5 years ago

Hey @esproles, how could I reach the mentors for this project?

mmlella commented 5 years ago

I have gone through the links provided to get familiar with the project, but the GitHub link for the SnowHydro code does not work.

ritwikagarwal commented 5 years ago

@esproles mailed you :)

esproles commented 5 years ago

> I have gone through the links provided to get familiar with the project, but the GitHub link for the SnowHydro code does not work.

https://github.com/MountainHydroClimate/SCF_ESIP/

go for version 6

esproles commented 5 years ago

> Hey @esproles, how could I reach the mentors for this project?

The best way to contact us is through this GitHub page.

ritwikagarwal commented 5 years ago

Oh ok, so I have actually gone through the documents and looked at the code. Should I start drafting the proposal, or do you suggest doing something beforehand? Also, how do I send my proposal to you after drafting the first version?

esip-lab commented 5 years ago

@esproles any information you could provide @ritwikagarwal on ways to get started?

esproles commented 5 years ago

@ritwikagarwal Sorry for the delayed response.

I suggest:

1) providing example code that demonstrates a near-real-time feed of USGS stream gage data into a web-based interface, preferably Data Studio or Sheets (a rough sketch follows below). Fusion Tables are not a great option, as they will be deprecated in Dec 2019.

2) providing an example of how to transition the results from the GEE code into Data Studio using BigQuery GIS.

Two sites that might help are: https://cloud.google.com/blog/products/gcp/bridging-the-gap-between-data-and-insights

http://whrc.org/wp-content/uploads/2016/02/Thau_Google.pdf
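For item 1, a rough sketch of a Sheets feed using the gspread library (this assumes a Google service account with access to the sheet; the sheet name and rows are hypothetical):

```python
# Rough sketch of a near-real-time feed into Google Sheets via gspread
# (pip install gspread). Assumes a service-account JSON key, and that the
# hypothetical sheet "USGS gage feed" is shared with that account.
import gspread

gc = gspread.service_account(filename='service_account.json')
ws = gc.open('USGS gage feed').sheet1

# In practice these rows would come from an NWIS request.
for row in [['2019-03-01T12:00', '06730200', 42.0]]:
    ws.append_row(row)
```

Data Studio can then read the sheet directly as a data source.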

rpalomaki commented 5 years ago

Looks like a fun project! Are there any plans to also incorporate precipitation data (e.g. rain gauges) into the model, or will the focus over the summer be primarily on the stream gages?

esproles commented 5 years ago

@rpalomaki - Thanks for your interest - yes, the goal will be to eventually incorporate other in situ data; the USGS ingest is simply a preliminary step in the process. Ideally this would happen over the summer.

rpalomaki commented 5 years ago

Hi @esproles - I wrote a small package to extract USGS/NWIS stream gage data for export into Google Sheets. It's written in Python, so it is not yet deployable in the JS SnowCloudHydro model, but that is one thing I would hope to accomplish over the summer if I have the chance to work on this project.

Currently, the data extractor handles only the most recent values posted by the USGS. I am now working to handle data requests over longer periods, which I think would be a useful feature for the end users of the model. Please let me know if you have any feedback. Thanks!

esproles commented 5 years ago

Hi @rpalomaki - Thanks for your efforts. This is a great start to getting the USGS data into GEE. I am not sure that it would need to be converted into JS, as you could make a call from GEE.

Yes, you are correct - the model would need both long-term and recent data. This would be a logical next step.

My second question: does this package grab daily data? The SnowCloudHydro model uses monthly averages, though we want to test it on weekly data as well. Could you specify the temporal bin (hourly, daily, weekly, monthly) in the package? While you could do the calculations in Google Sheets, the end goal is to make the SnowCloudHydro model as seamless as possible for the end user.

Thanks again for your efforts - esproles

rpalomaki commented 5 years ago

@esproles Currently the package only retrieves the most recent instantaneous value for a given variable and station. I believe this is typically an hourly value, but it could be daily depending on the station. When I add the functionality to retrieve a longer period of data, it should be no problem to also add temporal averaging and other statistics. This would give the user the ability to access and export, e.g., a year's worth of weekly discharge averages at multiple sites.
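To illustrate the averaging I have in mind, once the longer-period request is working the binning could be a pandas resample (a sketch, assuming the NWIS JSON has been flattened into timestamped rows):

```python
# Sketch: temporal binning of a gage series with pandas, assuming the NWIS
# JSON has been flattened into (dateTime, value) rows.
import pandas as pd

df = pd.DataFrame({
    'dateTime': pd.to_datetime(['2019-01-01 00:00', '2019-01-01 01:00']),
    'value': [40.0, 44.0],
}).set_index('dateTime')

# 'D' = daily, 'W' = weekly, 'MS' = monthly; pick the bin to match the model.
weekly_mean = df['value'].resample('W').mean()
```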

Regarding Python and JavaScript - I am somewhat new to GEE and the web-based IDE, but I was under the impression that the IDE is set up only for JS. I've come across the GEE Python API, but it looks like it is installed and run on a local machine rather than in the IDE. Have you had any luck in the past running Python scripts in the web-based IDE? Perhaps this is a non-issue if the NWIS data extractor can be a standalone package, but if the goal is to nest the data extractor inside the larger SCF model, this may be a hurdle to get over.

esproles commented 5 years ago

@rpalomaki - thanks for your response. Moving the package toward providing temporal averages of the data (weekly, bi-monthly, monthly) will be a big step in creating a seamless framework for SnowCloudHydro.

Yes - hopefully the NWIS data extractor can be a standalone package. But could the model incorporate other in situ data, such as precipitation or SWE? These data would also come from sources other than the USGS.

rpalomaki commented 5 years ago

@esproles - Yes, I think the current package could be expanded to incorporate other data. Do you have a particular dataset in mind for precip and SWE? There may be some differences in the data portals provided by different agencies.

esproles commented 5 years ago

@rpalomaki - for precip and SWE data, SNOTEL would provide a one-stop digital source. Even though SNOTEL stations are limited to the western US, these data would be a great place to start.
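If it helps when that work begins: NRCS serves SNOTEL through its AWDB web service and a CSV report generator. The URL below only illustrates the report-generator pattern (the station triplet and path are hypothetical examples; check them against the NRCS documentation):

```python
# Hedged sketch of pulling SNOTEL SWE (element code WTEQ) as CSV. The URL
# pattern is a hypothetical example of the NRCS report generator; verify
# the exact format against the NRCS docs before relying on it.
import io

import pandas as pd
import requests

url = ('https://wcc.sc.egov.usda.gov/reportGenerator/view_csv/'
       'customSingleStationReport/daily/663:CO:SNTL/-7,0/WTEQ::value')
resp = requests.get(url)
resp.raise_for_status()
swe = pd.read_csv(io.StringIO(resp.text), comment='#')  # '#' lines are headers
print(swe.head())
```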

esip-lab commented 5 years ago

Hi all - a friendly reminder that there is ONE WEEK LEFT to submit your proposals for this project! Best of luck and we're excited to see what is submitted!

rpalomaki commented 5 years ago

@esproles - Great, I'll start looking into it!

rpalomaki commented 5 years ago

Hi @abburgess - I submitted a draft proposal a few days ago. Do you anticipate that I will be able to receive some feedback before the final versions are due?

esip-lab commented 5 years ago

@esproles can you please make sure to review the draft proposal submitted by @rpalomaki to make sure he is on the right track and the proposal is as successful as possible?

esproles commented 5 years ago

@abburgess @rpalomaki -

I read the draft proposal, and it looks very solid. Great job.

@rpalomaki - did you submit it to the GSoC website or here on GitHub?

Thanks!