CIROH-UA / NGIAB-CloudInfra

NextGen In A Box: NextGen Generation Water Modeling Framework for Community Release (Docker version)
https://docs.ciroh.org/docs/products/nextgeninaboxDocker/

Google Big Query - Baseflow Machine Learning Model - BYU #29

Open njones61 opened 9 months ago

njones61 commented 9 months ago

1. Requester Information: This should include the name and contact information of the person making the request.

Norm Jones, Brigham Young University, njones@byu.edu, 801-422-7569

2. Project Information: Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

We are working on this project:

CIROH: Advancing Science to Better Characterize Drought and Groundwater-Driven Low-Flow Conditions in NOAA and USGS National-Scale Models

While most of the CIROH research is focused on dealing with extreme precipitation and flood events, low flow conditions resulting from extended dry periods can impact critical operations such as municipal water supply. The focus of our project, which is jointly funded by the USGS and NOAA, is to develop machine learning tools for accurately predicting low flow conditions in US streams. 

3. Project Description: If your project involves developing software or scripts, briefly describe the software you plan to develop.

We are developing a Python library with a suite of tools for digitally filtering baseflow from hydrographs at stream gages. We would like to leverage Google BigQuery, which now hosts both USGS streamflow gage data and NWM streamflow forecasts, totaling over 100 billion records. We will analyze these data to identify periods in the streamflow records that correspond to baseflow-only conditions, and then develop machine learning methods that use a combination of remote sensing and groundwater monitoring well data as input to predict baseflow during these periods. Our long-term goal is to develop baseflow prediction tools that could be integrated with the NextGen model.
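For readers unfamiliar with digital baseflow filtering, here is a minimal sketch of one common approach, a single forward pass of the Lyne-Hollick recursive filter. This is an illustration only: the request does not say which filters the library implements, and the function name, the default `alpha = 0.925`, and the assumption that the record starts at baseflow are all choices made for this example.

```python
from typing import List, Sequence


def lyne_hollick_baseflow(flow: Sequence[float], alpha: float = 0.925) -> List[float]:
    """Separate baseflow from a streamflow series with one forward filter pass.

    Hypothetical helper for illustration; not taken from the project's library.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if len(flow) == 0:
        return []

    # Assumption: the first observation is entirely baseflow (no quickflow).
    baseflow = [float(flow[0])]
    quick = 0.0
    for prev_q, q in zip(flow, flow[1:]):
        # Recursive high-pass filter that isolates the quickflow component.
        quick = alpha * quick + 0.5 * (1.0 + alpha) * (q - prev_q)
        # Constrain quickflow to [0, q] so that baseflow also stays in [0, q].
        quick = min(max(quick, 0.0), q)
        baseflow.append(q - quick)
    return baseflow
```

Calling `lyne_hollick_baseflow(daily_flows)` on a list of daily discharges returns a baseflow series of the same length. Many applications run the filter in several forward and backward passes; this sketch shows a single pass only.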

4. Resource Requirements: Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

Following our prototype project, we would like to use:

    • USGS stream gage data
    • NWM retrospective forecast data
    • USGS groundwater level data
    • Remote sensing data (GRACE grids, GLDAS grids, etc., in netCDF format)

We will label periods in the stream gage hydrographs that correspond to baseflow-only (BFO) conditions and then develop machine learning models that use well data and remote sensing data as features to predict baseflow flow rates (Q) during the BFO periods. We would like to attempt this using the SQL and Google Colab interfaces to Google BigQuery.
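To make the BFO labeling step concrete, here is a rough sketch of one possible rule: a time step is labeled BFO when the separated baseflow makes up nearly all of the observed flow for several consecutive steps. The rule itself, the function name, and the `min_ratio` and `min_run` thresholds are assumptions for illustration; the request does not specify the labeling criteria.

```python
from typing import List, Sequence


def label_bfo(
    flow: Sequence[float],
    baseflow: Sequence[float],
    min_ratio: float = 0.95,
    min_run: int = 3,
) -> List[bool]:
    """Flag time steps that belong to a sustained baseflow-only (BFO) run.

    Hypothetical rule: baseflow / flow >= min_ratio for at least
    min_run consecutive time steps.
    """
    if len(flow) != len(baseflow):
        raise ValueError("flow and baseflow must have the same length")

    # Candidate steps: baseflow accounts for nearly all of the observed flow.
    candidate = [q > 0 and b / q >= min_ratio for q, b in zip(flow, baseflow)]

    labels = [False] * len(candidate)
    start = None
    # The trailing False is a sentinel that closes a run ending at the last step.
    for i, flag in enumerate(candidate + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                for j in range(start, i):
                    labels[j] = True
            start = None
    return labels
```

The resulting boolean mask could then be used to select the flow records that serve as training targets, with well and remote sensing observations joined in as features.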

Options:

  1. Cloud Provider: AWS/Azure/GCP

GCP

  2. Required Services in the Cloud:

    List of GCP Services

    • Google Compute Engine
    • Google Kubernetes Engine (GKE)
    • Google Cloud Storage
    • Google VPC
    • Google IAM
    • Google BigQuery
    • Google Cloud Functions
    • Dataflow
    • Other: please list

Google BigQuery, for certain. We could use some help determining whether any of the other services are needed.

5. Timeline: Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

Starting in 1-2 months and then continuing over the course of our project (6/1/2023 - 6/1/2025)

6. Security and Compliance Requirements: If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

No

7. Estimation: Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

AWS Cost Calculator: https://calculator.aws/#/

Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

Unknown at this point. We could use some guidance here.

8. Approval: Indicate the necessary approval processes or sign-offs required for the request.

arpita0911patel commented 3 months ago

Access has been granted for https://github.com/CIROH-UA/NGIAB-CloudInfra/issues/28, and we are awaiting an update on that ticket before proceeding further with this one.