CIROH-UA / NGIAB-CloudInfra

NextGen In A Box: NextGen Generation Water Modeling Framework for Community Release (Docker version)
https://docs.ciroh.org/docs/products/nextgeninaboxDocker/

Google Big Query - Proof of Concept - BYU #28

Closed njones61 closed 1 month ago

njones61 commented 11 months ago

1. Requester Information: This should include the name and contact information of the person making the request.

Norm Jones Brigham Young University njones@byu.edu 801-422-7569

2. Project Information: Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

We are working on this project:

CIROH: Advancing Science to Better Characterize Drought and Groundwater-Driven Low-Flow Conditions in NOAA and USGS National-Scale Models

While most of the CIROH research is focused on dealing with extreme precipitation and flood events, low flow conditions resulting from extended dry periods can impact critical operations such as municipal water supply. The focus of our project, which is jointly funded by the USGS and NOAA, is to develop machine learning tools for accurately predicting low flow conditions in US streams. 

3. Project Description: If your project involves developing software or scripts, briefly describe the software you plan to develop.

We are developing a Python library with a suite of tools for digitally filtering baseflow from hydrographs at stream gages. We would like to leverage Google BigQuery, which is now hosting both USGS streamflow gage data and NWM streamflow forecasts, resulting in over 100 billion records. We will analyze these data to identify periods in the streamflow records corresponding to baseflow-only conditions and then develop machine learning methods that use a combination of remote sensing and groundwater monitoring well data as input to predict baseflow at these periods. Our long-term goal is to develop baseflow prediction tools that could be integrated with the NextGen model.
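As a rough illustration of the kind of digital filtering the library will provide, here is a minimal sketch of the one-parameter Lyne-Hollick recursive filter, a common choice for separating baseflow from a hydrograph. The function name, signature, and parameter defaults are illustrative assumptions, not the library's actual API:

```python
import numpy as np

def lyne_hollick_baseflow(q, alpha=0.925, passes=1):
    """Sketch of the one-parameter digital filter (Lyne & Hollick, 1979).

    q: 1-D array of streamflow values; alpha: filter parameter
    (0.9-0.95 is typical); passes: number of filter passes.
    Returns a baseflow series constrained to 0 <= baseflow <= streamflow.
    """
    q = np.asarray(q, dtype=float)
    b = q.copy()
    for _ in range(passes):
        f = np.zeros_like(b)  # filtered quickflow component
        for t in range(1, len(b)):
            f[t] = alpha * f[t - 1] + 0.5 * (1 + alpha) * (b[t] - b[t - 1])
        # Baseflow is what remains after removing (non-negative) quickflow,
        # clipped so it never exceeds the observed flow.
        b = np.clip(b - np.maximum(f, 0.0), 0.0, b)
    return b
```

Under these assumptions, `lyne_hollick_baseflow(streamflow)` would attribute most of a flood peak to quickflow while tracking the recession limb closely.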

4. Resource Requirements: Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

We have participated in a Google BigQuery workshop taught by Kel Markert. In that workshop, he demonstrated how the platform is hosting USGS stream gage data and National Water Model retrospective forecast data. He showed how to use Python code to perform operations on the data and how to run SQL queries on the data. Initially, we will want to run code that processes the streamflow data to label baseflow-only periods as described above. We also want to import groundwater level data from the USGS, and then compare baseflow trends at stream gages to water level trends in nearby monitoring wells to find correlations and see if groundwater trends are impacting baseflow nationwide.
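The trend-comparison step described above could be sketched with pandas along these lines; the column names and sample values here are hypothetical placeholders, not the actual USGS or NWM schema:

```python
import pandas as pd

# Hypothetical monthly baseflow estimates at a stream gage
# (column names and values are illustrative assumptions).
baseflow = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5, freq="MS"),
    "baseflow_cms": [3.2, 3.0, 2.8, 2.5, 2.4],
})

# Hypothetical water levels at a nearby monitoring well.
wells = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5, freq="MS"),
    "water_level_m": [10.1, 10.0, 9.8, 9.5, 9.4],
})

# Align the two series on date and compute a Pearson correlation;
# a strong positive r would suggest declining groundwater is
# tracking declining baseflow at this gage.
merged = baseflow.merge(wells, on="date")
r = merged["baseflow_cms"].corr(merged["water_level_m"])
print(round(r, 3))
```

At national scale this same join-and-correlate pattern would run over query results pulled from BigQuery rather than hand-entered frames.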

Options:

  1. Cloud Provider: AWS/Azure/GCP

GCP

  2. Required Services in the Cloud:

    List of GCP Services

    • Google Compute Engine
    • Google Kubernetes Engine (GKE)
    • Google Cloud Storage
    • Google VPC
    • Google IAM
    • Google BigQuery
    • Google Cloud Functions
    • Dataflow
    • Other: please list

Google BigQuery for sure. We could use some help determining whether any of the other resources are needed.

5. Timeline: Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

We would like to test and prototype this over the next 1-2 months.

6. Security and Compliance Requirements: If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

No.

7. Estimation: Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

???

Could use some guidance here.

AWS Cost Calculator: https://calculator.aws/#/

Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

8. Approval: Indicate the necessary approval processes or sign-offs required for the request.

arpita0911patel commented 11 months ago

Thank you for submitting this infrastructure request ticket. Can I please have the email addresses of everyone on the BYU team who needs access to BigQuery?

njones61 commented 11 months ago

Do you need Google account emails or would their BYU emails work?

Norm


arpita0911patel commented 11 months ago

Let's try using BYU emails first if that works, that would be better.

njones61 commented 11 months ago

I didn't get a response to my question; please use this list:

Norm Jones
Gus Williams
Amin Aghababaei
Xueyi Li
Donna Rizzo
Ryan van der Heijden


arpita0911patel commented 11 months ago

We are working on upgrading the 2i2c JupyterHub image to install the BigQuery package so it's available to users and they do not have to install it manually. Once it is ready, we will grant access to the above email IDs.

benlee0423 commented 11 months ago

@njones61

I can help you with this.

First, are your BYU email accounts associated with Google? We need either a Google account (Gmail) or a BYU account associated with Google. This is required to log in from the terminal and get authorized to access BigQuery. Once you have that set up, I will give you the remaining steps.

benlee0423 commented 11 months ago

Here are the instructions once your email accounts are associated with Google.

  1. Start the JupyterHub server and open a terminal.
  2. Optionally (if not done already), create a virtual environment and install the BigQuery client library:
    python3 -m virtualenv <your-env>
    source <your-env>/bin/activate
    <your-env>/bin/pip install google-cloud-bigquery
  3. Log in to Google Cloud with the command below. In this step, you will need to copy and paste a token into the terminal.
    gcloud auth application-default login --no-launch-browser
  4. Execute the Python code:
    python3 example.py

I have included a sample Python script that connects to BigQuery.

from google.cloud import bigquery

client = bigquery.Client()

# Perform a query.
QUERY = (
    'SELECT feature_id FROM `awi-ciroh.nwm_ds.bq_table_short_range` '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()  # Waits for query to finish

for row in rows:
    print(row.feature_id)

njones61 commented 11 months ago

Please just go ahead and register everyone using the gmail addresses listed above. Sorry for the slow response.

arpita0911patel commented 11 months ago

Thank you, Norm. We will add the Gmail accounts for BYU; the UVM members would either have to provide their Gmail accounts or create Google accounts associated with their UVM accounts.

benlee0423 commented 11 months ago

@njones61 Added all emails except the last UVM email account.

arpita0911patel commented 5 months ago

@njones61 - please let us know if you were able to use the accounts on your end.

arpita0911patel commented 1 month ago

@njones61 - We have CIROH Google Cloud account setup now, so please let us know if you would like to migrate to that account and use that for your project. Thank you!

arpita0911patel commented 1 month ago

Closing this after checking with Norm Jones.