cal-adapt / climakitae

A Python toolkit for retrieving, visualizing, and performing scientific analyses with data from the Cal-Adapt Analytics Engine.
https://climakitae.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Large Data Warnings #335

Closed elehmer closed 4 months ago

elehmer commented 5 months ago

Description of PR

This PR adds warnings that appear when a user selects a large amount of data, since operations on that data can take a long time to run.

Summary of changes and related issue

Adds data warnings for retrieved data over 1GB in size. Data over 5GB gets a more emphatic message, and data over 10GB gets a severe warning.
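For illustration only, the tiering described above could be sketched roughly as follows. This is not the code from this PR: the names `GB`, `TIERS`, `size_tier`, and `warn_if_large` are hypothetical, the message strings are placeholders, and treating 1GB as 10**9 bytes is an assumption.

```python
import warnings

GB = 10**9  # assumption: decimal gigabytes; the PR does not state the unit

# Hypothetical tiers, checked from the largest threshold to the smallest.
# The message strings are placeholders, not the wording used in the PR.
TIERS = [
    (10 * GB, "severe", "Data is over 10GB; operations may not complete without further subsetting."),
    (5 * GB, "large", "Data is over 5GB; operations will take considerably more time."),
    (1 * GB, "moderate", "Data is over 1GB; operations will take extra time."),
]


def size_tier(nbytes):
    """Return (tier_name, message) for the first threshold exceeded, else None."""
    for threshold, name, message in TIERS:
        if nbytes > threshold:
            return name, message
    return None


def warn_if_large(nbytes):
    """Emit a UserWarning when nbytes exceeds a threshold; return the tier name or None."""
    tier = size_tier(nbytes)
    if tier is None:
        return None
    name, message = tier
    warnings.warn(message, UserWarning, stacklevel=2)
    return name
```

With an xarray object, the size could be passed as `warn_if_large(data.nbytes)`. For a lazily loaded array, `nbytes` should be computable from shape and dtype without reading the data, which fits the goal of warning before anything is loaded into memory.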

Relevant motivation and context

Users can retrieve very large datasets without realizing the consequences for run time, even for the simplest operations. A warning before retrieval is not ideal because there are several different paths to access the data, so it seemed warranted to put the warning after data retrieval but before the data is loaded into memory.

Dependencies required for this change?

None

Type of change

How Has This Been Tested?


Checklist:

elehmer commented 4 months ago

Can't download big file

elehmer commented 4 months ago

Data Warning

elehmer commented 4 months ago

Tested in getting_started on:

  • hourly air temp
  • all of CA
  • not area averaged
  • 3km
  • historical + ssp370

This rightfully returns the "huge" option. In this case, my preference is to modify the warning text to say that the data is prohibitively large and that operations may not work without further subsetting.

Otherwise, is there a way we can be more explicit? "Extra time" and "considerably more time" are somewhat vague.

I made some ballpark guesses relative to 1GB of data, using ck.load times.