bopen / c3s-eqc-toolbox-template

CADS Toolbox template application
Apache License 2.0

carra single-level trends and variability #78

Closed tdcwilliams closed 1 year ago

tdcwilliams commented 1 year ago

Notebook description

Notebook link or upload

CARRA_SL_Reanalysis_N.zip

Anything else we need to know?

Environment

malmans2 commented 1 year ago

Hi there,

Sorry about the delay, I was on leave. A couple of questions:

  1. You are using time-weighted reductions, which are computationally more expensive than unweighted reductions. I think you don't need weights as you are dealing with daily data. Can we use unweighted reductions, i.e., time_weighted_mean(obj, weights=False)?
  2. I looked at the documentation of your dataset and it says that Lambert Conformal projections are used. Can we use cartopy Lambert Conformal or do you prefer to just use imshow?
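
To illustrate point 1, here is a minimal sketch in plain Python (not the toolbox's time_weighted_mean implementation) of why duration weights stop mattering for daily data: every daily sample represents the same length of time, so the weighted and unweighted means coincide, whereas monthly means carry weights of 28 to 31 days. The helper name `weighted_mean` and all sample values are made up for illustration.

```python
import calendar


def weighted_mean(values, weights):
    """Return sum(w * v) / sum(w) for paired values and weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Monthly means for Jan-Mar 2019 (made-up values): each month covers a
# different number of days, so duration weights change the result.
monthly_values = [1.0, 2.0, 6.0]
monthly_weights = [calendar.monthrange(2019, m)[1] for m in (1, 2, 3)]
monthly_weighted = weighted_mean(monthly_values, monthly_weights)
monthly_unweighted = sum(monthly_values) / len(monthly_values)

# Daily values (made up): every sample covers exactly one day, so the weights
# are uniform and the weighted mean reduces to the plain mean.
daily_values = [1.0, 2.0, 6.0, 3.0]
daily_weights = [1] * len(daily_values)
daily_weighted = weighted_mean(daily_values, daily_weights)
daily_unweighted = sum(daily_values) / len(daily_values)
```

With uniform weights the extra multiplication and normalisation add cost without changing the answer, which is the motivation for passing weights=False.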

malmans2 commented 1 year ago

Hi @tdcwilliams,

The template is ready. It implements the features in my previous comment, let me know if they are not correct and you want to change them. A couple of comments:

  1. The VM is struggling to process datasets of this size, especially when it's busy. I suggest using monthly chunks (chunks={"year": 1, "month": 1}). I also added a rechunking step using a tmp zarr store, which looks like a good workaround to avoid memory issues.
  2. We are not sure yet if the CIM will support interactive plots, which is why I'm using matplotlib for timeseries plots.
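
As a rough illustration of point 1, monthly chunking means the requested period is split into one request per calendar month. The sketch below does this in plain Python; `monthly_chunks` is a hypothetical helper, not the toolbox function that consumes chunks={"year": 1, "month": 1}, and the period shown is only an example.

```python
def monthly_chunks(start, stop):
    """Yield (year, month) pairs from start to stop, both 'YYYY-MM', inclusive."""
    start_year, start_month = map(int, start.split("-"))
    stop_year, stop_month = map(int, stop.split("-"))
    year, month = start_year, start_month
    while (year, month) <= (stop_year, stop_month):
        yield year, month
        # Advance by one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


# Example period: 30 years, so one chunk per month gives 30 x 12 chunks.
chunks = list(monthly_chunks("1991-01", "2020-12"))
n_chunks = len(chunks)
```

Smaller chunks keep each download and each in-memory array small, at the cost of making more requests.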

If you need me to pre-populate the cache with CDS data, please let me know the variables, regions and time periods you need. I usually run scripts overnight and during weekends that make concurrent requests to the CDS.

Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/carra_single_level_reanalysis.ipynb

Here is the notebook executed: https://gist.github.com/malmans2/6e986f080ecb6e6c96238ef8c044a2af

tdcwilliams commented 1 year ago

Hi @malmans2, thanks that notebook worked well! Ciao, Tim

tdcwilliams commented 1 year ago

Hi @malmans2, I haven't been able to run the notebooks for the full CARRA time period (1991-2020): it fails in the downloading part. Would you be able to run them for us? I've attached 6 notebooks for the different variables (t2m, msl, total precip) and domains (east or west) in a zip file.

Could you also make one notebook showing us how to add monthly maps in a memory-friendly way (somehow change the compute_time_weighted_stats function - we can then make them for all variables and domains)?

tdcwilliams commented 1 year ago

Here are the notebooks: carra_SL.zip

malmans2 commented 1 year ago

Hi @tdcwilliams,

Sure! I'll let you know when it's ready. It's unlikely I'll be able to cache everything before next week, as I'm running a couple of heavy notebooks for other evaluators.

What do you mean by monthly maps?

  1. 12 maps, so the overall seasonal cycle using all years available
  2. 12 * n_years maps, so the annual cycle of each year

tdcwilliams commented 1 year ago

Hi @malmans2, I meant the 1st option, but with mean, std, and linear_trend, so 12 x 3 maps, i.e., repeating month-by-month what is done there for the total dataset.
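
A minimal sketch of that 12 x 3 layout for a single grid point, in plain Python: group the samples by calendar month, then compute the mean, standard deviation, and least-squares linear trend across years for each month. The function names, the synthetic series, and the trend units (per year) are assumptions for illustration; this is not the template's implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def linear_trend(xs, ys):
    """Least-squares slope of ys against xs (needs at least two distinct xs)."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def monthly_stats(samples):
    """Map each calendar month to (mean, std, trend per year).

    `samples` is an iterable of (year, month, value) tuples.
    """
    by_month = defaultdict(list)
    for year, month, value in samples:
        by_month[month].append((year, value))
    stats = {}
    for month, pairs in sorted(by_month.items()):
        years = [y for y, _ in pairs]
        values = [v for _, v in pairs]
        stats[month] = (mean(values), pstdev(values), linear_trend(years, values))
    return stats


# Synthetic monthly series: a seasonal offset plus an increase of 0.1 per year.
samples = [
    (year, month, month + 0.1 * (year - 1991))
    for year in range(1991, 2021)
    for month in range(1, 13)
]
stats = monthly_stats(samples)
```

Because each calendar month is reduced independently, only one month's worth of data needs to be held in memory at a time.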

malmans2 commented 1 year ago

Got it!

malmans2 commented 1 year ago

Hi @tdcwilliams,

I revised the notebook a bit and it looks like it's much more stable now. I was struggling to keep the memory under control on the VM, so there's an intermediate step where we rechunk and write to zarr.

2m_temperature for the west_domain is cached. Could you please try the new template and let me know if everything is OK? Please don't try the other variables/domain yet as I'm running the script to cache all of them. I'll let you know when all notebooks are ready.

Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/carra_single_level_reanalysis.ipynb

Here is the notebook executed: https://gist.github.com/malmans2/6e986f080ecb6e6c96238ef8c044a2af

tdcwilliams commented 1 year ago

Hi @malmans2 I was able to run it fine and the results look great - thanks! Cheers, Tim

tdcwilliams commented 1 year ago

Hi @malmans2, actually I should double-check the template with a colleague... I'll get back to you. Tim

malmans2 commented 1 year ago

Sounds good, I'm making a couple of minor changes just to generalise for forecast data. I'll try to cache everything tonight anyways as I'll be on leave next week.

tdcwilliams commented 1 year ago

Great, thanks, and have a good holiday

malmans2 commented 1 year ago

Hi @tdcwilliams,

All variables requested are cached for both east/west for the time period 1991-2020. From now on, use the latest template as it supports both forecast and analysis: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/carra_single_level_reanalysis.ipynb

Let me know if everything works OK.

tdcwilliams commented 1 year ago

Thanks @malmans2, that seems to run OK. I'll try out all the variables and domains.

tdcwilliams commented 1 year ago

Hi @malmans2, we might need to do something different to show the variability in the total precipitation - you can see the problem with the time series produced

carra_single_level_reanalysis_precip_east.ipynb.zip

Is it possible to get percentiles out? I guess 33% - 66% is the equivalent of mean +/- std?
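
As a quick check on that equivalence, assuming a Gaussian distribution, mean +/- std spans roughly the 16th to 84th percentiles, while the 33%-66% band is narrower (about +/- 0.43 std). The sketch below uses only the Python standard library; the precipitation-like sample is made up. It also shows that for a skewed, non-negative variable, mean - std can drop below zero while quantiles stay within the data range.

```python
from statistics import NormalDist, mean, median, pstdev, quantiles

# For a standard normal: fraction of values below mean - std and mean + std.
std_normal = NormalDist()
lower_fraction = std_normal.cdf(-1.0)  # about 0.16
upper_fraction = std_normal.cdf(1.0)   # about 0.84

# Half-width, in units of std, of the band between the 1/3 and 2/3 quantiles.
tercile_half_width = std_normal.inv_cdf(2 / 3)  # about 0.43

# Made-up, strongly skewed daily precipitation sample (mm).
precip = [0.0, 0.0, 0.0, 0.0, 0.1, 0.3, 1.0, 2.5, 6.0, 20.0]
mean_minus_std = mean(precip) - pstdev(precip)  # negative for this sample
q_low, q_high = quantiles(precip, n=3)          # 1/3 and 2/3 quantiles
q_mid = median(precip)                          # 1/2 quantile
```

A median with a quantile band is therefore a reasonable way to show spread for precipitation, with the caveat that the 1/3 to 2/3 band is not numerically equivalent to mean +/- std.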

malmans2 commented 1 year ago

OK @tdcwilliams, we can do that. Do you want to do it for all other variables as well, or just for total precipitation? Also, do you want to show mean or median with quantiles?

tdcwilliams commented 1 year ago

hi @malmans2 just total precipitation thanks - the others should be fine with the usual stats. I guess median is better than mean as well. Thanks a lot

malmans2 commented 1 year ago

OK, I'm caching the following quantiles for the timeseries: 1/3, 1/2, 2/3.

You want to show mean and std maps, right?

tdcwilliams commented 1 year ago

sounds good. Yes I think the maps should be fine as mean and std

malmans2 commented 1 year ago

I've updated the template. Here is the notebook executed with total_precipitation and west_domain: https://gist.github.com/malmans2/6e986f080ecb6e6c96238ef8c044a2af

I'm now caching the east_domain. I'll let you know when it's ready.

tdcwilliams commented 1 year ago

thanks @malmans2, that looks better. Is the diagnostics module updated on the virtual machine?

malmans2 commented 1 year ago

yes it is!

malmans2 commented 1 year ago

(you need to restart the kernel though)

malmans2 commented 1 year ago

The east domain is also cached now.

tdcwilliams commented 1 year ago

great - I'll try it out

tdcwilliams commented 1 year ago

Hi @malmans2, it seems the other variables (t2m and msl) aren't cached anymore though. Could you run them or could I just try to change the cells you changed manually without running them?

malmans2 commented 1 year ago

They should be cached, let me check!

malmans2 commented 1 year ago

@tdcwilliams Which combination of parameters isn't cached for you? I tried this and it was cached:

# Time
start = "1991-01"
stop = "2020-12"

# Region
domain = "west_domain"
assert domain in ("east_domain", "west_domain")

# Product type
product_type = "analysis"
assert product_type in ("analysis", "forecast")

# Variable
variable = "2m_temperature"

tdcwilliams commented 1 year ago

I tried msl but not t2m - I'll check t2m (I just assumed both had disappeared)

malmans2 commented 1 year ago

Weird, I tried msl and it's cached for me on WP5:

# Time
start = "1991-01"
stop = "2020-12"

# Region
domain = "west_domain"
assert domain in ("east_domain", "west_domain")

# Product type
product_type = "analysis"
assert product_type in ("analysis", "forecast")

# Variable
variable = "mean_sea_level_pressure"

malmans2 commented 1 year ago

Maybe you are logged in as WP4? Each WP has a separate cache.

tdcwilliams commented 1 year ago

I forgot to change product_type back to analysis so it was downloading the forecast variables - now it is much quicker!
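
The slowdown is consistent with a cache keyed on the full request: changing any parameter, including product_type, produces a different key and therefore a fresh download. The sketch below illustrates the idea with a hashed JSON request; `cache_key` and the request layout are hypothetical and do not reproduce the toolbox's actual caching code.

```python
import hashlib
import json


def cache_key(request):
    """Return a deterministic key for a request dictionary (illustrative only)."""
    # Sorting keys makes the key independent of dictionary insertion order.
    payload = json.dumps(request, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


analysis_request = {
    "start": "1991-01",
    "stop": "2020-12",
    "domain": "west_domain",
    "product_type": "analysis",
    "variable": "mean_sea_level_pressure",
}
forecast_request = {**analysis_request, "product_type": "forecast"}

# Identical requests map to the same key; a changed product_type does not.
same_key = cache_key(analysis_request) == cache_key(dict(analysis_request))
different_key = cache_key(analysis_request) != cache_key(forecast_request)
```

Under that assumption, a quick way to diagnose an unexpected download is to compare every parameter cell against the combination that is known to be cached.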

malmans2 commented 1 year ago

Nice! Let me know if we can close this issue.

tdcwilliams commented 1 year ago

Hi @malmans2, I guess we can close it. Thanks very much for your help