bopen / c3s-eqc-toolbox-template

CADS Toolbox template application
Apache License 2.0

Ocean Color - Restarting Kernel #80

Closed chiaravol closed 1 year ago

chiaravol commented 1 year ago

Notebook description

Hi, I need to download and spatially average ocean color data stored in NetCDF files. Everything seemed to work fine, but when performing a spatial weighted mean over a longer dataset I get this error message: "The kernel for Ocean_color_v6.ipynb appears to have died. It will restart automatically".

Notebook link or upload

http://localhost:5678/lab/tree/Ocean_color_v6.ipynb

Anything else we need to know?

I had to stringify the NetCDF file dates when updating the request.

Environment

affine 2.4.0 pyhd8ed1ab_0 conda-forge
aiofiles 22.1.0 py310hecd8cb5_0
aiosqlite 0.18.0 py310hecd8cb5_0
anyio 3.5.0 py310hecd8cb5_0
appnope 0.1.2 py310hecd8cb5_1001
argon2-cffi 21.3.0 pyhd3eb1b0_0
argon2-cffi-bindings 21.2.0 py310hca72f7f_0
asttokens 2.0.5 pyhd3eb1b0_0
attrs 22.1.0 py310hecd8cb5_0
babel 2.11.0 py310hecd8cb5_0
backcall 0.2.0 pyhd3eb1b0_0
beautifulsoup4 4.12.2 py310hecd8cb5_0
bleach 4.1.0 pyhd3eb1b0_0
blosc 1.21.4 heccf04b_0 conda-forge
boltons 23.0.0 py310hecd8cb5_0
boost-cpp 1.78.0 hf5ba120_3 conda-forge
brotli 1.0.9 hb7f2c08_9 conda-forge
brotli-bin 1.0.9 hb7f2c08_9 conda-forge
brotlipy 0.7.0 py310hca72f7f_1002
bzip2 1.0.8 h1de35cc_0
c-ares 1.19.1 h0dc2134_0 conda-forge
ca-certificates 2019.11.28 hecc5488_0 conda-forge/label/cf202003
cairo 1.16.0 h09dd18c_1016 conda-forge
cdsapi 0.1.6 py_0 conda-forge/label/cf202003
certifi 2023.5.7 py310hecd8cb5_0
cffi 1.15.1 py310h6c40b1e_3
cfitsio 4.2.0 hd56cc12_0 conda-forge
cftime 1.6.2 py310h936d966_1 conda-forge
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.3 unix_pyhd8ed1ab_2 conda-forge
click-plugins 1.1.1 py_0 conda-forge
cligj 0.7.2 pyhd8ed1ab_1 conda-forge
comm 0.1.2 py310hecd8cb5_0
conda 23.5.0 py310h2ec42d9_1 conda-forge
conda-content-trust 0.1.3 py310hecd8cb5_0
conda-package-handling 2.0.2 py310hecd8cb5_0
conda-package-streaming 0.7.0 py310hecd8cb5_0
contourpy 1.1.0 py310h88cfcbd_0 conda-forge
cryptography 38.0.4 py310hdd0c95c_0 conda-forge
curl 8.1.2 hbee3ae8_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
debugpy 1.5.1 py310he9d5cce_0
decorator 5.1.1 pyhd3eb1b0_0
defusedxml 0.7.1 pyhd3eb1b0_0
entrypoints 0.4 py310hecd8cb5_0
executing 0.8.3 pyhd3eb1b0_0
expat 2.5.0 hf0c8a7f_1 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.2 h5bb23bf_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.40.0 py310h6729b98_0 conda-forge
freetype 2.12.1 hd8bbffd_0
freexl 1.0.6 hb7f2c08_1 conda-forge
geos 3.12.0 he965462_0 conda-forge
geotiff 1.7.1 h5cf5d3c_9 conda-forge
gettext 0.21.1 h8a4c099_0 conda-forge
giflib 5.2.1 hb7f2c08_3 conda-forge
hdf4 4.2.15 h9804679_6 conda-forge
hdf5 1.14.1 nompi_hedada53_100 conda-forge
icu 72.1 h7336db1_0 conda-forge
idna 3.4 py310hecd8cb5_0
ipykernel 6.19.2 py310h20db666_0
ipython 8.12.0 py310hecd8cb5_0
ipython_genutils 0.2.0 pyhd3eb1b0_1
jedi 0.18.1 py310hecd8cb5_1
jinja2 3.1.2 py310hecd8cb5_0
json-c 0.16 h01d06f9_0 conda-forge
json5 0.9.6 pyhd3eb1b0_0
jsonpatch 1.32 pyhd3eb1b0_0
jsonpointer 2.1 pyhd3eb1b0_0
jsonschema 4.17.3 py310hecd8cb5_0
jupyter_client 8.1.0 py310hecd8cb5_0
jupyter_core 5.3.0 py310hecd8cb5_0
jupyter_events 0.6.3 py310hecd8cb5_0
jupyter_server 2.5.0 py310hecd8cb5_0
jupyter_server_fileid 0.9.0 py310hecd8cb5_0
jupyter_server_terminals 0.4.4 py310hecd8cb5_1
jupyter_server_ydoc 0.8.0 py310hecd8cb5_1
jupyter_ydoc 0.2.4 py310hecd8cb5_0
jupyterlab 3.6.3 py310hecd8cb5_0
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 2.22.0 py310hecd8cb5_0
kealib 1.5.1 h7014c1b_4 conda-forge
kiwisolver 1.4.4 py310ha23aa8a_1 conda-forge
krb5 1.20.1 h049b76e_0 conda-forge
lcms2 2.15 h2dcdeff_1 conda-forge
lerc 4.0.0 hb486fe8_0 conda-forge
libaec 1.0.6 hf0c8a7f_1 conda-forge
libarchive 3.6.2 h0b5dc4a_1 conda-forge
libblas 3.9.0 17_osx64_openblas conda-forge
libbrotlicommon 1.0.9 hb7f2c08_9 conda-forge
libbrotlidec 1.0.9 hb7f2c08_9 conda-forge
libbrotlienc 1.0.9 hb7f2c08_9 conda-forge
libcblas 3.9.0 17_osx64_openblas conda-forge
libcurl 8.1.2 hbee3ae8_0 conda-forge
libcxx 16.0.6 hd57cbcb_0 conda-forge
libdeflate 1.18 hac1461d_0 conda-forge
libedit 3.1.20221030 h6c40b1e_0
libev 4.33 haf1e3a3_1 conda-forge
libexpat 2.5.0 hf0c8a7f_1 conda-forge
libffi 3.4.2 hecd8cb5_6
libgdal 3.7.0 hc13fe4b_4 conda-forge
libgfortran 5.0.0 11_3_0_h97931a8_31 conda-forge
libgfortran5 12.2.0 he409387_31 conda-forge
libglib 2.76.3 hc62aa5d_0 conda-forge
libiconv 1.17 hac89ed1_0 conda-forge
libjpeg-turbo 2.1.5.1 hb7f2c08_0 conda-forge
libkml 1.3.0 haeb80ef_1015 conda-forge
liblapack 3.9.0 17_osx64_openblas conda-forge
libnetcdf 4.9.2 nompi_hfeda9e8_106 conda-forge
libnghttp2 1.52.0 he2ab024_0 conda-forge
libopenblas 0.3.23 openmp_h429af6e_0 conda-forge
libpng 1.6.39 h6c40b1e_0
libpq 15.3 h9dc22bb_1 conda-forge
librttopo 1.1.0 h23f359d_14 conda-forge
libsodium 1.0.18 h1de35cc_0
libspatialite 5.0.1 h8e1b34b_28 conda-forge
libsqlite 3.42.0 h58db7d2_0 conda-forge
libssh2 1.11.0 hd019ec5_0 conda-forge
libtiff 4.5.1 hf955e92_0 conda-forge
libwebp-base 1.3.1 h0dc2134_0 conda-forge
libxcb 1.15 hb7f2c08_0 conda-forge
libxml2 2.11.4 hd95e348_0 conda-forge
libxslt 1.1.37 h20bfa82_1 conda-forge
libzip 1.9.2 h6db710c_1 conda-forge
libzlib 1.2.13 h8a1eda9_5 conda-forge
llvm-openmp 16.0.6 hff08bdf_0 conda-forge
lxml 4.9.2 py310h479f746_1 conda-forge
lz4-c 1.9.4 hf0c8a7f_0 conda-forge
lzo 2.10 haf1e3a3_1000 conda-forge
markupsafe 2.1.1 py310hca72f7f_0
matplotlib 3.7.1 py310h2ec42d9_0 conda-forge
matplotlib-base 3.7.1 py310he725631_0 conda-forge
matplotlib-inline 0.1.6 py310hecd8cb5_0
mistune 0.8.4 py310hca72f7f_1000
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
nbclassic 0.5.5 py310hecd8cb5_0
nbclient 0.5.13 py310hecd8cb5_0
nbconvert 6.5.4 py310hecd8cb5_0
nbformat 5.7.0 py310hecd8cb5_0
ncurses 6.4 hcec6c5f_0
nest-asyncio 1.5.6 py310hecd8cb5_0
netcdf4 1.6.4 nompi_py310h845552d_101 conda-forge
notebook 6.5.4 py310hecd8cb5_0
notebook-shim 0.2.2 py310hecd8cb5_0
nspr 4.35 hea0b92c_0 conda-forge
nss 3.89 h78b00b3_0 conda-forge
numpy 1.25.0 py310h7451ae0_0 conda-forge
openjpeg 2.5.0 h13ac156_2 conda-forge
openssl 3.1.1 h8a1eda9_1 conda-forge
packaging 23.0 py310hecd8cb5_0
pandas 2.0.3 py310h5e4fcda_0 conda-forge
pandocfilters 1.5.0 pyhd3eb1b0_0
parso 0.8.3 pyhd3eb1b0_0
pcre2 10.40 h1c4e4bc_0 conda-forge
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pillow 10.0.0 py310hd63a8c7_0 conda-forge
pip 23.0.1 py310hecd8cb5_0
pixman 0.40.0 hbcb3906_0 conda-forge
platformdirs 2.5.2 py310hecd8cb5_0
pluggy 1.0.0 py310hecd8cb5_1
pooch 1.7.0 pyha770c72_3 conda-forge
poppler 23.05.0 he041c3a_1 conda-forge
poppler-data 0.4.12 hd8ed1ab_0 conda-forge
postgresql 15.3 h325e403_1 conda-forge
proj 9.2.1 hc8d59c9_0 conda-forge
prometheus_client 0.14.1 py310hecd8cb5_0
prompt-toolkit 3.0.36 py310hecd8cb5_0
psutil 5.9.0 py310hca72f7f_0
pthread-stubs 0.4 hc929b4f_1001 conda-forge
ptyprocess 0.7.0 pyhd3eb1b0_2
pure_eval 0.2.2 pyhd3eb1b0_0
pycosat 0.6.4 py310hca72f7f_0
pycparser 2.21 pyhd3eb1b0_0
pygments 2.15.1 py310hecd8cb5_1
pyopenssl 23.0.0 py310hecd8cb5_0
pyparsing 3.1.0 pyhd8ed1ab_0 conda-forge
pyproj 3.6.0 py310h198f139_1 conda-forge
pyrsistent 0.18.0 py310hca72f7f_0
pysocks 1.7.1 py310hecd8cb5_0
python 3.10.12 had23ca6_0_cpython conda-forge
python-dateutil 2.8.2 pyhd3eb1b0_0
python-fastjsonschema 2.16.2 py310hecd8cb5_0
python-json-logger 2.0.7 py310hecd8cb5_0
python-tzdata 2023.3 pyhd8ed1ab_0 conda-forge
python.app 3 py310hca72f7f_0
python_abi 3.10 2_cp310 conda-forge
pytz 2022.7 py310hecd8cb5_0
pyyaml 6.0 py310h6c40b1e_1
pyzmq 25.1.0 py310hcec6c5f_0
rasterio 1.3.8 py310hd17acd7_0 conda-forge
readline 8.2 hca72f7f_0
requests 2.28.1 py310hecd8cb5_1
rfc3339-validator 0.1.4 py310hecd8cb5_0
rfc3986-validator 0.1.1 py310hecd8cb5_0
rioxarray 0.14.1 pyhd8ed1ab_0 conda-forge
ruamel.yaml 0.17.21 py310hca72f7f_0
ruamel.yaml.clib 0.2.6 py310hca72f7f_1
scipy 1.11.1 py310h3900cf1_0 conda-forge
send2trash 1.8.0 pyhd3eb1b0_1
setuptools 65.6.3 py310hecd8cb5_0
six 1.16.0 pyhd3eb1b0_1
snappy 1.1.10 h225ccf5_0 conda-forge
sniffio 1.2.0 py310hecd8cb5_1
snuggs 1.4.7 py_0 conda-forge
soupsieve 2.4 py310hecd8cb5_0
sqlite 3.41.1 h6c40b1e_0
stack_data 0.2.0 pyhd3eb1b0_0
terminado 0.17.1 py310hecd8cb5_0
tiledb 2.13.2 h8b9cbf0_0 conda-forge
tinycss2 1.2.1 py310hecd8cb5_0
tk 8.6.12 h5d9f67b_0
tomli 2.0.1 py310hecd8cb5_0
toolz 0.12.0 py310hecd8cb5_0
tornado 6.2 py310hca72f7f_0
tqdm 4.65.0 py310h20db666_0
traitlets 5.7.1 py310hecd8cb5_0
typing-extensions 4.6.3 py310hecd8cb5_0
typing_extensions 4.6.3 py310hecd8cb5_0
tzcode 2023c hb7f2c08_0 conda-forge
tzdata 2023c h04d1e81_0
unicodedata2 15.0.0 py310h90acd4f_0 conda-forge
urllib3 1.26.15 py310hecd8cb5_0
wcwidth 0.2.5 pyhd3eb1b0_0
webencodings 0.5.1 py310hecd8cb5_1
websocket-client 0.58.0 py310hecd8cb5_4
wheel 0.38.4 py310hecd8cb5_0
xarray 2023.6.0 pyhd8ed1ab_0 conda-forge
xerces-c 3.2.4 h90c7484_2 conda-forge
xorg-libxau 1.0.11 h0dc2134_0 conda-forge
xorg-libxdmcp 1.1.3 h35c211d_0 conda-forge
xz 5.2.10 h6c40b1e_1
y-py 0.5.9 py310h7242b5c_0
yaml 0.2.5 haf1e3a3_0
ypy-websocket 0.8.2 py310hecd8cb5_0
zeromq 4.3.4 h23ab428_0
zlib 1.2.13 h8a1eda9_5 conda-forge
zstandard 0.19.0 py310h6c40b1e_0
zstd 1.5.5 hc035e20_0

malmans2 commented 1 year ago

Hi @chiaravol,

Please upload either the notebook or a snippet with the code here. You pasted the URL of your SSH tunnel.

chiaravol commented 1 year ago

Hi,

Oops, sorry… I'm pretty new to Python and this is my very first Jupyter notebook.

Here is the notebook.

Thanks, Chiara

malmans2 commented 1 year ago

Hi Chiara,

No worries!

I haven't received it yet. If you reply to GitHub issues via email, we don't get any attachments. In general, I suggest replying via GitHub, as emails are not formatted very well.

chiaravol commented 1 year ago

Ocean_color_v6.ipynb.zip

chiaravol commented 1 year ago

I solved the issue; the new code is attached. Anyway, it gives me another error when running download_and_transform on the request:

52%|█████▏ | 190/365 [01:58<01:58, 1.48it/s]
2023-07-12 16:05:35,237 INFO Welcome to the CDS
2023-07-12 16:05:35,238 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/satellite-ocean-colour
2023-07-12 16:05:35,251 INFO Request is queued
2023-07-12 16:05:36,260 INFO Request is failed
2023-07-12 16:05:36,261 ERROR Message: the request you have submitted is not valid
2023-07-12 16:05:36,262 ERROR Reason: There is no data matching your request. Check that you have specified the correct fields and values.
2023-07-12 16:05:36,262 ERROR Traceback (most recent call last):
2023-07-12 16:05:36,263 ERROR File "/opt/cds/cdsinf/python/lib/cdsinf/runner/dispatcher.py", line 163, in _consume....

Ocean_color_v6 (1).ipynb.zip

malmans2 commented 1 year ago

If you look at the CDS form, the dataset has some missing days (e.g., July 1998): https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-ocean-colour?tab=form

It looks like the CDS allows you to request periods with missing days if you use monthly chunks, i.e., chunks={"year": 1, "month": 1}. I tried the following, and it worked OK:

from c3s_eqc_automatic_quality_control import diagnostics, download

collection_id = "satellite-ocean-colour"

request = {
    "variable": "mass_concentration_of_chlorophyll_a",
    "projection": "regular_latitude_longitude_grid",
    "version": "6_0",
    "format": "zip",
}
start = "1998-01"
stop = "1998-01"

# Build per-date requests covering [start, stop]; dates are stringified for the CDS
requests = download.update_request_date(
    request, start=start, stop=stop, stringify_dates=True
)
# Reduce each chunk with a spatial weighted mean; monthly chunks
# let the CDS cope with months that have missing days
dsmean = download.download_and_transform(
    collection_id,
    requests,
    transform_func=diagnostics.spatial_weighted_mean,
    chunks={"year": 1, "month": 1},
)
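
If it helps, here is a quick way to sanity-check the reduced dataset before scaling up (a sketch: chlor_a is the variable used later in this thread, and the plot assumes matplotlib, which is in your environment):

# Inspect the reduced dataset and plot the spatially averaged daily series
print(dsmean)
dsmean["chlor_a"].plot()
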
chiaravol commented 1 year ago

Hi Mattia,

Thanks for your suggestion, but when I increase the dataset length up to 1998-12, the kernel dies again.

Here is the code I ran, including the new chunks you suggested.

Ocean_color_v6 (3).ipynb.zip

malmans2 commented 1 year ago

It takes some time, but it works fine for me. Are you using the code I shared this morning? If you just copy and paste the code below (1998-01 to 1998-12), it will quickly return the results, as they are now cached:

from c3s_eqc_automatic_quality_control import diagnostics, download

collection_id = "satellite-ocean-colour"

request = {
    "variable": "mass_concentration_of_chlorophyll_a",
    "projection": "regular_latitude_longitude_grid",
    "version": "6_0",
    "format": "zip",
}
start = "1998-01"
stop = "1998-12"

requests = download.update_request_date(
    request, start=start, stop=stop, stringify_dates=True
)
dsmean = download.download_and_transform(
    collection_id,
    requests,
    transform_func=diagnostics.spatial_weighted_mean,
    chunks={"year": 1, "month": 1},
)
chiaravol commented 1 year ago

You're right... Thank you so much! I forgot to remove the ds line. Now we are wondering if it's possible to include our running-mean function in the library.

malmans2 commented 1 year ago

Is this what you are looking for?

dsmean.rolling(time=3, center=True).mean()

If yes, I think it's easy enough that it's probably easier/clearer to use xarray directly rather than adding it to our library.

chiaravol commented 1 year ago

I tried it, but it's not working as we want, because it provides mean values starting on day 3, while we'd like to calculate it from day 1.

This is the code we implemented, which seems to work properly:

import numpy as np
import xarray as xr

def xr_running_mean(vec, win):
    b = int(win / 2)
    new = xr.DataArray(
        np.zeros(len(vec)),
        coords=vec.coords,
        dims=vec.dims,
        name="run_mean" + str(win) + str(vec.name),
    )
    for i in range(len(vec)):
        new[i] = vec[max(0, i - b):min(len(vec), i + b)].mean()
    return new

malmans2 commented 1 year ago

I can reproduce your algorithm using xarray:

chiara = xr_running_mean(dsmean["chlor_a"], 3)
mattia = dsmean["chlor_a"].rolling(time=2, min_periods=1).mean()

xr.testing.assert_equal(chiara, mattia)
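
For anyone following along, a self-contained check on synthetic data (hypothetical values, reusing the xr_running_mean definition above) shows why the two agree: with win=3, b=1, and the slice vec[i-1:i+1] excludes its right endpoint, so each output point is really a 2-point trailing mean.

import numpy as np
import xarray as xr

vec = xr.DataArray(
    np.random.rand(10),
    coords={"time": np.arange(10)},
    dims="time",
    name="chlor_a",
)

# Both sides average elements i-1 and i (just element 0 at the left edge),
# so the assertion passes.
xr.testing.assert_equal(
    xr_running_mean(vec, 3),
    vec.rolling(time=2, min_periods=1).mean(),
)
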
chiaravol commented 1 year ago

I will check the algorithm later, but I have a new issue with the downloading: I changed the stop period to None in the code you sent me, and I get this error again:

4%|▎ | 11/306 [00:00<00:20, 14.26it/s]
2023-07-13 13:21:26,096 INFO Welcome to the CDS
2023-07-13 13:21:26,097 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/satellite-ocean-colour
2023-07-13 13:21:26,245 INFO Request is queued
2023-07-13 13:21:27,254 INFO Request is running
4%|▎ | 11/306 [00:19<00:20, 14.26it/s]
2023-07-13 13:27:44,270 INFO Request is failed
2023-07-13 13:27:44,271 ERROR Message: the data source did not deliver data....

malmans2 commented 1 year ago

I'll take a look. But in general, it's better to develop your notebook using a small subset of data (e.g., 1 year). We will then download the whole dataset and trigger the heavy computation once the notebook is ready and fully optimised.

malmans2 commented 1 year ago

> I will check the algorithm later, but I have a new issue with the downloading: I changed the stop period to None in the code you sent me, and I get this error again: [...]

I think this was just a message from the CDS. Something failed during the download, so the system had to redo the request. But I don't think your notebook failed. I tried what you did, and step 11/306 was already cached.

chiaravol commented 1 year ago

I see... I'll work on a shorter dataset then :) Plus, your function for the running mean works great! Thank you very much.

malmans2 commented 1 year ago

Never mind, I got the same error you did at step 14. Don't worry about it for now; I think there's a connection issue or the CDS is having problems.

Anyways, when we are ready to run the full notebook, I will pre-populate the cache overnight and/or during the weekend using concurrent requests.

chiaravol commented 1 year ago

Hi Mattia,

I'd like to use your source_to_time_monthly_and_spatial_weighted_mean function, but it's not clear to me how to handle the source_to_time workaround.

Here is the code I'm using: Ocean_color_v6-Copy1.ipynb.zip

Thanks

malmans2 commented 1 year ago

The dataset you are using does not need the source-to-time workaround; time is already a coordinate of the dataset. What's wrong with the code you were using yesterday? The dataset returned by the code below should already be in good shape:

from c3s_eqc_automatic_quality_control import diagnostics, download

collection_id = "satellite-ocean-colour"

request = {
    "variable": "mass_concentration_of_chlorophyll_a",
    "projection": "regular_latitude_longitude_grid",
    "version": "6_0",
    "format": "zip",
}
start = "1998-01"
stop = "1998-12"

requests = download.update_request_date(
    request, start=start, stop=stop, stringify_dates=True
)
dsmean = download.download_and_transform(
    collection_id,
    requests,
    transform_func=diagnostics.spatial_weighted_mean,
    chunks={"year": 1, "month": 1},
)
chiaravol commented 1 year ago

The code from yesterday was working, but I'd like to do this:

import pandas as pd

def source_to_time_monthly(ds):
    # Naming convention: YYYYMM-*.nc
    ds["source"] = pd.to_datetime(ds["source"].str.slice(None, 8), format="%Y%m")
    return ds.rename(source="time")

def source_to_time_monthly_and_spatial_weighted_mean(ds):
    return diagnostics.spatial_weighted_mean(source_to_time_monthly(ds))

ds_mean = download.download_and_transform(
    collection_id,
    requests,
    transform_func=source_to_time_monthly_and_spatial_weighted_mean,
    split_all=True,
)

It gives me an error about the source.

malmans2 commented 1 year ago

Why would you like to do that? That workaround is only needed for some satellite datasets that are missing the time coordinate; it infers the time coordinate from the source (i.e., the filename). Your dataset already has the time coordinate, and therefore does not have a source dimension.

Also, don't use split_all, because we've been caching monthly data. split_all would download and cache daily data, which might be overkill in your case.

I would just use the code I shared with you yesterday.

chiaravol commented 1 year ago

Hi Mattia,

I've reorganized the code and it works fine now. I'd like to run it on the entire dataset (stop=None). How should I proceed to overcome the restarting kernel issue?

malmans2 commented 1 year ago

Please send me the latest version of the notebook. I will add a template for your use case and I will run it for the entire time period.

chiaravol commented 1 year ago

Here is the new notebook: Ocean_color_v6.ipynb.zip

malmans2 commented 1 year ago

In the latest notebook you are applying a simple lat/lon mean rather than the spatial weighted mean. As it will take some time to process all the data, could you please confirm that you want to apply and cache unweighted reductions?

chiaravol commented 1 year ago

Yes, but we'll likely do other kinds of processing later.

malmans2 commented 1 year ago

All set. Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/ocean_color.ipynb

The entire time period is now cached (from 1997-09 to 2023-03), so if you start from the template, any time period within this range is already computed.

I wasn't sure if you wanted to plot monthly means (resample(time="M")) or seasonality (groupby("time.month")), so I added both of them in the template.
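
For reference, the two reductions look like this on the cached dsmean from the snippets above (a sketch, not the exact template code):

monthly_means = dsmean["chlor_a"].resample(time="M").mean()  # one value per month
seasonality = dsmean["chlor_a"].groupby("time.month").mean()  # mean seasonal cycle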

Here is the notebook executed: https://gist.github.com/malmans2/661f722c6c02473bbad74b11a586f5fd

chiaravol commented 1 year ago

Hi Mattia,

Thank you. I'll be out of office and unable to run the template for a few days.

chiaravol commented 1 year ago

Hi Mattia,

How should I proceed now? Can I implement new functions in your template or should I keep working on my notebook on a shorter period?

malmans2 commented 1 year ago

I'd do a mix of the two: implement new functions starting from the template, but use a small time period while developing.

chiaravol commented 1 year ago

Ok. Thank you

chiaravol commented 1 year ago

Hi Mattia, I'd like to analyze ocean color data over different regions, but I don't have access to the lat and lon coordinates from the template you've shared, as the spatial averages are downloaded directly. I'm thus working on a new notebook, but I'm wondering whether there is a way to modify the template to do that.

malmans2 commented 1 year ago

Hi,

The raw data is already cached; you can access it like this:

ds = download.download_and_transform(
    collection_id,
    requests,
    chunks={"year": 1, "month": 1},
)

If you need to cut out a region, there's a utility function that is used in various notebooks. For example, to select the northern hemisphere:

from c3s_eqc_automatic_quality_control import utils

lon_slice = slice(-180, 180)  # lon0 < lon1, as in the raw data, so no sorting is needed
lat_slice = slice(90, 0)  # lat0 > lat1, as in the raw data, so no sorting is needed
ds_region = utils.regionalise(ds, lon_slice=lon_slice, lat_slice=lat_slice)

If you need to perform the same analysis as before on different regions, we can just add lon_slice/lat_slice arguments to the cached function.
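
As a sketch of what that could look like (the helper name regionalised_spatial_weighted_mean is illustrative, not part of the library; it just combines the calls shown above, reusing collection_id and requests from the earlier snippets):

from c3s_eqc_automatic_quality_control import diagnostics, download, utils

def regionalised_spatial_weighted_mean(ds):
    # Hypothetical helper: cut out a region first, then reduce it
    ds_region = utils.regionalise(ds, lon_slice=slice(-180, 180), lat_slice=slice(90, 0))
    return diagnostics.spatial_weighted_mean(ds_region)

ds_mean = download.download_and_transform(
    collection_id,
    requests,
    transform_func=regionalised_spatial_weighted_mean,
    chunks={"year": 1, "month": 1},
)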

chiaravol commented 1 year ago

Thank you. Now I keep having the "restarting kernel" issue, even when downloading just 3 months of data...

malmans2 commented 1 year ago

I need to see the code you are running to help

chiaravol commented 1 year ago

There you are: ocean_color.ipynb.zip

malmans2 commented 1 year ago

I'm not sure I follow. Why are you re-computing the same diagnostic we already cached on the whole domain?

chiaravol commented 1 year ago

I've modified the code you shared to download the complete set of data, in order to weight the mean by the cosine of latitude. I'm not sure what you mean by "re-computing the same diagnostic"...

malmans2 commented 1 year ago

If you don't apply reductions through download_and_transform, you are re-computing everything from scratch every single time you run the code. Your dataset is quite big, so it needs quite a few resources, and you need to be careful about the way you compute (the VM is a shared machine; it's easy to run out of memory with these kinds of datasets, especially when it's busy). That's why I cached the diagnostic you needed when the VM wasn't busy.

If you tell me what you need, I will update the template to show you how to add features and avoid memory issues. Last week we computed the global unweighted spatial mean. Do you also need weighted spatial means for different regions?

chiaravol commented 1 year ago

I need the global weighted mean and the weighted mean over different regions.

The dataset is very big, as it contains multiple variables. Would it be possible to retrieve only one variable (i.e., chlor_a), together with its coordinates and attributes, through download_and_transform, to reduce the memory I need?

malmans2 commented 1 year ago

Yes, that's what we do in the function in the template: we only reduce the DataArray chlor_a, not the full dataset. I'll update the template in a minute with the new features.

malmans2 commented 1 year ago

I'm caching global data for one year. Could you please send me the coords of a region you need? That way you'll have an example already cached to begin with.

chiaravol commented 1 year ago

lon_slice = slice(-180, 180). For the latitude I need 6 slices: (90, 60), (60, 30), (30, 0), (0, -30), (-30, -60), (-60, -90).
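
In slice form (following the descending-latitude convention from the regionalise example above), those bands would be:

lat_slices = [
    slice(90, 60),
    slice(60, 30),
    slice(30, 0),
    slice(0, -30),
    slice(-30, -60),
    slice(-60, -90),
]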

malmans2 commented 1 year ago

Got it. Are you going to need unweighted means or weighted means only?

chiaravol commented 1 year ago

weighted means only

malmans2 commented 1 year ago

OK. I'll let you know when the new template is ready for testing

malmans2 commented 1 year ago

All set. The very first year is already cached, so if you execute the template as-is, it should be quick. I've also added some quick-and-dirty plots.

Let me know if you need me to cache the entire time period.

Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/ocean_color.ipynb Here is the executed template: https://gist.github.com/malmans2/661f722c6c02473bbad74b11a586f5fd

chiaravol commented 1 year ago

I don't see where the cosine of latitude is accounted for. In every average calculation, chl values need to be weighted by the cosine of latitude, e.g.:

weights = np.cos(ds.latitude * np.pi / 180)
da_daily_weighted = (ds["chlor_a"] * weights).mean()

malmans2 commented 1 year ago

It's done by

diagnostics.spatial_weighted_mean

We have documented all functions in our software. See

help(diagnostics.spatial_weighted_mean)

In the previous version of the template we had the argument weights=False because you didn't want to weight the averages.
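
For intuition, a minimal xarray equivalent of a cosine-of-latitude weighted mean (a sketch assuming a dataset ds with a latitude coordinate in degrees, as in the regionalise example above; not the library's actual implementation):

import numpy as np

weights = np.cos(np.deg2rad(ds["latitude"]))
weighted_mean = ds["chlor_a"].weighted(weights).mean(("latitude", "longitude"))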

chiaravol commented 1 year ago

Thanks.

I'll let you know when I'm ready to execute for the entire dataset