Closed virginiaciardini closed 11 months ago
Hi @virginiaciardini,
I don't think I'll be able to work on this today, so I'll probably look at this next week. I'll send you a snippet or a template to show you how to use our software with your dataset!
Hi @malmans2, ok, thanks!
Hi @virginiaciardini,
The template is ready. You can find it here: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/tropopause.ipynb
You can just change start/stop in the cell at the top, and it should work with the time period you'd like to analyse.
This is what's happening under the hood:
[...] area or variables), you will only download the data once (I already downloaded and cached most of the data on the VM).
[...] compute_tropopause_altitude [...]
A few comments:
[...] xarray, so I used ds to make the plot. If you prefer to use pandas, you can just do this: df = ds.to_pandas()
[...] c3s_eqc_automatic_quality_control.diagnostics without having to define it in the notebook.
[...] area parameter).
I've already cached the global tropopause altitude from 2006-07 to 2017-12. So if you run the template changing start/stop only, it should be very quick.
Here are the results I get using the time period in your notebook:
start = "2016-02"
stop = "2016-02"
Using the same area as well, I can reproduce your figure:
start = "2016-02"
stop = "2016-02"
request = {
"area": [55, 10, 50, 15],
"format": "csv-lev.zip",
"variable": ["air_temperature", "altitude"],
}
Hi @malmans2, thanks, I'll test it and I'll let you know if everything is clear to me. Best, Virginia
Hi @malmans2, I tested the template. First, as you suggested, I applied the transform functions to the whole dataset (I changed start and stop), but then I couldn't find how to exclude the stations I don't need. While experimenting I also used "area" (as you showed above), but I received this error message:
_ValueError Traceback (most recent call last)
Cell In[10], line 1
----> 1 ds = download.download_and_transform(
2 collection_id,
3 requests,
4 chunks={"year": 1, "month": 1},
5 transform_func=compute_tropopause_altitude,
6 )
File /data/common/mambaforge/envs/wp5/lib/python3.10/site-packages/c3s_eqc_automatic_quality_control/download.py:545, in download_and_transform(collection_id, requests, chunks, split_all, transform_func, transform_func_kwargs, transform_chunks, n_jobs, invalidate_cache, cached_open_mfdataset_kwargs, **open_mfdataset_kwargs)
540 cacholote.delete(
541 func.func, *func.args, request_list=[request], **func.keywords
542 )
543 with cacholote.config.set(return_cache_entry=True):
544 sources.append(
--> 545 func(request_list=[request]).result["args"][0]["href"]
546 )
547 ds = xr.open_mfdataset(sources, **cached_open_mfdataset_kwargs)
548 else:
549 # Cache final dataset transformed
File /data/common/mambaforge/envs/wp5/lib/python3.10/site-packages/cacholote/cache.py:86, in cacheable.<locals>.wrapper(*args, **kwargs)
83 warnings.warn(str(ex), UserWarning)
84 clean._delete_cache_entry(session, cache_entry)
---> 86 result = func(*args, **kwargs)
87 cache_entry = database.CacheEntry(
88 key=hexdigest,
89 expiration=settings.expiration,
90 tag=settings.tag,
91 )
92 try:
File /data/common/mambaforge/envs/wp5/lib/python3.10/site-packages/c3s_eqc_automatic_quality_control/download.py:434, in _download_and_transform_requests(collection_id, request_list, transform_func, transform_func_kwargs, **open_mfdataset_kwargs)
431 ds = xr.open_mfdataset(sources, **open_mfdataset_kwargs)
433 if transform_func is not None:
--> 434 ds = transform_func(ds, **transform_func_kwargs)
435 if not isinstance(ds, xr.Dataset):
436 raise TypeError(
437 f"`transform_func` must return a xr.Dataset, while it returned {type(ds)}"
438 )
Cell In[5], line 53, in compute_tropopause_altitude(ds)
51 def compute_tropopause_altitude(ds):
52 dataarrays = []
---> 53 for report_id, ds_id in ds.groupby(ds["report_id"]):
54 coords = {"report_id": ("time", [report_id])}
55 for var, da_coord in ds_id.data_vars.items():
File /data/common/mambaforge/envs/wp5/lib/python3.10/site-packages/xarray/core/dataset.py:9031, in Dataset.groupby(self, group, squeeze, restore_coord_dims)
9023 from xarray.core.groupby import (
9024 DatasetGroupBy,
9025 ResolvedUniqueGrouper,
9026 UniqueGrouper,
9027 _validate_groupby_squeeze,
9028 )
9030 _validate_groupby_squeeze(squeeze)
-> 9031 rgrouper = ResolvedUniqueGrouper(UniqueGrouper(), group, self)
9033 return DatasetGroupBy(
9034 self,
9035 (rgrouper,),
9036 squeeze=squeeze,
9037 restore_coord_dims=restore_coord_dims,
9038 )
File <string>:6, in __init__(self, grouper, group, obj)
File /data/common/mambaforge/envs/wp5/lib/python3.10/site-packages/xarray/core/groupby.py:335, in ResolvedGrouper.__post_init__(self)
334 def __post_init__(self) -> None:
--> 335 self.group: T_Group = _resolve_group(self.obj, self.group)
337 (
338 self.group1d,
339 self.stacked_obj,
340 self.stacked_dim,
341 self.inserted_dims,
342 ) = _ensure_1d(group=self.group, obj=self.obj)
File /data/common/mambaforge/envs/wp5/lib/python3.10/site-packages/xarray/core/groupby.py:641, in _resolve_group(obj, group)
638 newgroup = group
640 if newgroup.size == 0:
--> 641 raise ValueError(f"{newgroup.name} must not be empty")
643 return newgroup
ValueError: report_id must not be empty_
Could you give me some suggestions? Thanks, Virginia
Hi @virginiaciardini,
I added a stations parameter in the latest template, to show you how I would do that: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/tropopause.ipynb
Looks like the area approach is not working very well because there are months with no data at all. We could easily fix it, but I think the other approach is better (i.e., compute and cache tropopause for the whole dataset, then filter it).
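To illustrate the "compute for everything, then filter" idea, here is a minimal sketch on synthetic data. It is not the template's actual implementation: the dataset layout (a station_name coordinate along time), the made-up values, the placeholder station "XXX", and the helper name filter_stations are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the cached tropopause dataset (values are made up).
times = pd.date_range("2016-01-01", periods=6, freq="D")
ds = xr.Dataset(
    {"tropopause": ("time", np.array([11.2, 9.8, 16.5, 11.0, 9.5, 16.8]))},
    coords={
        "time": times,
        # "XXX" is a hypothetical station we want to exclude.
        "station_name": ("time", ["LIN", "NYA", "TEN", "LIN", "NYA", "XXX"]),
    },
)


def filter_stations(ds, stations):
    """Keep only the time steps whose station_name is in `stations`."""
    mask = ds["station_name"].isin(stations)
    # Select by position, so duplicated timestamps are not a problem.
    return ds.isel(time=np.flatnonzero(mask.values))


ds_filtered = filter_stations(ds, ["LIN", "NYA", "TEN"])
print(sorted(set(ds_filtered["station_name"].values.tolist())))
```

Because the filter runs on the already computed result, changing the list of stations does not require downloading or transforming anything again.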
Hi @malmans2, I tested the "filter stations" option, thanks a lot.
You're welcome! Do you need to add more analyses to this notebook or can we close this issue?
Hi, now I need to add monthly and seasonal means of the tropopause altitude. I'll try to write the routine myself, and then I'll ask for your support to verify it, or for your help if I have problems. Thanks, Virginia
OK, the VM is experiencing problems at the moment. Please don't use it until further notice (should be quick).
I see. I'll wait for your OK. Thanks.
The VM is back in business. It was rebooted, so you'll have to re-do the procedure and run jupyter_server.
You'll notice that the interface has changed: we're now using JupyterLab instead of Jupyter Notebook, as the latter will eventually be deprecated.
Hi @malmans2, I modified the JN and I need your support again; after several attempts there's something that I cannot solve.
Hi @malmans2, I'm trying to call some functions from the statsmodels module, but I receive the following message: ModuleNotFoundError: No module named 'statsmodels'. Could you help me? Thanks,
Hi @virginiaciardini,
I've been out of the office a couple of weeks, so I have a few issues in the backlog and I haven't looked at your updated notebook yet.
statsmodels is not part of the Python standard library, so it needs to be installed. Do you want me to install it on the VM?
Hi @malmans2, thanks. If it is possible, yes please.
OK, there are a few people using the VM right now. I'll do it overnight to make sure we don't break their environments.
You'll find it installed tomorrow morning. Make sure you restart the kernel before importing it.
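If you want to check whether a package is available before importing it, a small sketch like the following could help. It only uses the standard library; the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; "statsmodels" is a third-party package.
for name in ("json", "statsmodels"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

A kernel that was already running when the package was installed may still report it as missing, which is why restarting the kernel first is the safer option.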
thanks
Hi @virginiaciardini, sorry again for the delay.
2nd figure: I tried to format the x-axis ticks as in the plot above (2006, 2007, ..., 2020)
for station, da in ds["tropopause"].groupby("station_name"):
    # Monthly mean and standard deviation for each station
    da_resampled = da.resample(time="M")
    da_mean = da_resampled.mean().to_pandas()
    da_std = da_resampled.std().to_pandas()
    da_mean.plot(yerr=da_std, marker=".", label=station)
I'd like to add another figure, similar to the 2nd figure, but resampling on "time.season"; I tried but it does not work.
Assuming you want DJF, MAM, JJA, SON, I think you can replace da_resampled with the following (I have never used it though, so please make sure that it's correct):
da_resampled = da.resample(time="QS-DEC")
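One way to check that "QS-DEC" gives the intended seasons is to run it on a small synthetic series and look at where the bins start. This is only a sketch with made-up values, not output from the real dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series covering one calendar year (values are made up).
time = pd.date_range("2016-01-01", "2016-12-31", freq="D")
da = xr.DataArray(
    np.arange(time.size, dtype=float),
    coords={"time": time},
    dims="time",
    name="tropopause",
)

# Quarters anchored so that they start on 1 December, March, June and September.
da_seasonal = da.resample(time="QS-DEC").mean()

# Each label is the first day of a season (DJF, MAM, JJA, SON).
start_months = sorted(set(da_seasonal["time"].dt.month.values.tolist()))
print(start_months)
print(da_seasonal["time"].values)
```

Note that the first and last bins of a record are usually incomplete seasons (here the first DJF bin only contains January and February), so their means are based on fewer samples.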
I'd like to save the LRT to a text file and then download/copy it to my local machine
ds.to_pandas().to_csv("my_file.csv")
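As a quick sanity check, the file can be read back with pandas after writing it. The sketch below uses a tiny synthetic dataset and a temporary directory; the variable name tropopause and the values are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd
import xarray as xr

# Tiny synthetic dataset standing in for the real one (values are made up).
time = pd.date_range("2016-02-01", periods=3, freq="D")
ds = xr.Dataset({"tropopause": ("time", [11.2, 10.9, 11.5])}, coords={"time": time})

# Write to a temporary location so that nothing in the working directory is overwritten.
path = os.path.join(tempfile.mkdtemp(), "my_file.csv")
ds.to_pandas().to_csv(path)

# Read the file back, restoring the time index.
df = pd.read_csv(path, index_col="time", parse_dates=True)
print(df)
```

The resulting CSV is plain text, so it can be opened with any editor or spreadsheet once copied to your local machine.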
Hi @malmans2, thanks. I'm following your instructions and working on the VM, but I have some problems with the server connection. While I was running my JN, this error message appeared: "Server Connection Error. A connection to the Jupyter server could not be established. JupyterLab will continue trying to reconnect. Check your network connection or Jupyter server configuration." My connection is OK; do you know if there are any limitations today? Thanks, Virginia
Looks OK now, but maybe there was a hiccup before.
Can you try to close and re-open the ssh tunnel?
For example, from your local machine do this to close all ssh tunnels: pkill ssh
Then, re-do the usual procedure to work with Jupyter (log into the VM, log into your user, go to your directory, run jupyter_server, follow the instructions).
Thanks, now it works
Hi @malmans2, I'm trying to open my JN. I can connect to the VM but I cannot open the JN in the browser. Do you know what the problem could be? Thanks, Virginia
You just have to re-do this: https://github.com/bopen/c3s-eqc-toolbox-template/issues/59#issuecomment-1643552234
We had to close the tunnels overnight. Keep in mind that the port assigned to you might have changed, so make sure you copy & paste the new commands printed by jupyter_server.
Hi @malmans2, I updated my JN (enclosed); I would like to receive your feedback to optimize the code and to fix any issues. In figure 3 I tried to use time as the x array but I didn't succeed, so I used the number of records (i.e. the number of months) instead. Could you please help me fix it? Thanks, Virginia C3S_520_Quality_assessment_Template_gruan_uq1_v2.zip
Hi @virginiaciardini,
I've updated the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/tropopause.ipynb Here is the template executed: https://gist.github.com/malmans2/8fc0093b534d38b8820c5497d6892c57
If I'm understanding correctly how seasonal_decompose works, I think you are supposed to feed it regularly sampled data. Therefore, you need to interpolate missing months rather than dropping them. This is why the results are slightly different compared to your version.
Please let me know if everything works OK.
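For reference, the "interpolate rather than drop" step can be sketched with pandas alone. This is not the code used in the template: the monthly values are made up, and linear interpolation in time is just one possible way of filling the gaps.

```python
import pandas as pd

# Synthetic monthly means with March and June missing (values are made up).
index = pd.to_datetime(
    ["2016-01-01", "2016-02-01", "2016-04-01", "2016-05-01", "2016-07-01"]
)
monthly = pd.Series([11.0, 11.4, 12.0, 12.2, 12.8], index=index, name="tropopause")

# Put the series on a regular month-start grid; missing months become NaN.
regular = monthly.asfreq("MS")

# Fill the gaps by linear interpolation in time instead of dropping them.
filled = regular.interpolate(method="time")

print(filled)
```

The filled series has one value per month with no gaps, which is the kind of regularly sampled input described above.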
Hi @malmans2, thanks a lot. I've been out of office so I'm looking at your update now; I'll let you know if everything works ok. Thanks!
Hi @virginiaciardini,
Was this template OK? Can we close this issue?
Yes, you can. Thanks Virginia
Notebook description
For my analysis I need to download all the available data for 3 GRUAN stations (LIN, NYA and TEN), which I identify in my request through the "area" field, to calculate the lapse rate tropopause at different latitudes. Code and text are still being defined, but I need support.
Notebook link or upload
C3S_520_Quality_assessment_Template_gruan_uq1.zip
Anything else we need to know?
First, I did some tests locally on my machine; now I would like to run the notebook on the VM. The first issue I faced: when downloading data for a single year my routine works well, but for the entire data range (e.g. 2006-2015) it doesn't. Second, I don't understand exactly how to use "download.download_and_transform" for my purpose. Could you please have a look at the code and give me some advice?
I'm new to Python and I'm working on my first Jupyter Notebook. Before continuing with the analysis, I would like to receive your feedback to optimize the code and to fix any issues. Thanks a lot, Virginia
Environment