Project-OSmOSE / datarmor-toolkit

This repo gathers all our analysis code for processing big (mostly audio) data (e.g. machine learning, ambient noise analysis…)

local_notebooks #24

Closed gmanatole closed 1 year ago

cazaudo commented 1 year ago

Thanks for this marvellous piece of code. I tried to configure a set to download as follows:

# CHOOSE WHICH SINGLE LEVELS TO DOWNLOAD IN data
data = ['10m_u_component_of_wind', '10m_v_component_of_wind']  # '10m_u_component_of_wind', '10m_v_component_of_wind', 'total_precipitation'

# CHOOSE THE YEAR/MONTH/DAY AND HOURS THAT YOU WANT TO DOWNLOAD DATA FROM
years = ['2014']  # '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022']
months = ['04']  # '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
days = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']
hours = ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00', '06:00', '07:00', '08:00', '09:00', '10:00', '11:00', '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00', '22:00', '23:00']

and I got :

You have selected :

10m_u_component_of_wind 10m_v_component_of_wind

for the following times

    year  months  days  hours
0   2014  04      01    00:00
1   NaN   NaN     02    01:00
2   NaN   NaN     03    02:00
3   NaN   NaN     04    03:00
4   NaN   NaN     05    04:00
5   NaN   NaN     06    05:00
6   NaN   NaN     07    06:00
7   NaN   NaN     08    07:00
8   NaN   NaN     09    08:00
9   NaN   NaN     10    09:00
10  NaN   NaN     11    10:00
11  NaN   NaN     12    11:00
12  NaN   NaN     13    12:00
13  NaN   NaN     14    13:00
14  NaN   NaN     15    14:00
15  NaN   NaN     16    15:00
16  NaN   NaN     17    16:00
17  NaN   NaN     18    17:00
18  NaN   NaN     19    18:00
19  NaN   NaN     20    19:00
20  NaN   NaN     21    20:00
21  NaN   NaN     22    21:00
22  NaN   NaN     23    22:00
23  NaN   NaN     24    23:00
24  NaN   NaN     25    NaN
25  NaN   NaN     26    NaN
26  NaN   NaN     27    NaN
27  NaN   NaN     28    NaN
28  NaN   NaN     29    NaN
29  NaN   NaN     30    NaN
30  NaN   NaN     31    NaN

Your boundaries are : North -65, South -75, East 60, West 65

Questions:

gmanatole commented 1 year ago

Thanks for testing out the code. To answer your questions :

cazaudo commented 1 year ago

I got weird results with the os.path.expanduser method: it set my home directory to the wrong place. Would it be better to avoid it? This matters a lot, because this folder is where the result files are saved.

Besides, it would be nice to add a code snippet to the ReadMe.txt to directly visualize some of the downloaded data.

gmanatole commented 1 year ago

I've added the option to choose the directory in which you want to save ERA data. Leaving it as None defaults to creating an api directory in your home folder. I tried adding an example of output arrays from cds (stamps + one chosen single level) but can't seem to format it properly using markdown syntax.
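For readers following along, the behaviour described above could look roughly like the sketch below. It is not the toolkit's actual code: `resolve_save_dir` is a hypothetical helper name, and the only things taken from this thread are the `None` default, the `api` sub-directory and the use of `os.path.expanduser`.

```python
import os


def resolve_save_dir(path=None):
    """Return the directory where ERA files would be saved, creating it if needed.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # With no explicit path, fall back to the user's home folder.
    base = os.path.expanduser("~") if path is None else path
    # os.path.join inserts the correct separator for the platform.
    save_dir = os.path.join(base, "api")
    os.makedirs(save_dir, exist_ok=True)
    return save_dir


# Usage (commented out so that running this sketch creates nothing):
# save_dir = resolve_save_dir("/some/project/folder")  # explicit location
# save_dir = resolve_save_dir()                         # defaults to ~/api
```

Passing an explicit path sidesteps `os.path.expanduser` entirely, which is useful in environments where the home folder resolves somewhere unexpected.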

gmanatole commented 1 year ago

Changed the stamps format from a single 4D array to four separate 1D arrays of different lengths: time (in two formats, datetime and epoch), latitude and longitude. Removed the old stamps format.

cazaudo commented 1 year ago

Thanks, but it would actually be much more convenient to store all arrays in a single .npy instead of having all these individual files. At the very least, lat, lon and timestamps should go in a single .npy.

cazaudo commented 1 year ago

Here is a code snippet plotting a variable, to be added to the read_me.txt:

# creating a plot
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

# load one variable and its time/latitude/longitude axes
tp = np.load('/content/drive/MyDrive/enseignement /UE_ML_2021_2022/TPac/api/tp_ERA5.npy', allow_pickle=True)
timestamp = np.load('/content/drive/MyDrive/enseignement /UE_ML_2021_2022/TPac/api/timestamps.npy', allow_pickle=True)
vect_lat = np.load('/content/drive/MyDrive/enseignement /UE_ML_2021_2022/TPac/api/latitude.npy', allow_pickle=True)
vect_lon = np.load('/content/drive/MyDrive/enseignement /UE_ML_2021_2022/TPac/api/longitude.npy', allow_pickle=True)

# np.meshgrid(lat, lon) returns arrays of shape (n_lon, n_lat), hence the transpose of tp below
mat_lat, mat_lon = np.meshgrid(vect_lat, vect_lon)

# map of the first time step
fig = plt.figure(figsize=(6, 5))
plt.pcolormesh(mat_lon, mat_lat, tp[0, :, :].T, cmap=cm.jet)
plt.title('tp variable at time ' + str(timestamp[0]))
plt.colorbar()
plt.show()

gmanatole commented 1 year ago

What format ? Because lat/lon/time don't necessarily have the same length

cazaudo commented 1 year ago

Use np.savez; besides, its np.savez_compressed variant will also compress your file.

gmanatole commented 1 year ago

Implemented stamps array with 4 arrays stored: timestamps as datetime, timestamps as epoch, latitude, longitude.
Updated README.md to explain how to access stamps data.
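As an illustration of how arrays of different lengths can share one file, here is a minimal sketch. It is not the code merged in this PR: the values are made up, the `timestamps_epoch` key is a hypothetical name (only `timestamps`, `latitude`, `longitude` and the `stamps.npz` file name appear in this thread), and it uses `datetime64` timestamps so that no pickling is needed. Note that `np.savez` stores arrays uncompressed; `np.savez_compressed` is the compressing variant.

```python
import os
import tempfile

import numpy as np

# Illustrative axes of different lengths (values are made up).
timestamps = np.arange("2014-04-01T00", "2014-04-01T06", dtype="datetime64[h]")
# Epoch format: seconds since 1970-01-01 (hypothetical key name below).
timestamps_epoch = timestamps.astype("datetime64[s]").astype(np.int64)
latitude = np.array([-65.0, -65.25, -65.5])
longitude = np.array([60.0, 60.25])

# Write all four arrays into a single compressed archive.
out_file = os.path.join(tempfile.mkdtemp(), "stamps.npz")
np.savez_compressed(
    out_file,
    timestamps=timestamps,
    timestamps_epoch=timestamps_epoch,
    latitude=latitude,
    longitude=longitude,
)

# Read them back; each array keeps its own length.
with np.load(out_file) as stamps:
    loaded = {key: stamps[key] for key in stamps.files}

for key, value in loaded.items():
    print(key, value.shape)
```

Each array is retrieved by the keyword it was saved under, so the axes never need to be padded to a common length.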

cazaudo commented 1 year ago

Please change the path in api_func to os.path.join(path, "api") instead of using '+'.

Change the visualization code of the read_me to:

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import os
import sys

# path and filename are not defined in this snippet; they are assumed to be set beforehand
variable_name = 'u10'

# stop early if the requested variable was not downloaded
if not os.path.exists(os.path.join(path, 'api', variable_name + '_' + filename + '.npy')):
    print('no ERA data with this variable')
    sys.exit()

var1 = np.load(os.path.join(path, 'api', variable_name + '_' + filename + '.npy'), allow_pickle=True)
stamps = np.load(os.path.join(path, 'api', 'stamps.npz'), allow_pickle=True)

mat_lat, mat_lon = np.meshgrid(stamps['latitude'], stamps['longitude'])

fig = plt.figure(figsize=(6, 5))
if var1.shape[1:] == (1, 1):
    # single grid point: plot the time series
    plt.plot(stamps['timestamps'], var1[:, 0, 0])
else:
    # several grid points: map of the first time step
    plt.pcolormesh(mat_lon, mat_lat, var1[0, :, :].T, cmap=cm.jet)
    plt.title(variable_name + ' variable at time ' + str(stamps['timestamps'][0]))
    plt.colorbar()
plt.show()

gmanatole commented 1 year ago

Changed make_cds function to use os.path.join instead of string concatenation.
Modified notebook guidelines for path selection.
Modified plot code in README.

gmanatole commented 1 year ago

Added cazaudo's version of the notebook (mount drive + pip install packages) as download_era.ipynb.
Saved old notebook as cds_api_backup.ipynb.

gmanatole commented 1 year ago

Removed drive mount+directory change from notebook

gmanatole commented 1 year ago

Ready to merge :)