AntSimi / py-eddy-tracker

Eddy identification and tracking
https://py-eddy-tracker.readthedocs.io/en/latest/
GNU General Public License v3.0
129 stars 55 forks source link

Eddytracking phase #125

Closed Sheveenah closed 2 years ago

Sheveenah commented 2 years ago

Hello AntSimi, I have been struggling for 2 weeks with this, and I've tried tweaking and playing around. I'll share the data with you, and I would be so thankful if you could please help me. https://drive.google.com/drive/folders/1hmb_yfzd9V-6Zc0No3RrdXlzxoCzEupO?usp=sharing

On this link you will find two folders, one is file1whichdoesnotwork, the other one is file2whichworks. The two files are netcdf files of 30 days timesteps, file 2 with a bigger regional grid (60 vertical levels, hence I put s_rho=59) file1 is 30 days timesteps too, but cut from file2, it is a much smaller grid, with only 1 vertical level, hence I put s_rho=0.

I have first attempted to detect the eddies, I ran it, it works for both of them. I have ran only 20 timesteps and it generated the daily cyclonic and anticyclonic files.

This is what I had in my notebook

from datetime import datetime
from py_eddy_tracker import start_logger
from py_eddy_tracker.dataset.grid import UnRegularGridDataset
from numpy import zeros
from netCDF4 import Dataset
from datetime import datetime, timedelta
# for plotting
from matplotlib import pyplot as plt

class RomsDataset(UnRegularGridDataset):
    __slots__ = list()

    def init_speed_coef(self, uname, vname):
        # xi_u and eta_v must be specified because this dimension are not use in lon/lat
        u = self.grid(uname, indexs=dict(xi_u=slice(None)))
        v = self.grid(vname, indexs=dict(eta_v=slice(None)))
        u = self.rho_2d(u.T).T
        v = self.rho_2d(v)
        self._speed_norm = (v ** 2 + u ** 2) ** .5

    @staticmethod
    def rho_2d(x):
        """Transformation to have u or v on same grid than h
        """
        M, Lp = x.shape
        new_x = zeros((M + 1, Lp))
        new_x[1:-1] = .5 * (x[:-1] + x[1:])
        new_x[0] = new_x[1]
        new_x[-1] = new_x[-2]
        return new_x

    @staticmethod
    def psi2rho(var_psi):
        """Transformation to have vrt on same grid than h
        """
        M, L = var_psi.shape
        Mp=M+1; Lp=L+1
        Mm=M-1; Lm=L-1
        var_rho = zeros((Mp, Lp))
        var_rho[1:M,1:L]=0.25*(var_psi[0:Mm,0:Lm]+var_psi[0:Mm,1:L]+var_psi[1:M,0:Lm]+var_psi[1:M,1:L])
        var_rho[0,:]=var_rho[1,:]
        var_rho[Mp-1,:]=var_rho[M-1,:]
        var_rho[:,0]=var_rho[:,1]
        var_rho[:,Lp-1]=var_rho[:,L-1]
        return var_rho

if __name__ == '__main__':
    start_logger().setLevel('DEBUG')
    grid_name = '/media/staukoor/Expansion/pa.nc.2'
    #grid_name = '/media/staukoor/Backup Plus/March2010/romsfiles/roms_avg_Y2010M3.nc.2'
    #grid_name = '/home/staukoor/Desktop/apri99pa.nc.2'
    lon_name, lat_name = 'lon_rho', 'lat_rho'
    # To AntSimi, if this is file1,please use s_rho=0
    # If this is file2,please use s_rho=59
    s_rho = 59
    #s_rho=0
    for time in range(0,20,1):
        dtfile = 1
        tfile = 1*1//dtfile
        ###########
        realyear_origin = datetime(2010,1,1)
        date = realyear_origin + timedelta(days=float(time)*dtfile/1.)

        infiletime = time%tfile
        filetime = time - time%tfile
        h = RomsDataset(grid_name, lon_name, lat_name, indexs=dict(time=time, s_rho=s_rho))
        a, c = h.eddy_identification('zeta', 'u', 'v', date, 0.02, pixel_limit=(10, 2000), shape_error=55, force_height_unit='m',force_speed_unit='m/s')

        with Dataset(date.strftime('/media/staukoor/Expansion/eddies_trial2/Anticyclonic_%Y%m%d.nc'), 'w') as h:
            a.to_netcdf(h)
        with Dataset(date.strftime('/media/staukoor/Expansion/eddies_trial2/Cyclonic_%Y%m%d.nc'), 'w') as h:
            c.to_netcdf(h)

        #plot
        fig = plt.figure(figsize=(7,7))
        ax = fig.add_axes([.03,.03,.94,.94])
        ax.set_title('Eddies detected -- Cyclonic(red) and Anticyclonic(blue)')

        # ax.set_ylim(-40,75)
        ax.set_aspect('equal')
        a.display(ax, color='b', linewidth=.5)
        c.display(ax, color='r', linewidth=.5)
        ax.grid()
        #fig.savefig('/media/staukoor/Expansion/eddies_trial/eddies1_' + str(filetime) + '.png')

Now I go to track them. I run my anticyclonic and cyclonic yaml file. And it works for the file 2 (the bigger grid). It generates the cyclonic.nc, correspondances, tracktooshort, etc...

It doesn't do it for the smaller grid, file1. Now I've changed the test step from 0.02 to 0.002 or 0.0002, played with pixel size, or anything else... I've increased the timestep to 800 timesteps, (a larger dataset with same config as file1) It gives me this same error everytime I ran the "EddyTracking area_trackerCy1.yaml -v DEBUG"

This is my yaml file

PATHS:
  # Files produces with EddyIdentification
  FILES_PATTERN: /media/staukoor/Expansion/eddies_trial2/Cyclonic*.nc
  SAVE_DIR: /media/staukoor/Expansion/eddies_trial2

# Number of timestep for missing detection
VIRTUAL_LENGTH_MAX: 3
# Minimal time to consider as a full track
TRACK_DURATION_MIN: 3

CLASS:
    # Give the module to import,
    # must be available when you do "import module" in python
    MODULE: py_eddy_tracker.featured_tracking.area_tracker
    # Give class name which must be inherit from
    # py_eddy_tracker.observations.observation.EddiesObservations
    CLASS: AreaTracker

This is the error which comes from the terminal.

Traceback (most recent call last):
  File "/home/staukoor/anaconda3/bin/EddyTracking", line 33, in <module>
    sys.exit(load_entry_point('pyEddyTracker==0+unknown', 'console_scripts', 'EddyTracking')())
  File "/home/staukoor/anaconda3/lib/python3.8/site-packages/py_eddy_tracker/appli/eddies.py", line 206, in eddies_tracking
    track(
  File "/home/staukoor/anaconda3/lib/python3.8/site-packages/py_eddy_tracker/appli/eddies.py", line 324, in track
    c.track()
  File "/home/staukoor/anaconda3/lib/python3.8/site-packages/py_eddy_tracker/tracking.py", line 385, in track
    i_previous, i_current, association_cost = self.previous_obs.tracking(
  File "/home/staukoor/anaconda3/lib/python3.8/site-packages/py_eddy_tracker/featured_tracking/area_tracker.py", line 46, in tracking
    i_self, i_other = self.solve_function(cost_mat)
  File "/home/staukoor/anaconda3/lib/python3.8/site-packages/py_eddy_tracker/observations/observation.py", line 1433, in solve_function
    return numba_where(self.solve_simultaneous(cost_matrix))
  File "/home/staukoor/anaconda3/lib/python3.8/site-packages/py_eddy_tracker/observations/observation.py", line 1333, in solve_simultaneous
    max_links = max(self_links.max(), other_links.max())
  File "/home/staukoor/anaconda3/lib/python3.8/site-packages/numpy/core/_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

Yet no error comes from the file2, it tracks the eddies... I'm lost as to what I must do to make it work in the smaller grid.

Would you know what's going on, and what am I doing wrong?

Thanks in advance. And best wishes for the new year! Sheveenah

AntSimi commented 2 years ago

Hi,

Some ideas to improve detection:

Antoine

Sheveenah commented 2 years ago

Hello Antoine, thanks so much for your reply and your time.

Although I think I did not explain properly. So my main goal is to produce a time series of the eddy centre longitude (the longitude variable from the cyclonic.nc file) in a smaller regional box.

1st option

I did apply the filter as you suggested. I don't think I am worried about detection as it does detect in both cases. When I use your script to display it without filter, it shows. And as you suggested I did change the step from 0.02m to even 0.00002, what I'm failing is the running of the yaml files. Is there something I should add in this code which could help generate the cyclonic, correspondances, tracktooshort, etc, for the apri99.nc.2 file?

        h = RomsDataset(grid_name, lon_name, lat_name, indexs=dict(time=time, s_rho=s_rho))
        a, c = h.eddy_identification('zeta', 'u', 'v', date, 0.00002, pixel_limit=(10, 2000), shape_error=55, force_height_unit='m',force_speed_unit='m/s')

        with Dataset(date.strftime('/media/staukoor/Expansion/eddies_trial2/Anticyclonic_%Y%m%d.nc'), 'w') as h:
            a.to_netcdf(h)
        with Dataset(date.strftime('/media/staukoor/Expansion/eddies_trial2/Cyclonic_%Y%m%d.nc'), 'w') as h:
            c.to_netcdf(h)

2nd option

Or, let's say it won't work because of the big current in the region, I have also thought of relying on the bigger grid (file2whichworks), which does allow me to run the yaml files and generates the cyclonic, correspondances, tracktooshort, etc but then in this case, I'm failing to slice the dataset into a smaller grid....

I have tried to slice my longitude, latitude on nco, cdo, but I couldn't do it.

As for doing it in python, firstly I attempted to just delete unwanted values in the longitude array, but then this was problematic as the shape for time remains the same, and I cannot now plot longitude v.s time.

I set the unwanted values of longitude to nan. And I'm still failing. Maybe I'm doing it wrong.

But I'll share the Cyclonic.nc file (which was generated from the yaml file of the larger grid), (it's stored in the folder file3forAntoine) and my notebook. https://drive.google.com/drive/folders/1hmb_yfzd9V-6Zc0No3RrdXlzxoCzEupO?usp=sharing

I would be grateful if you could please help me understand why I can't just zoom in on a small box and produce my time series of the longitude eddy centre...

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import time
import datetime
import sys
sys.path.append('/home/staukoor/Desktop/clementupdatedcrocotools')
from interp_Cgrid import *
import netCDF4 as nc
import cftime as cf
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
import scipy.stats
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st
#import py_eddy_tracker_sample
from matplotlib import pyplot as plt
import xarray as xr

DIR='/home/staukoor/Desktop' ## path to the data
file='/Cyclonic' 
suffix='.nc' ## 

tindex=0 ## always zero for the monthly mean files as there is only 1 timestep in each file

inifile=DIR+file+suffix
print(inifile)

DS=xr.open_dataset(inifile,decode_times=False)
## extract the required variables
lonv=DS['longitude'].values
latv=DS['latitude'].values
time = DS['time']

##Say I now set some unwanted values as nan
lonv[lonv<20]=np.nan

bins = 10
bin_means_ma, bin_edges_ma, binnumber_ma = st.binned_statistic(time, lonv, statistic='mean', bins=bins)
b_ma = (bin_edges_ma[:-1]+bin_edges_ma[1:])*0.5
plt.plot(b_ma, bin_means_ma, '-b', label=r'$A_{SCV_{s}}$ radii', alpha=1.)
#plt.plot(b_mc, bin_means_mc, '-r', label=r'$C_{SCV_{s}}$ radii', alpha=1.)
plt.grid(True)

I feel like I'm really close to reaching my goal in the option 2, but I must just know how to slice my longitude and latitude array in order to work in a smaller regional grid.

Basically any option where you would be able to cut from that big grid to a smaller one in that cyclonic.nc file, this would be my solution.

thank you in advance Sheveenah Taukoor

AntSimi commented 2 years ago

Maybe my answer will be too general.

I think you need in a first time to get a correct identification before to run a tracking(There are several methods, which could be used, with its pros and cons) or analyse on raw detection. So for identification my advice is to used step of 2 mm step=0.002. You must also take care of pixel limit, if you used a very high resolution product 2000 pixel could be a tiny area.

Identification on area subset

You could run detection on a reduced area i f you want save compute time, if it's not a problem of time computation run detection on whole grid and you will do an area subset on tracking eddies. With indexs keyword you could load only a part of the grid

h = RomsDataset(
     grid_name, lon_name, lat_name, 
     indexs=dict(time=time, s_rho=s_rho, my_latitude_dimension=slice(start_index, end_index))
)

Don't replace coordinates by nan values.

Check identification result

Before to run tracking check identification results, by example number of eddies detected by day. Display several consecutive days to see if you see eddies movement.

Tracking

You choose AreaTracker, which is a good things, so i suspect there are some days, where no eddies are detected. Tracking is an important step which allow to remove false detection.

My understanding

I will try to reformulate your needs, because i am not a nativ english speaker, and maybe i don't understand correctly you problems and needs. You want to know when some area include eddy center. You could also compute when some area are include in an eddy contour. This needs is similar at this example, here we count how many time a pixel is enclosed in an eddy. Frequency => pixel enclosed in an eddy / nb days of period. We could imagine a similar diagnostic along time or time/pixel, i will try to build an example about it.

If you need to select eddies in a box, you could use this method

Don't hesitate to ask again if you don't have your answer.

Sheveenah commented 2 years ago

Bonjour Antoine, tu es a Paris? I am a phd student from South Africa, currently in LOPS, Brest. Thanks for the advice on a Saturday morning! The slice for box, I have just tried this using the file2whichworks, and yet I am sorry for bothering you again.

https://drive.google.com/drive/folders/1hmb_yfzd9V-6Zc0No3RrdXlzxoCzEupO?usp=sharing

        x1, x2 = 350, 480
        y1, y2 = 200, 375
        h = RomsDataset(grid_name, lon_name, lat_name, indexs=dict(time=time,eta_rho=slice(y1, y2),xi_rho=slice(x1, x2),eta_v=slice(y1, y2-1),xi_u=slice(x1, x2-1),lon_rho=slice(y1, y2),lat_rho=slice(x1, x2),lon_v=slice(y1, y2-1),lat_u=slice(y1, y2),lon_u=slice(x1, x2-1),lat_v=slice(x1, x2),s_rho=s_rho))        

But I'm getting this error

     25         v = self.rho_2d(v)
     26 
---> 27         self._speed_norm = (v ** 2 + u ** 2) ** .5
     28 
     29 

ValueError: operands could not be broadcast together with shapes (578,130) (175,674) 

That's the whole script I've used... Would you know what I'm doing wrong?

for plotting

from matplotlib import pyplot as plt

class RomsDataset(UnRegularGridDataset):
    __slots__ = list()

    def init_speed_coef(self, uname, vname):
        # xi_u and eta_v must be specified because this dimension are not use in lon/lat
        u = self.grid(uname, indexs=dict(xi_u=slice(None)))
        v = self.grid(vname, indexs=dict(eta_v=slice(None)))
        u = self.rho_2d(u.T).T
        v = self.rho_2d(v)
        self._speed_norm = (v ** 2 + u ** 2) ** .5

    @staticmethod
    def rho_2d(x):
        """Transformation to have u or v on same grid than h
        """
        M, Lp = x.shape
        new_x = zeros((M + 1, Lp))
        new_x[1:-1] = .5 * (x[:-1] + x[1:])
        new_x[0] = new_x[1]
        new_x[-1] = new_x[-2]
        return new_x

    @staticmethod
    def psi2rho(var_psi):
        """Transformation to have vrt on same grid than h
        """
        M, L = var_psi.shape
        Mp=M+1; Lp=L+1
        Mm=M-1; Lm=L-1
        var_rho = zeros((Mp, Lp))
        var_rho[1:M,1:L]=0.25*(var_psi[0:Mm,0:Lm]+var_psi[0:Mm,1:L]+var_psi[1:M,0:Lm]+var_psi[1:M,1:L])
        var_rho[0,:]=var_rho[1,:]
        var_rho[Mp-1,:]=var_rho[M-1,:]
        var_rho[:,0]=var_rho[:,1]
        var_rho[:,Lp-1]=var_rho[:,L-1]
        return var_rho

if __name__ == '__main__':
    start_logger().setLevel('DEBUG')
    grid_name = '/home/staukoor/Desktop/monthlypierrick.nc.2'
    #grid_name = '/media/staukoor/Backup Plus/March2010/romsfiles/roms_avg_Y2010M3.nc.2'
    #grid_name = '/home/staukoor/Desktop/apri99pa.nc.2'

    lon_name, lat_name = 'lon_rho', 'lat_rho'

#To AntSimin, if this is file1,please use s_rho=0
#If this is file2,please use s_rho=59
    s_rho = 59
    #s_rho=59

    # domain grid points: x1, x2, y1, y2
    x1, x2 = 350, 480 #1320
    y1, y2 = 200, 375

        #x1, x2 = 0, 3001
        #y1, y2 = 0, 4001

    for time in range(0,11,1):
        dtfile = 1

        tfile = 1*1//dtfile

        ###########
        realyear_origin = datetime(2010,1,1)
        date = realyear_origin + timedelta(days=float(time)*dtfile/1.)

        infiletime = time%tfile
        filetime = time - time%tfile

        '''
        print('infiletime: ', infiletime)
        print('filetime:  ', filetime)
        print('time: ', time)
        print('tfile: ',tfile)
        print('date:',date)
        '''
        #h = RomsDataset(grid_name, lon_name, lat_name, indexs=dict(time=time, s_rho=s_rho, xi_rho=slice(350, 480), xi_u=slice(350, 479), eta_rho=slice(200, 375), eta_v=slice(200, 374)))

        h = RomsDataset(grid_name, lon_name, lat_name, 
               indexs=dict(
                     time=time,
                     eta_rho=slice(y1, y2),
                     xi_rho=slice(x1, x2),
                     eta_v=slice(y1, y2-1),
                     xi_u=slice(x1, x2-1),
                     lon_rho=slice(y1, y2),
                     lat_rho=slice(x1, x2),
                     lon_v=slice(y1, y2-1),
                     lat_u=slice(y1, y2),
                     lon_u=slice(x1, x2-1),
                     lat_v=slice(x1, x2),
                     s_rho=s_rho))        

        a, c = h.eddy_identification('zeta', 'u', 'v', date, 0.002, pixel_limit=(10, 2000), shape_error=55, force_height_unit='m',force_speed_unit='m/s')

        with Dataset(date.strftime('/home/staukoor/Desktop/eddytrial4/Anticyclonic_%Y%m%d.nc'), 'w') as h:
            a.to_netcdf(h)
        with Dataset(date.strftime('/home/staukoor/Desktop/eddytrial4/Cyclonic_%Y%m%d.nc'), 'w') as h:
            c.to_netcdf(h)

        #plot
        fig = plt.figure(figsize=(7,7))
        ax = fig.add_axes([.03,.03,.94,.94])
        ax.set_title('Eddies detected -- Cyclonic(red) and Anticyclonic(blue)')
        # ax.set_ylim(-40,75)
        ax.set_aspect('equal')
        a.display(ax, color='b', linewidth=.5)
        c.display(ax, color='r', linewidth=.5)
        ax.grid()
        #fig.savefig('/media/staukoor/Expansion/eddies_trial/eddies1_' + str(filetime) + '.png')
Sheveenah commented 2 years ago

Hello Antoine, firstly, please ignore the code above. I am sorry for the confusion, but this afternoon, I was able to detect after slicing it to a smaller grid, and run the yaml files. Although, it is not as satisfactory as I would want it. You see, even with the slicing, e.g in this code above, if I set my grid to a smaller one, it doesn't detect it. I had to really keep a larger grid, which is not ideal.

Which brings me rather to my 2nd option question. (refer to 2nd comment in comments above)

After running my yaml files and obtaining the cyclonic.nc and anticyclonic.nc files, is there a way for me to filter the longitude, latitude and focus on a smaller grid?

If you go to my 2nd comment where I mentioned 2nd option and I shared a script from my notebook where I had tried first to set my unwanted lon, lat as nan, but you advised me not to do this.

so... I hope this would work... cause I only learnt about py eddy tracker 4 weeks ago, and I really want to use this method in my research chapter :) But I need to make sure I'm doing the right thing haha

AntSimi commented 2 years ago

Roms grid subset

ValueError problem is due too because you specify xi_u and eta_v in two different place, RomsDataset.init_speed_coef method and at the init of this class.

ValueError: operands could not be broadcast together with shapes (578,130) (175,674) 

I think try to remove indexs in this method:

    def init_speed_coef(self, uname, vname):
        # xi_u and eta_v must be specified because this dimension are not use in lon/lat
        u = self.grid(uname)
        v = self.grid(vname)
        u = self.rho_2d(u.T).T
        v = self.rho_2d(v)
        self._speed_norm = (v ** 2 + u ** 2) ** .5

Atlas area selection

You could keep only eddies in a specific area with extract_with_area method used in this example