euroargodev / argopy

A python library for Argo data beginners and experts
https://argopy.readthedocs.io
European Union Public License 1.2
184 stars 41 forks source link

New method to fetch data along a given trajectory #169

Open gmaze opened 2 years ago

gmaze commented 2 years ago

Motivation

For some Argo data analysis it could be useful to be able to fetch data around some specific locations, eg:

it could be useful to have an access point that directly fetch this.

API for new access point

It could look like this:

from argopy import DataFetcher as ArgoDataFetcher

# Default temporal distance is 'days' and radial distance unit is 'degree':
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], dt=365, dr=1)  # All float trajectory
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], cyc=[12], dt=365, dr=1)  # Single profile
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], cyc=[12,13,14], dt=365, dr=1)  # Selected profiles

# Possibly distinguish zonal and meridional distances:
neighbor_fetcher = ArgoDataFetcher(ds='ref').around(wmo=[6903754], dt=30, dx=2, dy=1) 

# Use option to change units:
neighbor_fetcher = ArgoDataFetcher(ds='ref').around(wmo=[6903754], dt=30, dx=100, dy=50, unit='km')

# Get data/index the classic way:
neighbor_ds = neighbor_fetcher.load().data

# argopy would add new variables to the fetched data, like distances to the requested reference profiles:
neighbor_ds['distance_time']
neighbor_ds['distance_radial']
neighbor_ds['distance_zonal']
neighbor_ds['distance_meridional']

This new access point, could in fact take a path or trajectory as input:

from argopy import DataFetcher as ArgoDataFetcher

# [2021 Hurricane Larry](https://www.nhc.noaa.gov/data/tcr/index.php?season=2021&basin=atl)
traj = [[-26.00,12.00,'2021-09-01 12:00:00'],[-33.00,13.00,'2021-09-02 12:00:00'],[-40.00,14.00,'2021-09-03 12:00:00'],[-45.00,16.00,'2021-09-04 12:00:00'],[-49.00,19.00,'2021-09-05 12:00:00'],[-52.00,21.00,'2021-09-06 12:00:00'],[-55.00,24.00,'2021-09-07 12:00:00'],[-57.00,27.00,'2021-09-08 12:00:00'],[-61.00,31.00,'2021-09-09 12:00:00'],[-61.00,38.00,'2021-09-10 12:00:00'],[-49.00,52.00,'2021-09-11 12:00:00']]

neighbor_fetcher = ArgoDataFetcher(ds='phy').around(path=traj, dt=5, dr=50, unit='km')

Further

Note this API could easily be plugged into the argo access point:

from argopy import DataFetcher as ArgoDataFetcher

float_fetcher = ArgoDataFetcher(ds='phy').float(6903754)
float_ds = fetcher.load().data

neighbor_ds = float_ds.argo.around(dt=365, dr=1)
neighbor_ds = float_ds.argo.around(dt=365, dx=2, dy=1) 
neighbor_ds = float_ds.argo.around(dt=365, dx=100, dy=50, unit='km')
neighbor_ds = float_ds.argo.around(dt=365, dr=1, ds='ref')  # Fetch data from the Argo CTD reference
matdever commented 2 years ago

@gmaze This would be a great feature and is actually the reason why issue #168 came up: I was querying the index file too often because I was looking for near-by floats along a specific float's trajectory by using:

argo = ArgoDataFetcher().profile(WMO,profnumber).to_xarray()

d = {'profile_number': argo.CYCLE_NUMBER,
     'PSAL': argo.PSAL,
     'TEMP': argo.TEMP,
     'PRES': argo.PRES,
     'time': argo.TIME,
     'lon': argo.LONGITUDE,
     'lat': argo.LATITUDE,
    }
df = pd.DataFrame(data=d)

argo = argo_loader.region([df.lon[0]-.5, df.lon[0]+.5, df.lat[0]-.5, df.lat[0]+.5, 1500, 2100, '2020-01-01', '2022-01-01']).to_xarray()

d = {'profile_number': argo.CYCLE_NUMBER,
     'PSAL': argo.PSAL,
     'TEMP': argo.TEMP,
     'PRES': argo.PRES,
     'time': argo.TIME,
     'lon': argo.LONGITUDE,
     'lat': argo.LATITUDE,
     'WMO': argo.PLATFORM_NUMBER
    }
sbe = pd.DataFrame(data=d)

where df.lon[0] is the longitude of the float of interest at a specific profile.

The additional variables are also a great touch, I currently add the absolute distance from the reference profile using

sbe['dist'] = hs.haversine((df.lat[0],df.lon[0]),(sbe.lat[ii],sbe.lon[ii])) 

Caveats

Some thoughts should be put into avoiding duplicates, otherwise the array could unnecessarily grow. Also, in my current approach, I have to manually remove the float of interest from the returned array using

sbe.drop(sbe[sbe['WMO'] == int(WMO)].index, inplace = True)
gmaze commented 2 years ago

Great !

Avoiding duplicates may indeed be included 👍

gmaze commented 2 years ago

I tried the following code to fetch data in a 2 years, 1x1 degree box around each profiles. Using cache is key for performances

WMO = [6903075]
profnumber = np.arange(2,10)

fetcher = ArgoDataFetcher(cache=True).profile(WMO, profnumber)
argo_ds = fetcher.to_xarray()
argo_ds = argo_ds.argo.point2profile()

fig, ax = plt.subplots(nrows=1, ncols=1)
ax.plot(argo_ds['LONGITUDE'], argo_ds['LATITUDE'], 'r+-', markersize=12)

for i_prof in argo_ds['N_PROF']:
    this_prof = argo_ds.sel(N_PROF=i_prof)
    lon = this_prof['LONGITUDE'].values[np.newaxis][0]
    lat = this_prof['LATITUDE'].values[np.newaxis][0]
    tim = this_prof['TIME'].values[np.newaxis][0]
    bbox = list(np.add([lon, lon, lat, lat, 1500, 2000, tim, tim], 
                       [-.5, .5, -.5, .5, 0, 0, -np.timedelta64(12*30, 'D'), np.timedelta64(12*30, 'D')]))
    bbox[-1] = pd.to_datetime(str(bbox[-1])).strftime('%Y%m%d%H%M')
    bbox[-2] = pd.to_datetime(str(bbox[-2])).strftime('%Y%m%d%H%M')
    try:
        fetcher = ArgoDataFetcher(cache=True, ds='phy').region(bbox)
        local_ds = fetcher.load().data
        local_ds = local_ds.argo.point2profile()
    except Exception:
        print(fetcher.uri)
        pass
    ax.plot(local_ds['LONGITUDE'], local_ds['LATITUDE'], 'k.')
    ax.hlines(bbox[2:4], bbox[0], bbox[1], color='black', linewidth=0.5)
    ax.vlines(bbox[0:2], bbox[2], bbox[3], color='black', linewidth=0.5)
    ax.text(lon, lat+0.01, '%i' % this_prof['CYCLE_NUMBER'], horizontalalignment='center')
ax.grid()
ax.set_title(WMO)

download

Clearly, the overlapping boxes is something we should avoid

quai20 commented 2 years ago

Clearly, the overlapping boxes is something we should avoid It's maybe the time to think about a "polygon" region fetching ?

ocefpaf commented 2 years ago

It's maybe the time to think about a "polygon" region fetching ?

Something like that is in the works for ERDDAP and should be in the next version. However, that won't help much here b/c we don't know when the argo ERDDAP server would be updated and argopy supports multiple sources of data that are non-ERDDAP. Anyway, just wanted to mention that at least this may be easier in the future.

gmaze commented 2 years ago

Indeed a "polygon" selection would be great also, I think argovis has already one

However, let's imagine the case where 2 profiles are far away from each other, i.e. more precisely that boxes do not overlap:

download

In this case a polygon would not be able to answer the around access point purpose

matdever commented 2 years ago

A polygon selection would be useful for some application I reckon. For the problem here (i.e., fetching data along a float's trajectory), I thought that filtering out duplicates would suffice. It would solve the issue of overlapping boxes, wouldn't it? The only problem it adds is how to reference the profiles to one another (as in, one profile could be in 2 boxes).

I tried to draw it as an example to be clearer:

IMG_2263 copy

Float 1 is easy - no problem there Float 2 is trickier. Profiles 2 and 3 could be used for Profile C and profiles 3 and 4 for profile D. If one filters out duplicates, only profiles 2, 3, and 4 will be left, but no indication to what profiles they are matched to.

Does that make sense?

gmaze commented 2 years ago

I thought that filtering out duplicates would suffice

Indeed, but at the expense of generating more data transfer than necessary, since all the post-processing/filtering would be done on the client side, I would try to avoid this

Not sure this would be much of a problem if the time range is reasonable though

gmaze commented 2 years ago

I tried to draw it as an example to be clearer:

Nice draw !! 😸

but no indication to what profiles they are matched to.

This issue would be solved by the distance metric I guess For float 2, profile 4 is too far from C, and profile 2 is too from D ...

github-actions[bot] commented 2 years ago

This issue was marked as staled automatically because it has not seen any activity in 90 days

github-actions[bot] commented 1 year ago

This issue was marked as staled automatically because it has not seen any activity in 90 days

github-actions[bot] commented 10 months ago

This issue was closed automatically because it has not seen any activity in 365 days

github-actions[bot] commented 6 months ago

This issue was marked as staled automatically because it has not seen any activity in 90 days