Project-OSmOSE / OSEkit

OSEkit is an open-source suite of tools written in Python and dedicated to the management and analysis of data in underwater passive acoustics.
https://osmose.ifremer.fr

Auxiliary data #88

Closed gmanatole closed 11 months ago

gmanatole commented 1 year ago

New class and module to join weather and environment data to an existing dataset.

A notebook to test the class is available in '/home/datawork-osmose/osmose-datarmor/notebooks'.

Multiple scenarios:

  1. No GPS data, no ERA data: creates a dataframe from timestamps and loads lat, lon, depth (for now 'nan' because it is not in the metadata), shore distance and bathymetry.
  2. No GPS data, ERA data: if ERA data is in .../auxiliary/weather/era/, a spatio-temporal join is possible (options: nearest point, interpolation, cube average and cylinder average). The variables available for the join depend on the variables downloaded from the Climate Data Store (CDS). Also joins lat, lon, depth, shore distance, bathymetry and wind fetch.
  3. GPS data, no ERA data: creates a dataframe based on timestamps from the GPS file, which needs to have the correct columns. Joins lat, lon, depth, shore distance and bathymetry.
  4. GPS data, ERA data: creates the full dataframe based on timestamps from the GPS file.
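
For illustration, the "nearest point" option of the spatio-temporal join could look roughly like the sketch below. It is not the code of this PR: the function name `nearest_point_join`, the array layout (time, lat, lon) and the toy grid are assumptions, and longitude wrap-around is ignored.

```python
import numpy as np

def nearest_point_join(times, lats, lons, era_time, era_lat, era_lon, era_values):
    """For every sample, pick the ERA grid cell closest in time, latitude and longitude.

    Hypothetical helper: era_values is assumed to have shape (time, lat, lon).
    """
    times = np.asarray(times, dtype=float)
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    # Index of the nearest grid coordinate along each axis, one per sample.
    t_idx = np.abs(era_time[None, :] - times[:, None]).argmin(axis=1)
    y_idx = np.abs(era_lat[None, :] - lats[:, None]).argmin(axis=1)
    x_idx = np.abs(era_lon[None, :] - lons[:, None]).argmin(axis=1)
    return era_values[t_idx, y_idx, x_idx]

# Toy ERA grid (made-up values): 3 hourly steps, 2 latitudes, 2 longitudes.
era_time = np.array([0.0, 3600.0, 7200.0])        # epoch seconds
era_lat = np.array([43.0, 43.5])
era_lon = np.array([7.5, 8.0])
era_u10 = np.arange(12, dtype=float).reshape(3, 2, 2)

# Two hydrophone samples (time, lat, lon), e.g. rows of a gps file.
u10 = nearest_point_join([100.0, 3500.0], [43.1, 43.4], [7.9, 7.6],
                         era_time, era_lat, era_lon, era_u10)
```

The interpolation and cube/cylinder average options would replace the `argmin` lookups by a weighted or averaged selection over neighbouring cells.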

Rumengol commented 1 year ago

Since it is meant to be added in the next major release of OSmOSE, the pull request should be against the develop branch rather than main.

cazaudo commented 1 year ago

hello githubers,

@gmanatole does your notebook really run on env_name = "osmose"? i guess not

@Rumengol what does "against" mean in your message, please? not sure i understand

gmanatole commented 1 year ago

@cazaudo the notebook runs on osmose_dev

cazaudo commented 1 year ago

and on branch auxiliary_data, right?

gmanatole commented 1 year ago

That's right.

And what @Rumengol meant is that I tried merging the auxiliary_data branch into main, which is the stable branch as of now. I should have merged into develop and looked for conflicts with that branch, as the Auxiliary class will be in version 2, which is being developed on that branch.

cazaudo commented 1 year ago

i am testing your code on datarmor, here is a first series of questions / recommendations:

  1. you mention "OSmOSE documentation": where do you intend to put some details on the spatio-temporal join methods? they will indeed be needed
  2. your code asks for boussole_MERMAID.nc when I got ERA5.nc from your local notebook, u10_boussole_MERMAID.npy when i got u10_ERA5.npy, etc.
  3. by default the audio timestamp.csv considered is the one from the original wav folder (at least for fixed hydrophones): how can we set an arbitrary timestamp.csv?
  4. the variable name 'np_3600_0.5_0.5_u10' is a bit complex, i recommend keeping only u10 as the name and putting the meta information in an associated metadata file.
  5. i also recommend putting ERA5 variables in dataset/data/auxiliary/environment/ and not in a specific folder, let's not make things too complex too soon... we will see in practice if we need individual folders
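
A possible way to implement recommendation 4 is sketched below, as an illustration only. The helper `simplify_variable_names` and the metadata layout are hypothetical; since the meaning of each prefix token is not spelled out in this thread, the long name is stored verbatim rather than parsed.

```python
import json

def simplify_variable_names(columns):
    """Map names such as 'np_3600_0.5_0.5_u10' to their last token ('u10').

    Hypothetical helper: returns the renaming map and a metadata dict that
    keeps the original long name, so no information is lost.
    """
    renamed, metadata = {}, {}
    for col in columns:
        short = col.rsplit("_", 1)[-1]   # text after the last underscore
        renamed[col] = short
        metadata[short] = {"original_name": col}
    return renamed, metadata

renamed, metadata = simplify_variable_names(
    ["np_3600_0.5_0.5_u10", "np_3600_0.5_0.5_v10"]
)
# Content of the associated metadata file (could be written next to the data).
metadata_json = json.dumps(metadata, indent=2)
```

Note that this split would not work for variables whose short name itself contains an underscore; those would need an explicit mapping.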

note that I have made some modifications directly on /home/datawork-osmose/osmose-datarmor/notebooks/auxiliary_variable_pkg.ipynb, be careful, it is not under version control.

cazaudo commented 11 months ago

the code now works only for the case of a moving hydrophone, i.e. when a "gps.csv" file is provided

@gmanatole at the beginning of this PR you listed multiple scenarios, only scenario 4 is available, can you please implement the other 3?

gmanatole commented 11 months ago

REMARKS

  1. The date_template function is ambiguous. Why is it used in the join_welch method but not in the init method when converting time to epoch?
  2. In join_welch, interpolation is done for ttt series. Why not use self.timestamps in both cases?
  3. join_era should be applied only to welch data and not self.df. If a fixed hydrophone has, for instance, an original dataset with wav files longer than 1h, we will lose precision.
  4. Bathymetry, shore distance and wind fetch are not available to the user yet.

cazaudo commented 11 months ago

  1. it was a quick and dirty way to retrieve timestamps from npz files; you hardcoded the date template in your code, which was not working; the timestamps in the filenames of welch files are not the same depending on whether you work on original or segmented files. a solution would be to format original filenames like segmented ones, i.e. with date template "%Y%m%dT%H%M%S", to be discussed later
  2. good catch, i changed it
  3. i did not touch the join_era() method, but if i understand correctly join_welch should come before join_era()? and we should use the joined df in join_era() instead of the individual self.timestamps (not interpolated by join_welch, same for latitude and longitude)
  4. i think there is enough for the moment, we'll add it a bit later
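
As an illustration of the date template mentioned in point 1, a filename following "%Y%m%dT%H%M%S" can be converted to epoch time as sketched below. The helper `filename_to_epoch` is hypothetical, and the sketch assumes that the filename stem contains only the timestamp and that timestamps are in UTC; real welch filenames may carry extra tokens.

```python
from datetime import datetime, timezone

DATE_TEMPLATE = "%Y%m%dT%H%M%S"

def filename_to_epoch(filename, template=DATE_TEMPLATE):
    """Convert a filename such as '20220101T000000.npz' to epoch seconds.

    Assumptions: the stem is exactly the timestamp, and the time zone is UTC.
    """
    stem = filename.rsplit(".", 1)[0]          # drop the extension
    dt = datetime.strptime(stem, template)     # naive datetime
    return dt.replace(tzinfo=timezone.utc).timestamp()

epoch = filename_to_epoch("20220101T000000.npz")
```

Using the same template in both the init method and join_welch would remove the ambiguity raised in remark 1.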

gmanatole commented 11 months ago

  1. Yes, the order should be switched. In the SES case I've been working on, self.df from the gps file and temp_df from welch (spl files) had the same timestep, so there was no problem. By first interpolating the era data on the gps/original wav file (mobile/fixed), we run the risk of undersampling the era data if the dt in either case is greater than 1h.
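
The undersampling risk can be seen on a toy example, sketched here with made-up numbers. The arrays are assumptions for illustration: an hourly ERA series, a coarse time axis with dt = 2h (standing in for long original wav files) and finer welch timestamps.

```python
import numpy as np

# Hourly ERA series that alternates between 0 and 10 (made-up values).
era_time = np.arange(5) * 3600.0                 # 0h, 1h, 2h, 3h, 4h
era_u10 = np.array([0.0, 10.0, 0.0, 10.0, 0.0])

coarse_time = np.array([0.0, 7200.0, 14400.0])   # dt = 2h, e.g. long wav files
welch_time = np.array([3600.0, 10800.0])         # welch timestamps at 1h and 3h

# ERA interpolated directly on the welch timestamps: the peaks are kept.
direct = np.interp(welch_time, era_time, era_u10)

# ERA interpolated on the coarse axis first, then on the welch timestamps:
# the coarse axis only sees the zeros, so the peaks are lost.
coarse = np.interp(coarse_time, era_time, era_u10)
two_step = np.interp(welch_time, coarse_time, coarse)
```

Here `direct` keeps the hourly peaks while `two_step` is flat, which is why join_welch should run before join_era.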

cazaudo commented 11 months ago

got it, it works now by redefining the following at the beginning of join_era():

    self.latitude = self.df.lat
    self.longitude = self.df.lon
    self.timestamps = self.df.time

dirty, but i will leave it like that to make it explicit

new order of methods:

    appli_weather = Auxiliary( ... )
    appli_weather.join_welch()
    appli_weather.join_era()
    appli_weather.save_aux_data()