GliderToolsCommunity / GliderTools

A toolkit for processing Seaglider base station NetCDF files: despiking, smoothing, outlier detection, backscatter, fluorescence quenching, calibration, gridding, interpolation.
https://glidertools.readthedocs.io
GNU Affero General Public License v3.0

Cannot merge data #181

Closed LewisDrysdale closed 1 year ago

LewisDrysdale commented 1 year ago

I am suddenly unable to merge any Seaglider data. This includes data that I have previously merged successfully. I'm not a Pandas expert, so I'm finding this quite difficult to debug. I thought it might be an issue with some of the glider files, but I've been subsetting the files randomly and the problem persists. Any help would be much appreciated, as would hearing whether anyone else has experienced this issue.

I have gone back and followed the exact instructions from the GliderTools docs.

Screenshot example:

image

Environment.yml file I am using

name: oceanglider
channels:

callumrollo commented 1 year ago

Hi Lewis. Are you able to run the example in the demo notebooks with the Seaglider files that ship with the package? If not, what error message do you get there?

What are the dimensions of the unmerged dataframes that it returns?

LewisDrysdale commented 1 year ago

Hi Callum, thanks for getting back to me. I'm partly convinced it must be something to do with my data, but I just can't get my head around why.

The test data run fine. I downloaded a few of the files from the GitHub repo:

image

The dimensions are sg_data_point

image

LewisDrysdale commented 1 year ago

You can find some of the data I am working with here https://thredds.sams.ac.uk/thredds/catalog/sg534/mission_5/catalog.html

callumrollo commented 1 year ago

Ah, that makes sense! If the only dimension is sg_data_point, there's nothing to merge. If you remove 'eng_qsp_PARuV' from the list of variables in the example notebook, you'll get the same result.

You should be good to use ds_dict['sg_data_point'] as your merged dataframe for GliderTools functions
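The check described above could be sketched roughly as follows. This assumes ds_dict behaves like a plain dictionary keyed by dimension name, as the thread suggests; the helper name single_group is hypothetical and not part of GliderTools, and the string value is only a stand-in for a real dataset.

```python
def single_group(ds_dict):
    """Return the only entry of ds_dict if it holds exactly one
    dimension group, otherwise None (hypothetical helper)."""
    if len(ds_dict) == 1:
        # Only one dimension group was loaded, so there is nothing
        # to merge and the entry can be used directly.
        return next(iter(ds_dict.values()))
    return None


# Stand-in for the loader's output; a real run would hold datasets.
ds_dict = {"sg_data_point": "dataset for sg_data_point"}
merged = single_group(ds_dict)
print(merged)
```

If the helper returns None, more than one dimension group was loaded and the usual merge step would still apply.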

LewisDrysdale commented 1 year ago

Ah, OK. So I already have a merged dataframe, just not one indexed on time. I can see that now. I still can't understand why I was previously able to get merged as my dataframe. Perhaps it was because I was using a variable that used GPS time. I need to understand more about Pandas.

Thanks very much, I really appreciate the help.

callumrollo commented 1 year ago

Yep, I think that's the case. If you want to get a time-indexed pandas dataframe from the xarray dataset returned by the load merge function, you can do something like:

merged = ds_dict['sg_data_point']
df = merged.to_pandas()
df.index = df["ctd_time_dt64"]
df = df.drop("ctd_time_dt64", axis=1)
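For anyone wanting to try the re-indexing step without glider files, here is a small sketch on synthetic data. The dataframe below is only an assumed stand-in for what merged.to_pandas() might return; the temperature column and its values are made up for illustration, and set_index is used as a one-step alternative to assigning the index and dropping the column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ds_dict['sg_data_point'].to_pandas():
# rows indexed by sg_data_point, with the time held in a column.
n = 5
df = pd.DataFrame(
    {
        "ctd_time_dt64": pd.date_range(
            "2023-01-01", periods=n, freq=pd.Timedelta(seconds=10)
        ),
        "temperature": np.linspace(10.0, 8.0, n),  # made-up values
    },
    index=pd.RangeIndex(n, name="sg_data_point"),
)

# set_index moves the time column into the index and removes it
# from the columns in a single step.
df_time = df.set_index("ctd_time_dt64").sort_index()

print(df_time.index.name)
print(list(df_time.columns))
```

The result has a DatetimeIndex, so time-based selection and resampling in pandas should work on it directly.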