bjornaa / xroms

Post-processing tools for ocean model ROMS, based on python and xarray
MIT License

support from mfdataset #1

Open nmalan opened 4 years ago

nmalan commented 4 years ago

Hi Bjorn

First off - great package! I was wondering if you are still actively working on it? I've been fiddling around to try and make xroms able to load multiple datasets using open_mfdataset, with the idea that you could load multiple files from the same run, e.g. A = xroms.roms_dataset("outer_avg_090*.nc"). However, I run into trouble making it work with the routines in depth.py.

Do you have suggestions on how to go about loading multiple files into a single xroms dataset? Am I on the right track in trying to incorporate open_mfdataset, or will this just never work because of memory constraints?

bjornaa commented 4 years ago

Hi Neil,

Thanks for the interest and the nice words.

I would not say that I am actively developing xroms. It already does most of what I need, making it easier to use ROMS with the fantastic xarray framework. However, I try to add to it when the need arises. I also plan to expand the set of examples with interesting use cases (contributions are welcome).

A multifile option would be an interesting add-on. To not mix up what's working and to be consistent with xarray I would like to have an xroms_mfdataset function.

The naive approach, simply replacing the call to xr.open_dataset with xr.open_mfdataset, does not work: scalar variables like Vtransform are turned into arrays (one element per file). I see two ways forward:

1) find a good set of options to open_mfdataset so it does what I want.

2) use xr.open_mfdataset and modify the resulting Dataset

The problem has caught my interest, so I will experiment further.

I prefer the first alternative if feasible, but the second should work.
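A minimal sketch of the second alternative, using small in-memory datasets in place of real ROMS files. `fake_roms_file` and `collapse_constant` are hypothetical names made up for illustration, and `xr.concat` with `data_vars="all"` stands in for the naive multi-file open; this is not the xroms implementation.

```python
import numpy as np
import xarray as xr


def fake_roms_file(t0):
    """Tiny in-memory stand-in for one ROMS output file (hypothetical)."""
    return xr.Dataset(
        {
            "temp": (("ocean_time", "xi_rho"), np.zeros((1, 3))),
            "Vtransform": ((), 2),  # scalar, identical in every file
        },
        coords={"ocean_time": [t0]},
    )


def collapse_constant(ds, names, dim="ocean_time"):
    """Remove a spurious concatenation dimension from the named variables."""
    fixed = ds.copy()
    for name in names:
        if dim in fixed[name].dims:
            # Keep the first entry and drop the leftover scalar coordinate
            fixed[name] = fixed[name].isel({dim: 0}, drop=True)
    return fixed


parts = [fake_roms_file(0), fake_roms_file(1)]

# Concatenating every data variable gives Vtransform a time dimension
naive = xr.concat(parts, dim="ocean_time", data_vars="all")
print(naive["Vtransform"].dims)

# Post-processing restores the scalar
repaired = collapse_constant(naive, ["Vtransform"])
print(repaired["Vtransform"].dims)
```

The drawback of this route is that the list of variables to repair has to be maintained by hand.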

I am working from home these days and don't have a huge multifile dataset on my laptop, so testing will be done on a smallish two-file dataset.

Regards,

Bjørn



bjornaa commented 4 years ago

Hi again,

It was easier than I thought. The solution was to pass the optional argument data_vars='minimal' to xr.open_mfdataset. This keeps variables that do not depend on time from getting the ocean_time dimension. The rest of the roms_dataset function seems to work unchanged, which makes it possible to merge the multi-file behaviour into the same function. At present it is a separate function, xroms.roms_mfdataset().
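The effect of data_vars='minimal' can be illustrated without any files. The sketch below assumes that `xr.concat` on in-memory datasets behaves like the concatenation step inside `open_mfdataset`; `fake_file` and the variable names `zeta` and `h` are illustrative stand-ins, not taken from the actual test data.

```python
import numpy as np
import xarray as xr


def fake_file(times):
    """In-memory stand-in for one ROMS file with len(times) records."""
    n = len(times)
    return xr.Dataset(
        {
            "zeta": (("ocean_time", "xi_rho"), np.zeros((n, 3))),  # time dependent
            "h": (("xi_rho",), np.array([10.0, 20.0, 30.0])),      # grid variable
            "Vtransform": ((), 2),                                  # scalar setting
        },
        coords={"ocean_time": times},
    )


parts = [fake_file([0, 1]), fake_file([2, 3])]

# Only variables that already have ocean_time are concatenated;
# the others are merged and keep their original dimensions.
combined = xr.concat(parts, dim="ocean_time", data_vars="minimal")

for name in ("zeta", "h", "Vtransform"):
    print(name, combined[name].dims)
```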

I would like to do more testing before pushing it to GitHub. I would be grateful if you would test it on a larger dataset. The replacement xroms/xroms.py module is attached.

Bjørn



nmalan commented 4 years ago

Hi Bjorn

Thanks for the very quick response, I'm glad that you are still active on this. That is a very neat solution for the implementation of mf_dataset. This afternoon I will test it on a big dataset (20 years on a 270x317 grid) and see how it goes.

The only issue is that I can't seem to see the file you attached?

bjornaa commented 4 years ago

Hi Neil,

Sorry for the late answer, I did not check my e-mail during my Easter vacation (this is a big deal in Norway, even in these Corona times).

It is not the first time I have forgotten an attachment.

Bjørn



nmalan commented 4 years ago

Hi Bjorn, no problem - although I still can't see the attachment?

bjornaa commented 4 years ago

Hei Neil,

I double-checked: the attachment was present in my last mail. Maybe something strange happens because the mail goes through GitHub (I reply to notifications@github.com). My e-mail address for direct contact is bjorn@imr.no.

I committed the new file and pushed it to the develop branch on GitHub.

Bjørn



nmalan commented 4 years ago

Hi Bjorn

I just did a little test on a larger dataset. The new roms_mfdataset function works fine for a single file (1.6 GB, 30 days of data), but when trying to load two of those files with:

A = xroms.roms_mfdataset("/srv/scratch/z3097808/20year_run/20year_freerun_output_NEWnci/outer_avg_014*.nc")

I get the following error (I tried adding compat='override' as an argument to xr.open_mfdataset, but just got another error):

`---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
in
      1 # Initiate the Dataset
----> 2 A = xroms.roms_mfdataset("/srv/scratch/z3097808/20year_run/20year_freerun_output_NEWnci/outer_avg_014*.nc")
      3 A

~/miniconda3/lib/python3.7/site-packages/xroms/xroms.py in roms_mfdataset(roms_file)
    138
    139     # Read the ROMS file
--> 140     A0 = xr.open_mfdataset(roms_file, combine='nested', concat_dim="ocean_time", data_vars='minimal')
    141     # Old ROMS output have dimension 'time' instead of 'ocean_time'
    142     if "time" in A0.dims:

~/miniconda3/lib/python3.7/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, combine, autoclose, parallel, join, attrs_file, **kwargs)
    951                 coords=coords,
    952                 ids=ids,
--> 953                 join=join,
    954             )
    955         elif combine == "by_coords":

~/miniconda3/lib/python3.7/site-packages/xarray/core/combine.py in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join)
    311         coords=coords,
    312         fill_value=fill_value,
--> 313         join=join,
    314     )
    315     return combined

~/miniconda3/lib/python3.7/site-packages/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join)
    202             compat=compat,
    203             fill_value=fill_value,
--> 204             join=join,
    205         )
    206     (combined_ds,) = combined_ids.values()

~/miniconda3/lib/python3.7/site-packages/xarray/core/combine.py in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join)
    224         datasets = combined_ids.values()
    225         new_combined_ids[new_id] = _combine_1d(
--> 226             datasets, dim, compat, data_vars, coords, fill_value, join
    227         )
    228     return new_combined_ids

~/miniconda3/lib/python3.7/site-packages/xarray/core/combine.py in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    252                 compat=compat,
    253                 fill_value=fill_value,
--> 254                 join=join,
    255             )
    256         except ValueError as err:

~/miniconda3/lib/python3.7/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    133             "objects, got %s" % type(first_obj)
    134         )
--> 135     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    136
    137

~/miniconda3/lib/python3.7/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    356             for var in variables_to_merge:
    357                 result_vars[var] = unique_variable(
--> 358                     var, to_merge[var], compat=compat, equals=equals.get(var, None)
    359                 )
    360     else:

~/miniconda3/lib/python3.7/site-packages/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
    141     if not equals:
    142         raise MergeError(
--> 143             f"conflicting values for variable {name!r} on objects to be combined. "
    144             "You can skip this check by specifying compat='override'."
    145         )

MergeError: conflicting values for variable 'dstart' on objects to be combined. You can skip this check by specifying compat='override'.`

bjornaa commented 4 years ago

Hi Neil,

There are a lot of options for xarray's open_mfdataset.

I tried

A0 = xr.open_mfdataset(roms_file, combine='by_coords', data_vars='minimal')

This also works for my small dataset, and has the advantage of not mentioning the name of the time variable (we have some old ROMS files lying around).
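A small sketch of why combine='by_coords' avoids naming the time dimension. It assumes that `xr.combine_by_coords` on in-memory datasets mirrors what `open_mfdataset` does with that option; `fake_file` is a hypothetical helper, and the pieces are passed in the wrong order on purpose.

```python
import numpy as np
import xarray as xr


def fake_file(times, dim):
    """In-memory stand-in for one ROMS file whose time dimension is `dim`."""
    return xr.Dataset(
        {
            "zeta": ((dim, "xi_rho"), np.zeros((len(times), 3))),
            "Vtransform": ((), 2),
        },
        coords={dim: times},
    )


results = {}
# Works both for current files ("ocean_time") and old ones ("time")
for dim in ("ocean_time", "time"):
    parts = [fake_file([2, 3], dim), fake_file([0, 1], dim)]  # out of order
    # The concatenation dimension and order are inferred from the coordinates
    results[dim] = xr.combine_by_coords(parts, data_vars="minimal")

print(results["time"]["time"].values)
```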

Bjørn



nmalan commented 4 years ago

Hi Bjorn

Thanks for the hints - I've found the bug that was causing the issue: the files our model produces have a 'dstart' time variable, which confuses xarray's merging functions. So what I've done is drop that variable in the initial call to open_mfdataset, i.e.:

A0 = xr.open_mfdataset(roms_file, combine='nested', concat_dim="ocean_time", data_vars='minimal', drop_variables='dstart')
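The conflict and the fix can be reproduced in memory. In this sketch `fake_file` and the dstart values are invented, and dropping the variable with `drop_vars` before `xr.concat` is assumed to be equivalent to passing drop_variables at open time.

```python
import numpy as np
import xarray as xr


def fake_file(t0, dstart):
    """In-memory stand-in for one file; dstart differs between files."""
    return xr.Dataset(
        {
            "zeta": (("ocean_time",), np.zeros(2)),
            "dstart": ((), dstart),  # scalar without the time dimension
        },
        coords={"ocean_time": [t0, t0 + 1]},
    )


parts = [fake_file(0, 100.0), fake_file(2, 130.0)]

# With data_vars="minimal", dstart is merged rather than concatenated,
# so differing values are reported as a conflict.
try:
    xr.concat(parts, dim="ocean_time", data_vars="minimal", compat="equals")
    conflict = None
except ValueError as err:  # xarray's MergeError is a ValueError subclass
    conflict = str(err)
print(conflict)

# Dropping the variable first removes the conflict
cleaned = [ds.drop_vars("dstart") for ds in parts]
combined = xr.concat(cleaned, dim="ocean_time", data_vars="minimal", compat="equals")
```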

I'll do a couple more tests and then do a pull request to see if you're happy to add to xroms.

One thing I was wondering is if it is better to exclude the problematic variable, or to specifically tell it which variables to read in (the code for this would look a bit messy, which puts me off). Let me know what you think?

Cheers,

Neil

bjornaa commented 4 years ago

Hi Neil,

Our ROMS history files do not have the 'dstart' variable, so I have not been in this situation. Your version with drop_variables may be a good solution. I tested that it does not crash if the dropped variable does not exist.

Alternatively, the roms_mfdataset (and roms_dataset) functions could accept extra keyword arguments and pass them on to the xarray layer. Then roms_mfdataset(pattern, drop_variables='dstart') should work in your case. This may be a better solution because other users (there are not many) may have other similar problems. I will add this version to the develop branch.
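The keyword pass-through could look roughly like the sketch below. The function name `roms_mfdataset_sketch`, the `opener` hook, and the chosen defaults are assumptions made for illustration; the real roms_mfdataset does more work after opening the files.

```python
def roms_mfdataset_sketch(pattern, opener=None, **kwargs):
    """Open several ROMS files, forwarding extra keywords to the opener.

    `opener` is a hypothetical hook so the sketch can be tried without
    files; by default it is xarray's open_mfdataset.
    """
    if opener is None:
        import xarray as xr  # imported lazily so the stub path needs no xarray

        opener = xr.open_mfdataset
    options = {"combine": "by_coords", "data_vars": "minimal"}
    options.update(kwargs)  # the caller's keywords win over the defaults
    return opener(pattern, **options)


# Exercise the forwarding with a stub that records what it receives
calls = []


def recording_opener(pattern, **options):
    calls.append((pattern, options))
    return "dataset placeholder"


roms_mfdataset_sketch(
    "outer_avg_014*.nc", opener=recording_opener, drop_variables="dstart"
)
print(calls[0])
```

Forwarding keywords this way keeps the wrapper thin: any option the xarray function understands becomes available without xroms having to name it.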

On your second point: I have tried both ways. I was working on a package, roppy (also on GitHub), that just used numpy and netcdf4-python. That one follows the approach of reading only what I considered the central variables from the file(s). When I discovered xarray I was very impressed and started xroms as a thin layer over the xarray functionality. xarray reads everything unless told otherwise. I suppose it "reads" lazily, and does not actually load a large array into memory if the data are not used. If this assumption is false (I will test), I will try to limit the read.

My main development focus is the particle tracking package LADiM (also on GitHub). With hourly fields on a large 800 meter domain covering the Norwegian coast, the speed is limited by I/O. This is why Python performs as well as my old Fortran particle tracking code. However, I have found both xarray and netCDF4.MFDataset too slow, so in this case I decided to handle the multiplicity myself with netCDF4.Dataset calls.
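Handling the multiplicity by hand mostly comes down to mapping a global time index to a file and a record within that file. The sketch below shows one possible bookkeeping scheme; it is an assumption for illustration, not the LADiM code, and `build_offsets` and `locate` are hypothetical names. The record counts would in practice be read once from each file.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(records_per_file):
    """Return cumulative record counts, one entry per file."""
    return list(accumulate(records_per_file))


def locate(global_index, offsets):
    """Map a global time index to (file_number, local_index)."""
    if not offsets or not 0 <= global_index < offsets[-1]:
        raise IndexError(f"time index {global_index} is out of range")
    file_number = bisect_right(offsets, global_index)
    start = offsets[file_number - 1] if file_number > 0 else 0
    return file_number, global_index - start


# Three files holding 30, 30 and 28 records (made-up numbers)
offsets = build_offsets([30, 30, 28])
print(locate(0, offsets), locate(30, offsets), locate(87, offsets))
```

With this lookup, only the one file that holds the requested record needs to be opened with netCDF4.Dataset at any given time step.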

I will continue to slowly add to the xroms package. Apart from the keyword arguments, I am not planning any new functionality (unless requested), only more examples. The roppy package is just hanging around on GitHub; I will try to fix bugs if something comes up.

Bjørn



bjornaa commented 4 years ago

Hi again Neil,

1) In develop, keyword arguments to roms_(mf)dataset are passed on to xarray's open_(mf)dataset.

2) I lied about xarray reading. It reads everything into a dataset A0, but only selected variables are present in the returned dataset, namely the variables I cared about when writing the function.

It should be easy to modify this, to add and remove variables, but still have reasonable defaults. This could be done by giving a variables list as an argument to the roms_dataset functions. I will think about this.
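One possible shape for such a variables argument is sketched below. Everything here is hypothetical: the default list, the function name `select_variables`, and the `extra` and `exclude` parameters are illustrations of the idea, not part of xroms.

```python
import numpy as np
import xarray as xr

DEFAULT_VARIABLES = ("temp", "salt")  # hypothetical defaults


def select_variables(ds, variables=None, extra=(), exclude=()):
    """Return a dataset holding the requested data variables that exist in ds."""
    wanted = list(DEFAULT_VARIABLES if variables is None else variables)
    wanted += [name for name in extra if name not in wanted]
    keep = [
        name for name in wanted if name in ds.data_vars and name not in exclude
    ]
    return ds[keep]


# Stand-in for the full dataset A0 read from file
A0 = xr.Dataset(
    {
        name: (("ocean_time",), np.zeros(2))
        for name in ("temp", "salt", "zeta", "dstart")
    }
)

defaults = select_variables(A0)
custom = select_variables(A0, extra=["zeta"], exclude=["salt"])
print(list(defaults.data_vars), list(custom.data_vars))
```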

Bjørn

