Open nmalan opened 4 years ago
Hi Neil,
Thanks for the interest and the nice words.
I would not say that I am actively developing xroms. It already does most of what I need, making it easier to use ROMS with the fantastic xarray framework. However, I try to add to it when the need arises. I also plan to extend the set of examples with interesting use cases
(contributions are welcome).
A multifile option would be an interesting add-on. To avoid breaking what already works, and to be consistent with xarray, I would like to have an xroms_mfdataset function.
The naive approach, simply replacing the call to xr.open_dataset with xr.open_mfdataset, does not work: scalar variables like Vtransform are
turned into arrays (one element per file). I see two ways forward:
1) find a good set of options to open_mfdataset so it does what I want.
2) use xr.open_mfdataset and modify the resulting Dataset
The problem has caught my interest, so I will experiment further.
I prefer the first alternative if feasible, but the second should work.
I am working from home these days and don't have a huge multifile
dataset on my laptop, so testing will be done on a smallish two-file dataset.
Regards,
Bjørn
From: Neil Malan notifications@github.com Sent: Wednesday 8 April 2020 02:11:52 To: bjornaa/xroms Cc: Subscribed Subject: [bjornaa/xroms] support from mfdataset (#1)
Hi Bjorn
First off - great package! I was wondering if you are still actively working with it? I've been fiddling around to try and make xroms able to load multiple datasets using open_mfdataset, with the idea that you could call multiple files from the same run, i.e. A = xroms.roms_dataset("outer_avg_090*.nc"). However, I run into trouble with making it work with the routines in depth.py.
Do you have suggestions on how to go about loading multiple files into a single xroms dataset? Am I on the right track in trying to incorporate open_mfdataset, or will this just never work in terms of memory constraints?
You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub (https://github.com/bjornaa/xroms/issues/1), or unsubscribe (https://github.com/notifications/unsubscribe-auth/ABRRPJ3VSP4F43MKKMQA473RLO6MRANCNFSM4MDQVY6Q).
Hi again,
It was easier than I thought. The solution was to pass the optional argument
data_vars='minimal' to xr.open_mfdataset. This keeps variables that do not depend on time from acquiring the ocean_time dimension. The rest of the
roms_dataset function seems to work identically, making it possible to
merge the multifile behaviour into the same function. At present it is a separate
function, xroms.roms_mfdataset().
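The effect of data_vars='minimal' can be reproduced without any ROMS files, since open_mfdataset hands this option on to xarray's concatenation step. What follows is a minimal sketch, assuming xarray and numpy are installed. The tiny in-memory datasets and the helper name make_piece are made up for illustration and stand in for two ROMS output files.

```python
import numpy as np
import xarray as xr


def make_piece(times):
    """Build a tiny stand-in for one ROMS output file (hypothetical data)."""
    return xr.Dataset(
        {
            # A time-dependent field
            "temp": (("ocean_time", "s_rho"), np.zeros((len(times), 3))),
            # A scalar setting, identical in every file
            "Vtransform": ((), 2),
        },
        coords={"ocean_time": times},
    )


pieces = [make_piece([0, 1]), make_piece([2, 3])]

# Options spelled out explicitly so the result does not depend on xarray defaults
common = dict(dim="ocean_time", coords="minimal", compat="equals", join="outer")

# data_vars="all": every variable is concatenated, so the scalar gains a time axis
naive = xr.concat(pieces, data_vars="all", **common)

# data_vars="minimal": only variables that already have ocean_time are concatenated
minimal = xr.concat(pieces, data_vars="minimal", **common)

print(naive["Vtransform"].dims)    # ('ocean_time',)
print(minimal["Vtransform"].dims)  # ()
```

Because Vtransform is not concatenated under 'minimal', xarray instead checks that it has the same value in every piece.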
I would like to do more testing before submitting it to github. I would be
grateful if you would test it on a larger dataset. The replacement xroms/xroms.py module is attached.
Bjørn
Hi Bjorn
Thanks for the very quick response, I'm glad that you are still active on this. That is a very neat solution for the implementation of mf_dataset. This afternoon I will test it on a big dataset (20 years on a 270x317 grid) and see how it goes.
The only issue is that I can't seem to see the file you attached?
Hi Neil,
Sorry for the late answer, I did not check my e-mail during my Easter vacation
(this is a big deal in Norway, even in these Corona times).
It is not the first time I have forgotten an attachment.
Bjørn
From: Neil Malan notifications@github.com Sent: Thursday 9 April 2020 01:22 To: bjornaa/xroms Cc: Ådlandsvik, Bjørn; Comment Subject: Re: [bjornaa/xroms] support from mfdataset (#1)
Hi Bjorn, no problem - although I still can't see the attachment?
Hi Neil,
I double-checked, and the attachment was present in my last mail.
Maybe something strange happens because the mail goes through GitHub
(I reply to notifications@github.com).
My e-mail address for direct contact is bjorn@imr.no.
I committed the new file and pushed it to the develop branch on GitHub.
Bjørn
From: Neil Malan notifications@github.com Sent: Monday 20 April 2020 02:59 To: bjornaa/xroms Cc: Ådlandsvik, Bjørn; Comment Subject: Re: [bjornaa/xroms] support from mfdataset (#1)
Hi Bjorn
Just did a little test on a larger dataset. The new mf_dataset function works fine for a single file (1.6 GB, 30 days of data), but when trying to load two of those files with:
A = xroms.roms_mfdataset("/srv/scratch/z3097808/20year_run/20year_freerun_output_NEWnci/outer_avg_014*.nc")
I get the following error (I tried adding compat='override' as an argument to xr.open_mfdataset, but just got another error):
`MergeError Traceback (most recent call last)`
Hi Neil,
There are many options for xarray's open_mfdataset.
I tried
A0 = xr.open_mfdataset(roms_file, combine='by_coords', data_vars='minimal')
This also works for my small dataset, and has the advantage of not mentioning the name of the time variable (we have some old ROMS files lying around).
Bjørn
From: Neil Malan notifications@github.com Sent: Monday 20 April 2020 08:10:46 To: bjornaa/xroms Cc: Ådlandsvik, Bjørn; Comment Subject: Re: [bjornaa/xroms] support from mfdataset (#1)
`---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
in
      1 # Initiate the Dataset
----> 2 A = xroms.roms_mfdataset("/srv/scratch/z3097808/20year_run/20year_freerun_output_NEWnci/outer_avg_014*.nc")
      3 A

~/miniconda3/lib/python3.7/site-packages/xroms/xroms.py in roms_mfdataset(roms_file)
    138
    139 # Read the ROMS file
--> 140 A0 = xr.open_mfdataset(roms_file, combine='nested', concat_dim="ocean_time", data_vars='minimal')
    141 # Old ROMS output have dimension 'time' instead of 'ocean_time'
    142 if "time" in A0.dims:

~/miniconda3/lib/python3.7/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, combine, autoclose, parallel, join, attrs_file, **kwargs)
    951 coords=coords,
    952 ids=ids,
--> 953 join=join,
    954 )
    955 elif combine == "by_coords":

~/miniconda3/lib/python3.7/site-packages/xarray/core/combine.py in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join)
    311 coords=coords,
    312 fill_value=fill_value,
--> 313 join=join,
    314 )
    315 return combined

~/miniconda3/lib/python3.7/site-packages/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join)
    202 compat=compat,
    203 fill_value=fill_value,
--> 204 join=join,
    205 )
    206 (combined_ds,) = combined_ids.values()

~/miniconda3/lib/python3.7/site-packages/xarray/core/combine.py in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join)
    224 datasets = combined_ids.values()
    225 new_combined_ids[new_id] = _combine_1d(
--> 226 datasets, dim, compat, data_vars, coords, fill_value, join
    227 )
    228 return new_combined_ids

~/miniconda3/lib/python3.7/site-packages/xarray/core/combine.py in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    252 compat=compat,
    253 fill_value=fill_value,
--> 254 join=join,
    255 )
    256 except ValueError as err:

~/miniconda3/lib/python3.7/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    133 "objects, got %s" % type(first_obj)
    134 )
--> 135 return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    136
    137

~/miniconda3/lib/python3.7/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    356 for var in variables_to_merge:
    357 result_vars[var] = unique_variable(
--> 358 var, to_merge[var], compat=compat, equals=equals.get(var, None)
    359 )
    360 else:

~/miniconda3/lib/python3.7/site-packages/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
    141 if not equals:
    142 raise MergeError(
--> 143 f"conflicting values for variable {name!r} on objects to be combined. "
    144 "You can skip this check by specifying compat='override'."
    145 )

MergeError: conflicting values for variable 'dstart' on objects to be combined. You can skip this check by specifying compat='override'.`
Hi Bjorn
Thanks for the hints - I've found the bug that was causing the issue: the files our model produces have a 'dstart' time variable, which confuses xarray's merging functions. So what I've done is drop that variable in the initial call to open_mfdataset, i.e.:
A0 = xr.open_mfdataset(roms_file, combine='nested', concat_dim="ocean_time", data_vars='minimal', drop_variables = 'dstart')
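The conflict and the fix can be reproduced in memory. This is a minimal sketch, assuming xarray and numpy are installed. The two tiny datasets and the dstart values are invented for illustration, and xr.concat with data_vars='minimal' stands in for the combine step inside open_mfdataset.

```python
import numpy as np
import xarray as xr


def make_piece(times, dstart):
    """Tiny stand-in for one ROMS file; dstart differs from file to file."""
    return xr.Dataset(
        {
            "temp": (("ocean_time",), np.zeros(len(times))),
            "dstart": ((), dstart),
        },
        coords={"ocean_time": times},
    )


pieces = [make_piece([0, 1], dstart=10.0), make_piece([2, 3], dstart=40.0)]
options = dict(
    dim="ocean_time", data_vars="minimal", coords="minimal",
    compat="equals", join="outer",
)

# The scalar is not concatenated, so xarray insists on identical values
try:
    xr.concat(pieces, **options)
    failed = False
except ValueError as err:  # xr.MergeError is a subclass of ValueError
    failed = True
    print("merge failed:", err)

# Dropping the variable first is the in-memory analogue of drop_variables='dstart'
cleaned = [ds.drop_vars("dstart", errors="ignore") for ds in pieces]
combined = xr.concat(cleaned, **options)
print(list(combined.data_vars))  # ['temp']
```

With real files, the drop_variables argument removes the variable while each file is opened, so the conflicting values never reach the merge step.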
I'll do a couple more tests and then do a pull request to see if you're happy to add to xroms.
One thing I was wondering is whether it is better to exclude the problematic variable, or to specifically tell it which variables to read in (the code for this would look a bit messy, which puts me off). Let me know what you think.
Cheers,
Neil
Hi Neil,
Our ROMS history files do not have the 'dstart' variable, so I have not been in this situation.
Your version with drop_variables may be a good solution.
I tested that it does not crash if the dropped variable does not exist.
Alternatively, the roms_mfdataset (and roms_dataset) functions could accept
extra keyword arguments and pass them on to the xarray layer.
Then roms_mfdataset(pattern, drop_variables='dstart') should work in your case.
This may be a better solution because other users (there are not many)
may have similar problems. I will add this version to the develop branch.
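The keyword pass-through could look roughly like the sketch below. This is hypothetical and is not the actual xroms code: the name roms_mfdataset_sketch and the opener argument are invented here, the latter only so that the forwarding can be tried without xarray or any files. The defaults combine='by_coords' and data_vars='minimal' are the ones mentioned earlier in this thread.

```python
def roms_mfdataset_sketch(pattern, opener=None, **kwargs):
    """Open several ROMS files, passing extra keywords on to xarray.

    Hypothetical sketch, not the actual xroms implementation. The `opener`
    argument exists only so the forwarding can be tried without xarray.
    """
    if opener is None:
        import xarray as xr  # imported lazily; only needed for real files

        opener = xr.open_mfdataset

    # Defaults taken from the thread; caller-supplied keywords win
    options = {"combine": "by_coords", "data_vars": "minimal"}
    options.update(kwargs)
    return opener(pattern, **options)


def fake_opener(pattern, **options):
    """Stand-in that just reports what would be handed to xarray."""
    return pattern, options


print(roms_mfdataset_sketch("outer_avg_014*.nc", opener=fake_opener,
                            drop_variables="dstart"))
```

The printed options show drop_variables added alongside the defaults. A caller who needs combine='nested' with concat_dim='ocean_time' can pass those the same way and they replace the defaults.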
As for your second question, I have tried both ways. I was working on a package,
roppy (also on GitHub), that used just numpy and netcdf4-python. It works more along
the lines you describe: it reads what I considered the central variables from the file(s).
When I discovered xarray I was very impressed and started xroms as a
thin layer over the xarray functionality. xarray reads everything unless told otherwise.
I suppose it "reads" lazily and does not actually load a large array into memory
unless the data are used. If this assumption is false (I will test), I will try to limit the read.
My main development focus is the particle tracking package (LADiM, also on GitHub).
With hourly fields on a large 800 meter domain covering the Norwegian coast,
the speed is limited by I/O. This is why Python performs as well as my old Fortran
particle tracking code. However, I have found both xarray and netCDF4.MFDataset too slow,
so in this case I decided to handle the multiple files myself with netCDF4.Dataset calls.
I will continue to slowly add to the xroms package. Except for the keyword arguments,
I am not planning any new functionality (unless requested), only more examples.
The roppy package is just sitting on GitHub; I will try to fix bugs if something comes up.
Bjørn
From: Neil Malan notifications@github.com Sent: Thursday 23 April 2020 06:52 To: bjornaa/xroms Cc: Ådlandsvik, Bjørn; Comment Subject: Re: [bjornaa/xroms] support from mfdataset (#1)
Hi again Neil,
1) In develop, keyword arguments to roms_(mf)dataset are passed on to
the xarray open_(mf)dataset.
2) I was wrong about the xarray reading. It reads everything into a dataset A0, but only selected variables are present in the returned dataset, namely the ones I cared about when writing the function.
It should be easy to modify this so that variables can be added and removed while keeping reasonable defaults. This could be done by passing a list of variables as an argument to the roms_dataset functions. I will think about this.
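One possible shape for such a variables argument is sketched below, assuming xarray and numpy are installed. Everything here is hypothetical: the helper select_variables, its extra and exclude parameters, and the DEFAULT_VARIABLES tuple are invented for illustration and do not reflect the selection xroms actually makes.

```python
import numpy as np
import xarray as xr

# Hypothetical default selection; the real list in xroms may differ
DEFAULT_VARIABLES = ("temp", "salt", "zeta")


def select_variables(A0, variables=None, extra=(), exclude=()):
    """Return a Dataset holding only the requested variables found in A0."""
    wanted = list(DEFAULT_VARIABLES if variables is None else variables)
    wanted += [name for name in extra if name not in wanted]
    # Silently skip names that are absent from this particular file
    keep = [name for name in wanted if name in A0.data_vars and name not in exclude]
    return A0[keep]


A0 = xr.Dataset(
    {
        "temp": (("ocean_time",), np.zeros(2)),
        "zeta": (("ocean_time",), np.zeros(2)),
        "dstart": ((), 10.0),
    },
    coords={"ocean_time": [0, 1]},
)

print(list(select_variables(A0).data_vars))                    # ['temp', 'zeta']
print(list(select_variables(A0, extra=["dstart"]).data_vars))  # ['temp', 'zeta', 'dstart']
print(list(select_variables(A0, exclude=["zeta"]).data_vars))  # ['temp']
```

Skipping missing names keeps the defaults usable across files that lack some variables, at the cost of not reporting a misspelled name.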
Bjørn