boutproject / xBOUT

Collects BOUT++ data from parallelized simulations into xarray.
https://xbout.readthedocs.io/en/latest/
Apache License 2.0

Allow kwargs to open_boutdataset #102

Closed TomNicholas closed 4 years ago

TomNicholas commented 4 years ago

Generalised open_boutdataset to accept arbitrary kwargs. The kwargs are passed to xarray.open_mfdataset first, and any that aren't recognised there are passed on down to xarray.open_dataset.
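To make the forwarding chain concrete, here is a minimal sketch using stand-in functions. The `fake_*` names are hypothetical and only mimic the shape of the calls; they are not the real xarray or xBOUT implementations.

```python
# Stand-ins for illustration only; NOT the real xarray or xBOUT functions.

def fake_open_dataset(path, drop_variables=None, **kwargs):
    """Stand-in for the per-file opener (the role of xarray.open_dataset)."""
    return {"path": path, "drop_variables": drop_variables, "extra": kwargs}


def fake_open_mfdataset(paths, parallel=False, compat="no_conflicts", **kwargs):
    """Stand-in for the multi-file opener (the role of xarray.open_mfdataset).

    It consumes the options it recognises and forwards everything else
    to the per-file opener.
    """
    per_file = [fake_open_dataset(p, **kwargs) for p in paths]
    return {"parallel": parallel, "compat": compat, "files": per_file}


def fake_open_boutdataset(paths, **kwargs):
    """Stand-in for open_boutdataset: no explicit drop_variables argument."""
    return fake_open_mfdataset(paths, **kwargs)


result = fake_open_boutdataset(
    ["BOUT.dmp.0.nc", "BOUT.dmp.1.nc"],
    parallel=True,                       # recognised by the multi-file opener
    drop_variables="problem_variable",   # falls through to the per-file opener
)
```

In this toy version `parallel` stops at the multi-file layer, while `drop_variables` reaches every per-file call without the top-level function ever naming it.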

This allows you to potentially speed up opening many files by passing data_vars='minimal', coords='minimal', compat='override', parallel=True as described here.
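As a rough usage sketch, the options can be collected in a dict and unpacked into the call. The helper name `open_fast` is hypothetical, and whether these options are safe depends on the files really sharing their coordinates and non-concatenated variables; `parallel` is given as a boolean because that is what xarray documents for this option.

```python
# Options quoted in the comment above. parallel is the boolean True,
# not the string 'True'.
fast_open_kwargs = dict(
    data_vars="minimal",
    coords="minimal",
    compat="override",
    parallel=True,
)


def open_fast(datapath="./BOUT.dmp.*.nc"):
    """Hypothetical helper: open a BOUT++ run with the speed-up options.

    xbout is imported lazily so that defining this helper does not
    require the package to be installed.
    """
    from xbout import open_boutdataset

    # The extra keyword arguments are forwarded on towards xarray.
    return open_boutdataset(datapath, **fast_open_kwargs)
```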

This also means drop_variables no longer needs to be an explicit argument, as it's covered by the kwargs being passed down to xarray.open_dataset.

johnomotani commented 4 years ago

How does this change eliminate the possible need for drop_variables? I guess we still need the special handling for _BOUT_PER_PROC_VARIABLES, and if so what happens if some user code happens to add some variable to the output that behaves in a similar way (e.g. a scalar with different values in each BOUT.dmp.*.nc file)? Wouldn't there be an error unless that variable is dropped explicitly?

TomNicholas commented 4 years ago

> How does this change eliminate the possible need for drop_variables?

Because drop_variables is an argument to xarray.open_dataset, which the kwargs are passed down to. So it's still available as an option, it just doesn't need to be explicitly listed as an argument to open_boutdataset.

> what happens if some user code happens to add some variable to the output that behaves in a similar way

Because open_dataset occurs before preprocess or the combining process within open_mfdataset, passing drop_variables='problem_variable' to open_boutdataset should still work fine.

> Wouldn't there be an error unless that variable is dropped explicitly?

Yes, but we will still have the option to drop it explicitly.
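The ordering argument can be illustrated with a toy model: each file is opened (and variables dropped) first, then preprocessed, then combined. Everything below is a hypothetical stand-in built on plain dicts, not xarray's actual combine logic, and `problem_variable` is the illustrative name from the discussion.

```python
def open_one(file_contents, drop_variables=()):
    """Stand-in for the per-file open step: variables are dropped here."""
    if isinstance(drop_variables, str):
        drop_variables = [drop_variables]
    return {k: v for k, v in file_contents.items() if k not in drop_variables}


def combine(datasets):
    """Stand-in for the combine step: reject variables whose values conflict."""
    merged = {}
    for ds in datasets:
        for name, value in ds.items():
            if name in merged and merged[name] != value:
                raise ValueError(f"conflicting values for variable {name!r}")
            merged[name] = value
    return merged


def open_many(files, preprocess=None, drop_variables=()):
    opened = [open_one(f, drop_variables) for f in files]  # 1. open and drop
    if preprocess is not None:
        opened = [preprocess(ds) for ds in opened]          # 2. preprocess
    return combine(opened)                                  # 3. combine


# Two "files" holding a scalar that differs per processor.
files = [
    {"nx": 68, "problem_variable": 0},
    {"nx": 68, "problem_variable": 1},
]

try:
    open_many(files)
    failed_without_drop = False
except ValueError:
    failed_without_drop = True

combined = open_many(files, drop_variables="problem_variable")
```

Because the drop happens in step 1, the conflicting scalar never reaches the combine step, which is why dropping it explicitly avoids the error in this model.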

johnomotani commented 4 years ago

Ah, I see. Sorry, I'd looked at open_mfdataset and saw it didn't have a drop_variables argument, but didn't realise that open_dataset does and the kwargs will drop through to there.

kwargs are great, but they do make the help() less helpful sometimes
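A small sketch of that trade-off, using two hypothetical functions: once an option is reached only through **kwargs, it no longer appears in the signature that help() displays.

```python
import inspect


def explicit_opener(datapath, drop_variables=None):
    """Hypothetical opener that lists drop_variables explicitly."""


def kwargs_opener(datapath, **kwargs):
    """Hypothetical opener that accepts drop_variables only via kwargs."""


# help() shows these signatures; the second no longer mentions drop_variables.
explicit_sig = str(inspect.signature(explicit_opener))
kwargs_sig = str(inspect.signature(kwargs_opener))
```

One common mitigation is to name the forwarded options, and where they end up, in the docstring.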