Closed ScottWales closed 5 years ago
Hopefully this solves #107
@rabernat - are you able to test and review this please?
Thanks @ScottWales. I tried it and it seems to work. @ajdawson, why do most of travis-ci tests fail?
The test failure is, as far as I can tell, unrelated to this PR
__________ TestConstructorStandard.test_input_with_all_missing_values __________
self = <eofs.tests.test_error_handling.TestConstructorStandard object at 0x7f625167f6d8>
def test_input_with_all_missing_values(self):
# input with only missing values, missing values are propagated
# because the default for the center argument is True
solution = reference_solution('standard', 'equal')
data = solution['sst']
mask = data.mask
mask[-1] = True
with pytest.raises(ValueError):
> solver = self.solver_class(data)
E Failed: DID NOT RAISE <class 'ValueError'>
edit: that's not true, there are also tests failing because dask is not available in some of the travis envs.
@ajdawson, if this looks OK then I could add some mention to dask
in the documentation.
Looks good now @ScottWales. Can I ask you to rebase on the current master (which I just updated)? I had forgotten to merge in the 1.3 development branch back in to master after the last release, which included an upgrade of the CI testing to have more coverage and test on more recent versions of Python. I'd just like to see the updated test system pass for this PR, otherwise good to merge.
Thanks @ajdawson, rebase now done
Thanks @ScottWales, this is a nice update. And thanks @rabernat and @navidcy for testing/review.
@ajdawson, if this looks OK then I could add some mention to dask in the documentation.
@navidcy - some mention of dask in the documentation should definitely be made now. If you would like to have a go I will happily review.
The Dask array library includes a parallel SVD solver. Use this to compute the EOF, provided the input data is a Dask array (e.g. from using
xarray.open_dataset(..., chunks={'time': 10})
)Dask arrays are only used for the input to the EOF calculation. Because of the way that masking works the output from the SVD solver is converted into normal Numpy arrays. There should still be some benefit from the parallel SVD algorithm that Dask uses, though I've not measured performance.
Both the standard and Xarray solvers are supported. Iris can in theory use Dask, but I couldn't see how to set up test data for it.