ajdawson / eofs

EOF analysis in Python
http://ajdawson.github.io/eofs/
GNU General Public License v3.0
199 stars 60 forks source link

Add support for Dask SVD solver #109

Closed ScottWales closed 5 years ago

ScottWales commented 5 years ago

The Dask array library includes a parallel SVD solver. Use this to compute the EOF, provided the input data is a Dask array (e.g. from using xarray.open_dataset(..., chunks={'time': 10}))

Dask arrays are only used for the input to the EOF calculation. Because of the way that masking works the output from the SVD solver is converted into normal Numpy arrays. There should still be some benefit from the parallel SVD algorithm that Dask uses, though I've not measured performance.

Both the standard and Xarray solvers are supported. Iris can in theory use Dask, but I couldn't see how to set up test data for it.

ScottWales commented 5 years ago

Hopefully this solves #107

ajdawson commented 5 years ago

@rabernat - are you able to test and review this please?

navidcy commented 5 years ago

Thanks @ScottWales. I tried it and it seems to work. @ajdawson, why do most of travis-ci tests fail?

rabernat commented 5 years ago

The test failure is, as far as I can tell, unrelated to this PR

__________ TestConstructorStandard.test_input_with_all_missing_values __________
self = <eofs.tests.test_error_handling.TestConstructorStandard object at 0x7f625167f6d8>
    def test_input_with_all_missing_values(self):
        # input with only missing values, missing values are propagated
        # because the default for the center argument is True
        solution = reference_solution('standard', 'equal')
        data = solution['sst']
        mask = data.mask
        mask[-1] = True
        with pytest.raises(ValueError):
>           solver = self.solver_class(data)
E           Failed: DID NOT RAISE <class 'ValueError'>

edit: that's not true, there are also tests failing because dask is not available in some of the travis envs.

navidcy commented 5 years ago

@ajdawson, if this looks OK then I could add some mention to dask in the documentation.

ajdawson commented 5 years ago

Looks good now @ScottWales. Can I ask you to rebase on the current master (which I just updated)? I had forgotten to merge in the 1.3 development branch back in to master after the last release, which included an upgrade of the CI testing to have more coverage and test on more recent versions of Python. I'd just like to see the updated test system pass for this PR, otherwise good to merge.

ScottWales commented 5 years ago

Thanks @ajdawson, rebase now done

ajdawson commented 5 years ago

Thanks @ScottWales, this is a nice update. And thanks @rabernat and @navidcy for testing/review.

ajdawson commented 5 years ago

@ajdawson, if this looks OK then I could add some mention to dask in the documentation.

@navidcy - some mention of dask in the documentation should definitely be made now. If you would like to have a go I will happily review.