Closed: meiertgrootes closed this issue 4 years ago
The co-clustering implementation entails two loops:
The current implementation involves two parallelisation/distribution layers:
In more detail, the Dask implementation in the notebook involves data and computations in two states (see here for details): i) lazy (delayed) tasks and ii) tasks running in distributed memory (future objects). Lazy tasks are stored in a graph, which grows each time an operation is performed on a Dask collection (e.g. Dask arrays). When an array's persist
method is called, the graph is executed up to its top-most elements, which are converted into local futures that point to the actual data in distributed memory.
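The two states can be illustrated with a minimal sketch (the array sizes and operations below are illustrative, not taken from the notebook):

```python
import dask.array as da

# Build a lazy task graph: nothing is computed yet
x = da.ones((1000, 1000), chunks=(250, 250))  # lazy Dask array
y = (x + 1).sum(axis=0)                       # the graph grows with each operation

# persist() runs the graph; when a distributed Client is active,
# the chunks become futures living in the workers' memory
y = y.persist()

# compute() gathers the already-materialized result locally
result = y.compute()
print(result.shape)     # (1000,)
print(float(result[0])) # 2000.0
```

Without a distributed Client, persist() falls back to the local scheduler and simply keeps the computed chunks in local memory, so the same code runs in both settings.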
The original version of the notebook and a simplified co-clustering implementation for the toy model are available here: https://github.com/phenology/hsr-phenological-modelling/tree/co_clustering/co-clustering/notebooks
Communicated to Serkan & Raul
As team Atlas, working on the phenology co-clustering notebooks, we would like to analyse the notebooks in detail, focusing on the implementation of the algorithm and on its deployment and usage with Dask. This will enable us to improve the co-clustering analysis and run it robustly with Dask.