NLeSC / team-atlas


Identify Issues in co-clustering notebook #71

Closed meiertgrootes closed 4 years ago

meiertgrootes commented 4 years ago

As Team Atlas, working on the phenology co-clustering notebooks, we would like to analyse the notebooks in detail, focusing on the implementation of the algorithm and on its deployment and usage with Dask. This will enable us to improve the co-clustering analysis and run it robustly with Dask.

fnattino commented 4 years ago

Co-clustering notebook

Implementation:

The co-clustering implementation entails two loops:

The current implementation involves two parallelisation/distribution layers:

In more detail, the Dask implementation in the notebook involves data and computations in two states: i) lazy (delayed) tasks and ii) tasks that are running in distributed memory (future objects). Lazy tasks are stored in a graph, which grows each time an operation is performed on a Dask collection (e.g. a Dask array). When an array's persist method is called, the graph is executed up to its top-most elements, which are converted into local futures that point to the actual data in distributed memory.
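The difference between the two states can be illustrated with a minimal sketch. It is not taken from the notebook: the array shape, chunk sizes and variable names are assumptions chosen for illustration, and it uses the default local scheduler rather than a distributed cluster. With a `dask.distributed` client attached, the persisted chunks would be futures held by the workers; here they are simply kept in local memory.

```python
import dask.array as da

# Lazy state: each operation only adds tasks to the graph, nothing is computed.
# Shape and chunk sizes are illustrative assumptions.
x = da.ones((1000, 1000), chunks=(250, 250))
y = (x + 1).sum(axis=0)
n_lazy_tasks = len(y.__dask_graph__())

# persist() executes the graph and returns a collection whose graph holds
# only the computed output chunks. With a dask.distributed Client these
# would be futures on the workers; with the default local scheduler used
# here they are in-memory arrays.
y_persisted = y.persist()
n_persisted_tasks = len(y_persisted.__dask_graph__())

# Further operations on y_persisted start from the computed chunks
# instead of re-running the whole graph.
result = y_persisted.compute()

print(f"tasks before persist: {n_lazy_tasks}, after: {n_persisted_tasks}")
```

In an iterative algorithm, calling persist at the end of each iteration is one way to keep the graph from growing across iterations, at the cost of holding intermediate results in memory.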

Issues:

fnattino commented 4 years ago

The original version of the notebook and a simplified co-clustering implementation for the toy model are available here: https://github.com/phenology/hsr-phenological-modelling/tree/co_clustering/co-clustering/notebooks

fnattino commented 4 years ago

Communicated to Serkan & Raul