NLeSC / team-atlas


Prepare a CGC tutorial #174

Open yifatdzigan opened 3 years ago

yifatdzigan commented 3 years ago

Acceptance Criteria


Tasks


Links to info and code

fnattino commented 3 years ago

A draft tutorial has been created in a repository under the escience academy organization. It contains a Jupyter notebook that illustrates how to use CGC on an example data set.

The repository has a webpage on readthedocs, where the notebook is rendered as a static page. The tutorial webpage can be reached via a link from the CGC documentation.

Both the GitHub repository and the readthedocs page include binder links, so that one can run the notebook in a live session in the cloud without having to care about installations/dependencies.

rogerkuou commented 3 years ago

Hi @fnattino Francesco, I had a look at the tutorial, it looks really nice! I think the storyline is super clear.

I ran through the notebook on mybinder. Basically, the first build took very long (>10 min), but after that it built quite fast.

In general just several small remarks:

  1. Should we also consider adding some memory profiling to demonstrate the effect of the "low mem" option? Just some command-line notes on the memory usage. Something like memory-profiler, as in this tutorial, may be quite easy to implement.
  2. As Meiert suggested, maybe we should include a disclaimer on our choice of the number of spatial and temporal clusters?
  3. For K-means, I think we can add a bit more description of the manual choice of k, because that is what the L-curve is meant for. I can add this part and will make a PR.

We can discuss this next week. Thanks for all the good work!

fnattino commented 3 years ago

Thanks for the great suggestions, @rogerkuou, and sorry for the late answer. Also thanks for sharing the tutorial with the memory_profiler usage via Jupyter magic, it's really cool! Unfortunately, to be able to run the notebook on mybinder.org we need to use a small matrix and a low number of clusters, so one cannot see a difference between the two approaches. Thus, I have just added a comment explaining how one could implement such a memory-usage comparison with larger datasets. I have implemented the other comments. I will now write to Raul to get his opinion on the tutorial!
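
For reference, a memory-usage comparison of this kind could look roughly like the sketch below. It is only an illustration under some assumptions: it uses the standard-library `tracemalloc` module instead of memory_profiler, and `column_sums_full` / `column_sums_low_mem` are hypothetical stand-ins for a "full" and a "low mem" code path, not CGC functions. The helper `peak_memory` is also made up for this example.

```python
import tracemalloc


def peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def column_sums_full(n_rows, n_cols):
    # Hypothetical "full" path: materialise the whole matrix first.
    matrix = [[float(i * n_cols + j) for j in range(n_cols)]
              for i in range(n_rows)]
    return [sum(col) for col in zip(*matrix)]


def column_sums_low_mem(n_rows, n_cols):
    # Hypothetical "low mem" path: keep only one row in memory at a time.
    sums = [0.0] * n_cols
    for i in range(n_rows):
        row = [float(i * n_cols + j) for j in range(n_cols)]
        sums = [s + v for s, v in zip(sums, row)]
    return sums


if __name__ == "__main__":
    for name, func in [("full", column_sums_full),
                       ("low mem", column_sums_low_mem)]:
        _, peak = peak_memory(func, 2000, 500)
        print(f"{name:>8}: peak {peak / 1e6:.1f} MB")
```

With a matrix that is large enough, the "full" path should report a clearly higher peak than the "low mem" one; with a tiny matrix the two numbers are hard to tell apart, which is the situation on mybinder.org. Note that `tracemalloc` only sees allocations made through Python's allocator, so the numbers are not directly comparable to what memory_profiler reports.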