Closed CKehl closed 2 years ago
Very good idea!
A few places where we could do this:
1) A notebook. Advantage: it's a nice way to mix code, text, and figures, and it's in line with the other tutorials we have and the chunking documentation that's there now. Disadvantage: it's more cumbersome to change/adapt.
2) A section on http://oceanparcels.org/faq.html#performance (to be created). Advantage: it's easier to update/change, but it will be more difficult to mix text/code/figures.
3) ...
Another thing to do while updating the performance advice is the following:
After having a look at the website and the `documentation_MPI.ipynb` notebook (the one with performance curves for v2.1.3), I think that this notebook doesn't really deliver what the heading on the website promises. I suggest splitting it into two notebooks: one called *Splitting and Managing Field Data* (e.g. `field_chunking_tutorial.ipynb`), which holds the main parts of the current `documentation_MPI.ipynb`, and a new, separate `documentation_MPI.ipynb` called *Parallel Kernel Optimization via MPI* that really introduces and shows how to integrate MPI with some communication. Right now, if I wanted to use MPI for a simulation, the current notebook wouldn't really demonstrate how to use the feature.
There are several caveats on how to set up MPI, job submission, Dask, and Parcels in order to be performant and efficient. Especially after the changes in https://github.com/OceanParcels/parcels/pull/719, the wiki, the Parcels page, or the tutorial should be updated accordingly.
- [x] Check if the guidelines on the submission system are sufficient (e.g. explain the submission script and its parameters)
- [x] Check that the MPI explanations are sufficient (e.g. mention that, next to just installing mpi4py, one needs to actually run the script with `mpiexec`)
- [x] Explain (at a surface level) the relation between NumPy/SciPy, xarray, and dask, and how data allocation is affected
- [x] Explain how the chunking is done, how to use the `field_chunksize` parameter, and how to set up the (locally operating) `dask.yaml` file
- [x] Update / link to the documentation in the warning messages on `field_chunksize` in `NetcdfFileBuffer.__enter__()`
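For the `dask.yaml` item, it may help to show a minimal example of the file itself (dask reads it from, e.g., `~/.config/dask/dask.yaml`; the 128 MiB value below is only illustrative, not a recommendation):

```yaml
# ~/.config/dask/dask.yaml -- local dask configuration
array:
  chunk-size: 128MiB   # target chunk size dask uses when auto-chunking
```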
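For the first two items, the updated docs could include a minimal submission sketch along these lines (the SLURM directives, rank count, and script name `my_parcels_run.py` are all placeholders, not the project's actual setup). The key point is that installing mpi4py alone is not enough; the script must be started through the MPI launcher so that ranks are actually spawned.

```shell
#!/bin/bash
#SBATCH --ntasks=4            # hypothetical SLURM job: request 4 MPI ranks
# load/activate the environment that provides MPI, mpi4py and Parcels here

# Launch the run through MPI; plain "python my_parcels_run.py" would start
# a single, non-distributed process even with mpi4py installed.
mpiexec -np 4 python my_parcels_run.py
```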
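For the NumPy/xarray/dask item, the allocation difference could be caricatured in plain Python: NumPy allocates the full array eagerly, whereas dask records a recipe and only materializes data per chunk when computed. A toy sketch of that idea (the `LazyZeros` class is purely illustrative, not dask's actual machinery):

```python
import numpy as np

# Eager: NumPy allocates the full array immediately.
eager = np.zeros((4, 6))

# Lazy (dask-like): store only shape and chunk sizes; build chunks on compute().
class LazyZeros:
    def __init__(self, shape, chunks):
        self.shape, self.chunks = shape, chunks

    def compute(self):
        # Materialize each chunk on demand and stitch them back together.
        rows = []
        for i in range(0, self.shape[0], self.chunks[0]):
            row = [np.zeros((min(self.chunks[0], self.shape[0] - i),
                             min(self.chunks[1], self.shape[1] - j)))
                   for j in range(0, self.shape[1], self.chunks[1])]
            rows.append(np.hstack(row))
        return np.vstack(rows)

lazy = LazyZeros((4, 6), chunks=(2, 3))
print(lazy.compute().shape)  # -> (4, 6), same result, but allocated chunkwise
```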
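For the chunking item, the tutorial could illustrate what a chunk-size tuple implies for the block layout. A hypothetical helper (not the Parcels implementation) that counts the dask blocks per dimension:

```python
from math import ceil

def n_chunks(shape, chunksize):
    """Number of blocks per dimension when a field of `shape` is split
    into chunks of at most `chunksize` elements per dimension."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunksize))

# e.g. a (time=1, lat=1000, lon=2000) field with 128x128 spatial chunks
print(n_chunks((1, 1000, 2000), (1, 128, 128)))  # -> (1, 8, 16)
```

Only the blocks that particles actually touch then need to be loaded, which is the point of tuning `field_chunksize`.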