Closed CKehl closed 2 years ago
Very good idea!
A few places where we could do this:
1) A notebook. Advantage: it's a nice way to mix code, text, and figures, and it's in line with the other tutorials we have and the chunking documentation that's there now. Disadvantage: it's more cumbersome to change/adapt.
2) A section on http://oceanparcels.org/faq.html#performance (to be created). Advantage: it's easier to update/change, but it will be more difficult to mix text/code/figures.
3) ...
Another thing to do while updating the performance advice is the following:
After having a look at the website and the `documentation_MPI.ipynb` notebook (the one with performance curves for v2.1.3), I think that this notebook doesn't really deliver what the heading on the website promises. I suggest splitting it into two notebooks: one called *Splitting and Managing Field Data* (e.g. `field_chunking_tutorial.ipynb`), which holds the main parts of the current `documentation_MPI.ipynb`, and a new, separate `documentation_MPI.ipynb` called *Parallel Kernel Optimization via MPI* that really introduces and shows how to integrate MPI with some communication. Right now, if I wanted to use MPI for a simulation, the current notebook wouldn't really demonstrate how to use the feature.
There are several caveats on how to set up MPI, job submission, Dask, and Parcels in order to be performant and efficient. Especially after the changes in https://github.com/OceanParcels/parcels/pull/719, the wiki, the Parcels page, or the tutorial should be updated accordingly.
- [x] Check if the guidelines on the submission system are sufficient (e.g. explain the submission script and its parameters)
- [x] Check that the MPI explanations are sufficient (e.g. mention that, next to just installing mpi4py, one needs to actually run the script with `mpiexec`)
- [x] Explain (at a surface level) the relation between NumPy/SciPy, xarray, and dask, and how data allocation is affected
- [x] Explain how the chunking is done, how to use the `field_chunksize` parameter, and how to set up the (locally operating) `dask.yaml` file
- [x] Update / link to the documentation in the warning messages on `field_chunksize` in `NetcdfFileBuffer.__enter__()`
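For the `dask.yaml` item, it may help to show a minimal example of the file itself (dask reads it from, e.g., `~/.config/dask/dask.yaml`; the 128 MiB value below is only illustrative, not a recommendation):

```yaml
# ~/.config/dask/dask.yaml -- local dask configuration
array:
  chunk-size: 128MiB   # target chunk size dask uses when auto-chunking
```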
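For the first two items, the updated docs could include a minimal submission sketch along these lines (the SLURM directives, rank count, and script name `my_parcels_run.py` are all placeholders, not the project's actual setup). The key point is that installing mpi4py alone is not enough; the script must be started through the MPI launcher so that ranks are actually spawned.

```shell
#!/bin/bash
#SBATCH --ntasks=4            # hypothetical SLURM job: request 4 MPI ranks
# load/activate the environment that provides MPI, mpi4py and Parcels here

# Launch the run through MPI; plain "python my_parcels_run.py" would start
# a single, non-distributed process even with mpi4py installed.
mpiexec -np 4 python my_parcels_run.py
```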
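For the NumPy/xarray/dask item, the allocation difference could be caricatured in plain Python: NumPy allocates the full array eagerly, whereas dask records a recipe and only materializes data per chunk when computed. A toy sketch of that idea (the `LazyZeros` class is purely illustrative, not dask's actual machinery):

```python
import numpy as np

# Eager: NumPy allocates the full array immediately.
eager = np.zeros((4, 6))

# Lazy (dask-like): store only shape and chunk sizes; build chunks on compute().
class LazyZeros:
    def __init__(self, shape, chunks):
        self.shape, self.chunks = shape, chunks

    def compute(self):
        # Materialize each chunk on demand and stitch them back together.
        rows = []
        for i in range(0, self.shape[0], self.chunks[0]):
            row = [np.zeros((min(self.chunks[0], self.shape[0] - i),
                             min(self.chunks[1], self.shape[1] - j)))
                   for j in range(0, self.shape[1], self.chunks[1])]
            rows.append(np.hstack(row))
        return np.vstack(rows)

lazy = LazyZeros((4, 6), chunks=(2, 3))
print(lazy.compute().shape)  # -> (4, 6), same result, but allocated chunkwise
```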
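For the chunking item, the tutorial could illustrate what a chunk-size tuple implies for the block layout. A hypothetical helper (not the Parcels implementation) that counts the dask blocks per dimension:

```python
from math import ceil

def n_chunks(shape, chunksize):
    """Number of blocks per dimension when a field of `shape` is split
    into chunks of at most `chunksize` elements per dimension."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunksize))

# e.g. a (time=1, lat=1000, lon=2000) field with 128x128 spatial chunks
print(n_chunks((1, 1000, 2000), (1, 128, 128)))  # -> (1, 8, 16)
```

Only the blocks that particles actually touch then need to be loaded, which is the point of tuning `field_chunksize`.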