
LSSTDESC / rail_base

Base classes for RAIL

MIT License


Parallelize var_inf #46

Closed by joselotl 9 months ago

joselotl commented 11 months ago

This should be the last summarization algorithm to be parallelized as it needs all of the p(z) to be loaded in memory at once.

sschmidt23 commented 11 months ago

We should ask Markus Rau if there are any sampling shortcuts that work while loading only subsamples of the data into memory. He may know of some statistical tricks that make that possible (I hope so, as I don't see how we can load hundreds of millions to billions of galaxies at once).

joselotl commented 11 months ago

It doesn't all need to be loaded on the same node. My idea was to make sure that we have enough nodes to hold all of the p(z) between them. My guess is that the full catalog will take around 10 TB in total, which would mean using around 20 CPU nodes on Perlmutter.
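For reference, the node-count arithmetic can be sketched as below. This is only a rough illustration, not code from rail_base: the 512 GB per-node figure is an assumption that is not stated in this thread, and `nodes_needed` and `row_range` are hypothetical helper names. The second helper shows one simple way to split the rows of the p(z) table evenly across ranks so that each node only holds its own slice.

```python
import math


def nodes_needed(total_bytes, bytes_per_node, usable_fraction=1.0):
    """Smallest number of nodes whose combined usable memory holds total_bytes."""
    return math.ceil(total_bytes / (bytes_per_node * usable_fraction))


def row_range(n_rows, n_ranks, rank):
    """Half-open [start, stop) row slice owned by `rank`.

    Slice sizes differ by at most one row; the first `n_rows % n_ranks`
    ranks take the extra rows.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


TOTAL_BYTES = 10e12   # ~10 TB, the guess quoted in the comment above
NODE_BYTES = 512e9    # ASSUMPTION: memory per CPU node, not from the thread

if __name__ == "__main__":
    print(nodes_needed(TOTAL_BYTES, NODE_BYTES))
    # Leaving 20% headroom on each node for everything besides the p(z) arrays
    print(nodes_needed(TOTAL_BYTES, NODE_BYTES, usable_fraction=0.8))
    # Example split of 10 rows over 3 ranks
    print([row_range(10, 3, r) for r in range(3)])
```

Under these assumptions the estimate only comes out at about 20 nodes if essentially all of each node's memory goes to the p(z) arrays; reserving headroom for the algorithm's own working memory pushes the count higher.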