LSSTDESC / firecrown

DESC Cosmology Likelihood Framework
BSD 3-Clause "New" or "Revised" License

Running test sampler seems to take too much memory #177

Closed jpratmarti closed 1 year ago

jpratmarti commented 2 years ago

When running firecrown via cosmosis with just the test sampler, the Midway computing system on the Chicago cluster sends the following message:

Connection to [midway2.rcc.uchicago.edu](http://midway2.rcc.uchicago.edu/) closed by remote host.

We believe this happens because the run is using too much memory: we get kicked out unless we use an interactive node. We don't understand why it would need so much memory, though. This behaviour makes it impractical to run simple tests on the login nodes.

joezuntz commented 2 years ago

You can use the --mem flag in CosmoSIS to help track this down. --mem 2 would print out memory usage every 2 seconds.
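As an independent cross-check on the same idea, here is a minimal Python sketch that prints the process's peak resident memory at a fixed interval while some function runs. It is not how CosmoSIS implements --mem. The helper names (`peak_rss_mib`, `report_memory`, `run_with_memory_report`) are hypothetical, and it assumes a Unix-like system where the standard-library `resource` module is available.

```python
import resource
import sys
import threading


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kibibytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def report_memory(interval_s: float, stop: threading.Event) -> None:
    """Print the peak memory every interval_s seconds until stop is set."""
    # Event.wait returns False on timeout, so the loop runs once per interval.
    while not stop.wait(interval_s):
        print(f"peak RSS: {peak_rss_mib():.1f} MiB", flush=True)


def run_with_memory_report(func, interval_s: float = 2.0):
    """Run func() while a background thread reports peak memory (hypothetical helper)."""
    stop = threading.Event()
    reporter = threading.Thread(
        target=report_memory, args=(interval_s, stop), daemon=True
    )
    reporter.start()
    try:
        return func()
    finally:
        # Always stop the reporter, even if func raises.
        stop.set()
        reporter.join()
```

Wrapping the part of a script that builds or evaluates the likelihood in `run_with_memory_report` would show roughly when the memory grows. Since this reports a running peak, the number never decreases.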

One thing to make sure is that OMP_NUM_THREADS is set. I could imagine some systems kicking you out for using too many processes at once, though I've never seen it in practice.
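A minimal sketch of making sure OMP_NUM_THREADS is set from inside a Python script, for cases where the shell environment cannot be relied on. The helper name `ensure_thread_limit` and the default of one thread are assumptions chosen for illustration. For this to have any effect, it would need to run before importing libraries that start an OpenMP runtime.

```python
import os


def ensure_thread_limit(default: int = 1) -> str:
    """Set OMP_NUM_THREADS if it is unset and return the value in effect.

    A value already present in the environment is left untouched.
    """
    os.environ.setdefault("OMP_NUM_THREADS", str(default))
    return os.environ["OMP_NUM_THREADS"]


# Call this at the top of the script, before importing numerical libraries.
threads = ensure_thread_limit()
print(f"OMP_NUM_THREADS={threads}")
```

Using `setdefault` rather than plain assignment keeps any value the user or the batch system has already exported.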

marcpaterno commented 2 years ago

@jpratmarti do you object to closing this issue? It seems to be related to use of CosmoSIS, rather than to firecrown itself.

joezuntz commented 2 years ago

I don't think this is a cosmosis issue directly - no one has reported this memory issue with other modules. But the mem flag should help track this down for certain.

vitenti commented 1 year ago

Closed for lack of feedback.