dfm / tess-atlas

Out of memory during `plot_phase` #192

Closed avivajpeyi closed 2 years ago

avivajpeyi commented 2 years ago

Gosh, these incessant memory errors! This one isn't even during sampling, it's during plotting!

nbconvert.preprocessors.execute.DeadKernelError: Kernel died
slurmstepd: error: Detected 1 oom-kill event(s) in step 26320284.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: john16: task 0: Out Of Memory

Occurs at

plot_phase(tic_entry, inference_data, planet_transit_model)

dfm commented 2 years ago

Dang - this is probably related to this line:

https://github.com/dfm/tess-atlas/blob/6066e78a1356b905666df663d1680b08900b9675/src/tess_atlas/plotting/plotting_utils.py#L192

and the computation of the GP mean. I wouldn't expect it to be a huge memory cost, but I guess it is. Do you really need the GP prediction for all those samples?
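To put rough numbers on that question, here is a minimal sketch of thinning the posterior before predicting. Nothing in it is tess-atlas code: `thin_samples`, `prediction_bytes`, and the sizes (4000 draws, 100,000 time stamps, float64) are illustrative assumptions.

```python
import numpy as np


def thin_samples(samples, max_draws=100, seed=0):
    """Return at most `max_draws` rows of `samples`, drawn without replacement.

    Hypothetical helper: `samples` is any (n_draws, n_params) array of
    posterior draws, not a tess-atlas object.
    """
    samples = np.asarray(samples)
    n_draws = samples.shape[0]
    if n_draws <= max_draws:
        return samples
    rng = np.random.default_rng(seed)
    # Sort the indices so the kept draws stay in their original order.
    idx = np.sort(rng.choice(n_draws, size=max_draws, replace=False))
    return samples[idx]


def prediction_bytes(n_draws, n_times, itemsize=8):
    """Size in bytes of a dense (n_draws, n_times) array of predictions."""
    return n_draws * n_times * itemsize


# Assumed sizes, for illustration only.
posterior = np.random.default_rng(1).normal(size=(4000, 3))
thinned = thin_samples(posterior, max_draws=100)

print(thinned.shape)
print(prediction_bytes(4000, 100_000) / 1e9, "GB for every draw")
print(prediction_bytes(thinned.shape[0], 100_000) / 1e9, "GB after thinning")
```

Under these assumed sizes the dense prediction array alone shrinks by a factor of 40, before counting any intermediate arrays the prediction itself allocates.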

avivajpeyi commented 2 years ago

https://github.com/dfm/tess-atlas/blob/d5b5ebd78b95c3681f87a122bb83959c21a880d5/src/tess_atlas/notebook_templates/toi_template.py#L345

Maybe just remove t and this might just work™

avivajpeyi commented 2 years ago

Do we need to compute the GP so many times?

avivajpeyi commented 2 years ago

probably not

avivajpeyi commented 2 years ago

probably just the median(GP)
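If the median over draws is all the plot needs, one way to keep peak memory down is to take it one block of time stamps at a time, so the full (draws × times) array never exists at once. This is only a sketch: `toy_prediction` is a hypothetical stand-in for the real per-draw GP prediction, and it assumes that prediction can be evaluated on an arbitrary subset of times.

```python
import numpy as np


def toy_prediction(params, t):
    """Hypothetical stand-in for a per-draw GP prediction.

    params: (n_draws, 2) array of (amplitude, period) draws.
    t:      (n_times,) array of time stamps.
    Returns an (n_draws, n_times) array.
    """
    amp = params[:, :1]
    period = params[:, 1:2]
    return amp * np.sin(2 * np.pi * t[None, :] / period)


def chunked_median(params, t, predict=toy_prediction, chunk_size=1000):
    """Median over draws at each time, computed one block of times at a time.

    Peak memory scales with n_draws * chunk_size instead of n_draws * n_times.
    """
    out = np.empty(t.shape[0])
    for start in range(0, t.shape[0], chunk_size):
        stop = start + chunk_size
        # Only this block's (n_draws, <=chunk_size) array is alive here.
        out[start:stop] = np.median(predict(params, t[start:stop]), axis=0)
    return out


# Made-up draws and time stamps, for illustration only.
rng = np.random.default_rng(0)
params = np.column_stack([rng.normal(1.0, 0.1, 500), rng.uniform(2.0, 3.0, 500)])
t = np.linspace(0.0, 27.0, 5000)

median_curve = chunked_median(params, t, chunk_size=1000)
print(median_curve.shape)
```

Because the median is taken independently at each time stamp, the chunked result matches the one computed from the full array. The trade-off is more calls to the prediction function.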

avivajpeyi commented 2 years ago

Offt, not fixed -- this is still occurring with 1.5k jobs!

Rather frustrating...

avivajpeyi commented 2 years ago

seems to be fixed now