ML4GW / aframev2

Detecting binary black hole mergers in LIGO with neural networks
MIT License

poetry run build-containers hanging after recent update #191

Closed VasSkliris closed 1 month ago

VasSkliris commented 1 month ago
(base) [vasileios.skliris@dgx1 aframev2]$ poetry run build-containers 
singularity build --force /home/vasileios.skliris/aframe/images/infer.sif /home/vasileios.skliris/aframev2/projects/infer/apptainer.def
singularity build --force /home/vasileios.skliris/aframe/images/plots.sif /home/vasileios.skliris/aframev2/projects/plots/apptainer.def
singularity build --force /home/vasileios.skliris/aframe/images/train.sif /home/vasileios.skliris/aframev2/projects/train/apptainer.def
singularity build --force /home/vasileios.skliris/aframe/images/data.sif /home/vasileios.skliris/aframev2/projects/data/apptainer.def
singularity build --force /home/vasileios.skliris/aframe/images/export.sif /home/vasileios.skliris/aframev2/projects/export/apptainer.def

After printing these commands, it just hangs there.

wbenoit26 commented 1 month ago

I suspect this is something on the cluster end, rather than with anything that changed in the repo. I've been able to build containers this way after the update, as have some others. Could you try two things:

  1. Build on a different node of the cluster
  2. Build just one of the containers, e.g., poetry run build-containers train

wbenoit26 commented 1 month ago

Yeah, actually, I can confirm that LLO seems to not be working for building containers at the moment. Not sure why, but it's not the script - it works on LHO. You could try going through the setup steps there.

EthanMarx commented 1 month ago

@wbenoit26 Can you try using a ThreadPool instead of ProcessPool on LHO?
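
For readers unfamiliar with the suggestion, here is a minimal sketch of what running the builds from a thread pool could look like. It is not the actual `build-containers` script, which is not shown in this thread: `build_all`, `DEMO_COMMANDS`, and the `timeout` parameter are hypothetical names introduced for illustration. Since each build is an external process, worker threads that only wait on `subprocess.run` are usually sufficient, and the timeout is an assumed addition so that a hanging build gets reported rather than blocking forever.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real `singularity build ...` commands,
# so the sketch runs anywhere Python is installed.
DEMO_COMMANDS = [
    [sys.executable, "-c", "raise SystemExit(0)"],
    [sys.executable, "-c", "raise SystemExit(3)"],
]


def build_all(commands, max_workers=4, timeout=None):
    """Run each command in a worker thread and collect return codes.

    Returns a list with one entry per command, in input order:
    the process return code, or None if the command timed out.
    """

    def build(cmd):
        try:
            return subprocess.run(cmd, check=False, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so nothing lingers.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build, commands))
```

Calling `build_all(DEMO_COMMANDS)` runs both demo commands concurrently. With real build commands, a `None` entry would point at the container whose build exceeded the timeout.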

wbenoit26 commented 1 month ago

That gets stuck as well. Even just running apptainer build directly gets stuck at:

❯ apptainer build $AFRAME_CONTAINER_ROOT/infer.sif apptainer.def
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    The %post section will be run under fakeroot
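
The first INFO line above reports that the user has no entry in /etc/subuid. Nothing in this thread shows that this caused the hang, but when comparing nodes it can be useful to check for that entry. The following is a minimal sketch under the assumption that the file uses the usual `owner:start:count` format, where the owner is a login name or a numeric UID. The function names are hypothetical.

```python
import getpass
import os


def is_listed_in_subuid(subuid_text, username, uid=None):
    """Return True if username (or uid) owns an entry in subuid-style text."""
    for line in subuid_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) != 3:
            # Skip blank or malformed lines.
            continue
        owner = fields[0]
        if owner == username or (uid is not None and owner == str(uid)):
            return True
    return False


def current_user_listed(path="/etc/subuid"):
    """Check the current user against the file at `path` (assumed readable)."""
    try:
        with open(path) as f:
            text = f.read()
    except FileNotFoundError:
        return False
    return is_listed_in_subuid(text, getpass.getuser(), os.getuid())
```

Running `current_user_listed()` on a node where the build works and on one where it hangs would show whether the two differ in this respect.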

wbenoit26 commented 1 month ago

Closing this as it seems to be cluster-specific and transient - I was able to build containers on LLO this afternoon.