caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0

apptainer support #1508

Closed rstofi closed 6 months ago

rstofi commented 1 year ago

Hi,

I am trying to run caracal on a machine that has apptainer installed, rather than singularity. Since singularity was rebranded as apptainer a few years ago, some internal variables have changed as well. In particular, the SINGULARITY_* env variables are now named APPTAINER_* and the old ones are deprecated. However, caracal still uses the old variables, in particular SINGULARITY_TMPDIR, to convert stimela cab images to sandboxes. My problem is that the two variables are not the same:

 # INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred

Whilst I can work around this problem by using an old singularity version loaded as a module and creating the SINGULARITYENV_TMPDIR before running caracal (this is due to the configuration of the HPC), my friendly system admin decided this is a no-no. And so, he is pushing me to use apptainer in the long run.

Would it be possible to add a check to caracal for whether an apptainer or a singularity installation is used, and then use the corresponding env variables?
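
The kind of check requested here could look roughly like the following. This is a minimal sketch, not how CARACal or Stimela actually implement it: the function names `detect_runtime` and `env_prefix` are hypothetical. It assumes that apptainer should be tried first, since Apptainer installations often also provide a `singularity` compatibility command.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical, not part of CARACal or Stimela.

# Print the name of the container runtime found on PATH.
# apptainer is checked first (assumption: a "singularity" command may
# exist alongside it purely for backwards compatibility).
detect_runtime() {
    if command -v apptainer >/dev/null 2>&1; then
        echo apptainer
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        echo none
    fi
}

# Map a runtime name to the prefix of its environment variables.
# Returns a non-zero status for anything that is not a known runtime.
env_prefix() {
    case "$1" in
        apptainer)   echo APPTAINER ;;
        singularity) echo SINGULARITY ;;
        *)           return 1 ;;
    esac
}

runtime=$(detect_runtime)
if prefix=$(env_prefix "$runtime"); then
    echo "runtime: $runtime, temporary-directory variable: ${prefix}_TMPDIR"
else
    echo "no apptainer or singularity executable found on PATH"
fi
```

A pipeline could then read `${prefix}_TMPDIR` instead of a hard-coded `SINGULARITY_TMPDIR`.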

Cheers.

SpheMakh commented 1 year ago

I've been meaning to do this. On it!

SpheMakh commented 1 year ago

Can you please try the apptainer branch of stimela?

NOTE: I'm getting unpredictable behaviour when Stimela pulls images with apptainer on the fly, so I suggest pulling everything before running CARACal.

SpheMakh commented 1 year ago

Also, please delete images downloaded with singularity before using apptainer.

SpheMakh commented 1 year ago

The master branch of stimela now supports apptainer. Please test, and I'll make a release soon after.

AstroRipples commented 9 months ago

Hi @SpheMakh 👋

This has suddenly become very relevant to me -- we've had a cluster software update that means we now need to use apptainer instead of singularity -- so I'm getting involved in this discussion. I've re-installed CARACal in a virtual environment, but I notice that when I try to run the pipeline, the working directory doesn't seem to get binded (bound?) in the same way as it used to with singularity.

I'm setting up and launching as follows:

source /homes/riseley/caracal/bin/activate
apptainer-setup

caracal -ct singularity -c caracal_initcal_fullStokes.yml

which has worked previously, albeit loading singularity rather than apptainer. However, the pipeline fails as it seems to be unable to find the MS, which exists in the rawdatadir I've defined in the config file.

I suppose my question is: is there (a) something "different" about how I'm supposed to initialise/launch CARACal when using apptainer instead of singularity, or (b) nothing different, in which case this may be a "my compute cluster" problem?

rstofi commented 9 months ago

Hi,

We have a "working" solution (see my initial comment) on our local cluster. We have a singularity module (v3.8.1). However, to make it work, we have to set the SINGULARITY_TMPDIR env variable. Plus, we need to bind our working directory (though this is probably due to how the file system is partitioned and configured). And so, we have the following setup now:

module load singularity/v3.8.1

echo $SINGULARITY_TMPDIR
mkdir -p  $SINGULARITY_TMPDIR

singularity exec --bind /project/MeerKAT:/MeerKAT ${CONTAINER} caracal -c flagging.yml -ct singularity -sid /MeerKAT/singularity_containers/stimela_images_for_caracal/

Not sure if this is something you can use on your server, but I hope this can be helpful!
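
For a setup that has to work under either runtime, the temporary-directory step above could be sketched as follows. This is an illustration under assumptions, not a tested recipe for any particular cluster: `setup_container_tmpdir` is a hypothetical helper name, and the example path is a placeholder. It exports the directory under both the `APPTAINER_TMPDIR` and the legacy `SINGULARITY_TMPDIR` names, so that whichever runtime is loaded finds it. Note that these are distinct from the `APPTAINERENV_*` / `SINGULARITYENV_*` variables, which set variables inside the container.

```shell
#!/bin/sh
# Sketch only: setup_container_tmpdir is a hypothetical helper name.

# Create the directory given as $1 and export it under both the
# Apptainer and the legacy Singularity variable names.
setup_container_tmpdir() {
    mkdir -p "$1" || return 1
    APPTAINER_TMPDIR=$1
    SINGULARITY_TMPDIR=$1
    export APPTAINER_TMPDIR SINGULARITY_TMPDIR
}

# Example call; the path is an assumption, adjust it for your cluster.
setup_container_tmpdir "${TMPDIR:-/tmp}/container_tmp_$(id -un)"
echo "APPTAINER_TMPDIR=$APPTAINER_TMPDIR"
echo "SINGULARITY_TMPDIR=$SINGULARITY_TMPDIR"
```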

AstroRipples commented 9 months ago

Thanks @rstofi ... I was wrong previously. Digging deeper into the error, there's a FATAL error further up in the traceback, and in fact it doesn't seem to have anything to do with the directory being bound properly. Rather, it's a problem with mounting the overlay:

...
2023-10-26 12:44:40 CARACal INFO: getdata: initializing
2023-10-26 12:44:40 CARACal INFO: getdata: running
2023-10-26 12:44:40 CARACal INFO: getdata: finished
2023-10-26 12:44:40 CARACal INFO: obsconf: initializing
2023-10-26 12:44:40 CARACal.Stimela.listobs-ms0 INFO: job started at 2023-10-26 12:44:40.243382
# INFO:    gocryptfs not found, will not be able to use gocryptfs
# INFO:    fuse-overlayfs not found, will not be able to use overlay
# FATAL:   container creation failed: mount hook function failure: mount overlay->/iranet_test/soft/apptainer/apptainer-1.2.3/var/apptainer/mnt/session/final error: while mounting overlay: can't mount overlay filesystem to /iranet_test/soft/apptainer/apptainer-1.2.3/var/apptainer/mnt/session/final: operation not permitted
2023-10-26 12:44:40 CARACal.Stimela.listobs-ms0 ERROR: /iranet/soft/apptainer/apptainer-1.2.3/bin/apptainer run --workdir /local/work/riseley/MeerKAT/Coma/.stimela_workdir-16983170801745548 --containall --writable-tmpfs returns error code 255
2023-10-26 12:44:40 CARACal.Stimela.listobs-ms0 ERROR: job failed at 2023-10-26 12:44:40.412642 after 0:00:00.169260
2023-10-26 12:44:40 CARACal ERROR: Job 'listobs-ms0:: Get observation information ms=1639795398.MS' failed: /iranet/soft/apptainer/apptainer-1.2.3/bin/apptainer run --workdir /local/work/riseley/MeerKAT/Coma/.stimela_workdir-16983170801745548 --containall --writable-tmpfs returns error code 255 [PipelineException]

which to me sounds like a "my cluster" problem rather than a CARACal problem. Will report back if I have more.
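
A quick way to follow up on the two INFO lines in the log above is to check whether the helper programs they name are available on the node where the job runs. The sketch below only reports presence on PATH; it is an assumption that a missing `fuse-overlayfs` is related to the failed overlay mount, and the check does not prove it. `report_tool` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: report_tool is a hypothetical helper name.

# Print whether the program named in $1 is available on PATH.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# The two helper programs named in the INFO lines of the log above.
for tool in fuse-overlayfs gocryptfs; do
    report_tool "$tool"
done
```

Running this on both the login node and a compute node can show whether the two environments differ, which is useful information to pass on to the cluster admins.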

Athanaseus commented 6 months ago

Hi @AstroRipples

Are you still experiencing this issue?

AstroRipples commented 6 months ago

Hi @Athanaseus 👋

Thanks for checking in, it's been a while. To keep a long story short, I'm no longer experiencing this problem and my CARACal workflow has resumed.

To expand a little, I think the above problem was related to my cluster rather than CARACal itself. I had experienced a variety of problems related to the switch from singularity to apptainer, then needing to rebuild my CARACal environment. That led to difficulties in creating a python3.8 environment (specifically 3.8.18 as that's the only 3.8.x version we have on our cluster), and problems with building wheels for some of the included packages, which after a lot of trial and error I was able to get around.

Now I'm back working in a functional CARACal environment and happy 🥳

Athanaseus commented 6 months ago

Thanks for the feedback @AstroRipples.

I'm glad to hear the pipeline is operating smoothly. I'll close this issue, and if you experience other problems, please feel free to open a new one.

Best regards