Closed rstofi closed 6 months ago
I've been meaning to do this. On it!
Can you please try the apptainer branch of stimela?
NOTE: I'm getting unpredictable behaviour when Stimela pulls images with apptainer on the fly, so I suggest pulling everything before running CARACal.
Also, please delete images downloaded with singularity before using apptainer.
The master branch of stimela now supports apptainer. Please test, and I'll make a release soon after.
Hi @SpheMakh 👋
This has suddenly become very relevant to me: a cluster software update means we now need to use apptainer instead of singularity, so I'm getting involved in this discussion. I've re-installed CARACal in a virtual environment, but I notice that when I try to run the pipeline, the working directory doesn't seem to get bound in the same way as it used to with singularity.
I'm setting up and launching as follows:
source /homes/riseley/caracal/bin/activate
apptainer-setup
caracal -ct singularity -c caracal_initcal_fullStokes.yml
which has worked previously, albeit loading singularity rather than apptainer. However, the pipeline fails as it seems to be unable to find the MS, which exists in the rawdatadir I've defined in the config file.
I suppose my question is: is there (a) something "different" about how I'm supposed to initialise/launch CARACal when using apptainer instead of singularity, or (b) nothing different, in which case this may be a "my compute cluster" problem?
Hi,
We have a "working" solution (see my initial comment) on our local cluster. We have a singularity module (v3.8.1). However, to make it work, we have to set the SINGULARITY_TMPDIR env variable. Plus, we need to bind our working directory (though this is probably down to how the file system is partitioned and configured). And so, we have the following setup now:
module load singularity/v3.8.1
echo $SINGULARITY_TMPDIR
mkdir -p $SINGULARITY_TMPDIR
singularity exec --bind /project/MeerKAT:/MeerKAT ${CONTAINER} caracal -c flagging.yml -ct singularity -sid /MeerKAT/singularity_containers/stimela_images_for_caracal/
Not sure if this is something you can use on your server, but I hope this can be helpful!
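As a rough Python sketch of the same setup (create the directory named by SINGULARITY_TMPDIR, then assemble the exec call with the bind mount), something like the following might do. The helper names `ensure_tmpdir` and `build_exec_command` are hypothetical and are not part of CARACal or Stimela; only the `exec --bind host:container` form is taken from the commands above.

```python
import os
from pathlib import Path
from typing import List, Mapping, Sequence


def ensure_tmpdir(env: Mapping[str, str], variable: str = "SINGULARITY_TMPDIR") -> Path:
    """Create the directory named by `variable`, like `mkdir -p $SINGULARITY_TMPDIR`."""
    value = env.get(variable)
    if not value:
        raise KeyError(f"{variable} is not set")
    path = Path(value)
    path.mkdir(parents=True, exist_ok=True)
    return path


def build_exec_command(
    runtime: str,
    container: str,
    host_dir: str,
    container_dir: str,
    inner: Sequence[str],
) -> List[str]:
    """Assemble `<runtime> exec --bind host:container <image> <inner command...>`."""
    return [runtime, "exec", "--bind", f"{host_dir}:{container_dir}", container, *inner]


# Hypothetical usage; the container path is a placeholder, as in the thread.
# ensure_tmpdir(os.environ)
# cmd = build_exec_command("singularity", os.environ["CONTAINER"],
#                          "/project/MeerKAT", "/MeerKAT",
#                          ["caracal", "-c", "flagging.yml", "-ct", "singularity"])
# subprocess.run(cmd, check=True)
```

Returning the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.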
Thanks @rstofi ... I was wrong previously. Digging deeper into the error, there's a FATAL error further up in the traceback, and in fact it doesn't seem to have anything to do with the directory being bound properly. Rather, it's a problem with mounting the overlay:
...
2023-10-26 12:44:40 CARACal INFO: getdata: initializing
2023-10-26 12:44:40 CARACal INFO: getdata: running
2023-10-26 12:44:40 CARACal INFO: getdata: finished
2023-10-26 12:44:40 CARACal INFO: obsconf: initializing
2023-10-26 12:44:40 CARACal.Stimela.listobs-ms0 INFO: job started at 2023-10-26 12:44:40.243382
# INFO: gocryptfs not found, will not be able to use gocryptfs
# INFO: fuse-overlayfs not found, will not be able to use overlay
# FATAL: container creation failed: mount hook function failure: mount overlay->/iranet_test/soft/apptainer/apptainer-1.2.3/var/apptainer/mnt/session/final error: while mounting overlay: can't mount overlay filesystem to /iranet_test/soft/apptainer/apptainer-1.2.3/var/apptainer/mnt/session/final: operation not permitted
2023-10-26 12:44:40 CARACal.Stimela.listobs-ms0 ERROR: /iranet/soft/apptainer/apptainer-1.2.3/bin/apptainer run --workdir /local/work/riseley/MeerKAT/Coma/.stimela_workdir-16983170801745548 --containall --writable-tmpfs returns error code 255
2023-10-26 12:44:40 CARACal.Stimela.listobs-ms0 ERROR: job failed at 2023-10-26 12:44:40.412642 after 0:00:00.169260
2023-10-26 12:44:40 CARACal ERROR: Job 'listobs-ms0:: Get observation information ms=1639795398.MS' failed: /iranet/soft/apptainer/apptainer-1.2.3/bin/apptainer run --workdir /local/work/riseley/MeerKAT/Coma/.stimela_workdir-16983170801745548 --containall --writable-tmpfs returns error code 255 [PipelineException]
which to me sounds like a "my cluster" problem rather than a CARACal problem. Will report back if I have more.
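Since the decisive line was buried above the CARACal errors, a small log scan can help surface it. The following is only a sketch, assuming the log uses the `# FATAL:` and `# INFO: <name> not found` line shapes shown in the excerpt above; the function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Matches container-runtime lines such as "# FATAL: container creation failed: ..."
_FATAL = re.compile(r"^#?\s*FATAL:\s*(?P<message>.*)$")
# Matches lines such as "# INFO: fuse-overlayfs not found, will not be able to ..."
_NOT_FOUND = re.compile(r"INFO:\s*(?P<name>\S+) not found")


def first_fatal(lines: Iterable[str]) -> Optional[str]:
    """Return the message of the first FATAL line, or None if there is none."""
    for line in lines:
        match = _FATAL.match(line.strip())
        if match:
            return match.group("message")
    return None


def missing_helpers(lines: Iterable[str]) -> List[str]:
    """Collect helper names reported as 'not found' in INFO lines."""
    names = []
    for line in lines:
        match = _NOT_FOUND.search(line)
        if match:
            names.append(match.group("name"))
    return names
```

Run over the excerpt above, `missing_helpers` would report `gocryptfs` and `fuse-overlayfs`, and `first_fatal` would return the overlay mount failure, which is the line that points at the cluster configuration rather than at CARACal.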
Hi @AstroRipples
Are you still experiencing this issue?
Hi @Athanaseus 👋
Thanks for checking in, it's been a while. To keep a long story short, I'm no longer experiencing this problem and my CARACal workflow has resumed.
To expand a little, I think the above problem was related to my cluster rather than CARACal itself. I had experienced a variety of problems related to the switch from singularity to apptainer, then needing to rebuild my CARACal environment. That led to difficulties in creating a python3.8 environment (specifically 3.8.18 as that's the only 3.8.x version we have on our cluster), and problems with building wheels for some of the included packages, which after a lot of trial and error I was able to get around.
Now I'm back working in a functional CARACal environment and happy 🥳
Thanks for the feedback @AstroRipples.
I'm glad to hear the pipeline is operating smoothly. I'll close this issue, and if you experience others, please feel free to open a new one.
Best regards
Hi,
I am trying to run caracal on a machine that has apptainer installed, rather than singularity. Since singularity was rebranded as apptainer a few years ago, some internal variables have changed as well. In particular, the SINGULARITY_* env variables are now named APPTAINER_* and the old ones are deprecated. However, caracal still uses the old variables, in particular SINGULARITY_TMPDIR, to convert stimela cab images to sandboxes. My problem is that the two variables are not the same: whilst I can get around this problem by using an old singularity version loaded as a module and creating the SINGULARITYENV_TMPDIR before running caracal (this is due to the configuration of the hpc), my friendly system admin decided this is a no-no. And so, he is pushing me to use apptainer in the long run.
Would it be possible to add a check to caracal for whether an apptainer or singularity installation is used, and to use the corresponding env variables?
Cheers.
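A minimal sketch of what such a check could look like is given below. It is an illustration only and not CARACal's or Stimela's actual implementation: the function names, the preference for apptainer when both executables are on the PATH, and the fallback to the other prefix are all assumptions.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Assumption: executable names mapped to their env variable prefixes,
# following the SINGULARITY_* -> APPTAINER_* renaming described in the issue.
RUNTIME_PREFIXES = {"apptainer": "APPTAINER", "singularity": "SINGULARITY"}


def detect_runtime(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return 'apptainer' or 'singularity', whichever is found first on PATH."""
    for name in RUNTIME_PREFIXES:  # dict order: apptainer is checked first
        if which(name):
            return name
    return None


def tmpdir_variable(runtime: str) -> str:
    """Name of the TMPDIR variable for the given runtime."""
    return f"{RUNTIME_PREFIXES[runtime]}_TMPDIR"


def resolve_tmpdir(runtime: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read the runtime's own TMPDIR variable, falling back to the other prefix."""
    preferred = env.get(tmpdir_variable(runtime))
    if preferred:
        return preferred
    for other in RUNTIME_PREFIXES:
        if other != runtime:
            fallback = env.get(tmpdir_variable(other))
            if fallback:
                return fallback
    return None
```

With only apptainer installed and only SINGULARITY_TMPDIR exported, `detect_runtime()` would return `"apptainer"` and `resolve_tmpdir("apptainer")` would still pick up the old variable, so existing job scripts would keep working during a migration.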