glideinWMS / glideinwms

The glideinWMS Project
http://tinyurl.com/glideinwms
Apache License 2.0

Apptainer seems to cache images still in the user home directory #455

Open mambelli opened 2 weeks ago

mambelli commented 2 weeks ago

Describe the bug Colby reported a problem on Purdue Anvil: the .apptainer directory in the user home directory is filling up with cache files (in /home/x-ob/.apptainer/cache/oci-tmp and /home/x-ob/.apptainer/cache/blob).

Here are some example files:

x-ob@login02.anvil:[oci-tmp] $ du -h *
3.2M    1fa120974c9760d03badf8ff7aacb5b3bb55da23bea28e2f400506be2facb566
5.1G    3e6df1d51bb123027da307beb0bc1bc0854effb6bfb3a6eb356fb52323d7b50a
2.7M    593911a010c350b55c7b9099cce1cb778a0c92683bd714481765c2b352ca3618
5.1G    a4d322c0307085a2c66afb87abad86ec480c58ecc518ecade71133814a825a3f

Looking through the job logs, here is what we see for the apptainer/singularity variables:

'APPTAINER_CACHEDIR=$(GLIDEIN_LOCAL_TMP_DIR)'
GLIDEIN_LOCAL_TMP_DIR=/tmp/glide_x-ob_X992YJ
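When reading values like these out of a job log, a small check can help classify them. The following is only a sketch: `check_cachedir` is a hypothetical helper, not part of GlideinWMS or Apptainer. It reports whether a cache directory value is empty, still contains an unexpanded `$(...)` macro, lies under the expected Glidein directory, or points somewhere else. It does not say anything about how Apptainer itself interprets the variable.

```shell
# Hypothetical helper (not part of GlideinWMS or Apptainer).
# Usage: check_cachedir VALUE EXPECTED_DIR
# Prints one of: unset, unexpanded, ok, outside
check_cachedir() {
    value=$1
    expected=$2
    case "$value" in
        '')
            # Variable is empty or not set at all
            echo "unset" ;;
        *'$('*)
            # Value still contains a literal "$(" macro reference
            echo "unexpanded" ;;
        "$expected"|"$expected"/*)
            # Value is the expected directory or a path below it
            echo "ok" ;;
        *)
            echo "outside" ;;
    esac
}

# Values taken from the log lines and paths quoted in this issue
check_cachedir '$(GLIDEIN_LOCAL_TMP_DIR)' /tmp/glide_x-ob_X992YJ
check_cachedir /tmp/glide_x-ob_X992YJ /tmp/glide_x-ob_X992YJ
check_cachedir /home/x-ob/.apptainer/cache /tmp/glide_x-ob_X992YJ
```

Inside a running job, the same function could be called as `check_cachedir "${APPTAINER_CACHEDIR:-}" "$GLIDEIN_LOCAL_TMP_DIR"`, assuming both variables are visible in that environment.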

To Reproduce Jobs running at Purdue Anvil

Expected behavior All Apptainer cache files should be in the Glidein dir in /tmp

Additional context This should have been fixed in 3.10.7; see PR #404 and issue #403. The variables APPTAINER/SINGULARITY_CACHEDIR and APPTAINER/SINGULARITY_TMPDIR are not set or modified by the site or the operators. To troubleshoot, enable debug output by setting GLIDEIN_DEBUG_OUTPUT to True and GLIDEIN_DEBUG_OPTIONS to userjob. This adds debug messages to the job stderr, including the apptainer invocation and apptainer's own debug messages.

The other test is to set the variables APPTAINER/SINGULARITY_CACHEDIR and APPTAINER/SINGULARITY_TMPDIR explicitly (using params).

mambelli commented 2 weeks ago

The singularity/apptainer debug messages can be seen by adding the following attributes to the <attrs> section of the entry to debug (Factory config) or of the group used to send the glideins to that entry (Frontend config):

<attr name="GLIDEIN_DEBUG_OUTPUT" value="True" type="string" const="False" glidein_publish="False" job_publish="False" parameter="True" publish="True" />
<attr name="GLIDEIN_DEBUG_OPTIONS" value="userjob,nowait,nocleanup" type="string" const="False" glidein_publish="False" job_publish="False" parameter="True" publish="True" />

Furthermore, the attached file singularity_lib.sh.txt prints 2 extra messages. It is a drop-in replacement for the file /var/lib/gwms-factory/web-base/singularity_lib.sh in a 3.10.7 Factory. After replacing the file, issue a Factory upgrade command.