Closed: jacksgt closed this issue 1 year ago
Hi @jacksgt, assuming you're using v2.2.0 with all defaults, I'm able to reproduce this:
/atlas.cern.ch/repo/sw/software/23.0 # ls Athena
ls: can't open 'Athena': Input/output error
It seems the local cache is running out of space:
(cvmfs) cvmfs_lookup in parent inode: 5950 for name: Athena [06-15-2023 14:25:00 UTC]
(lru) lookup inode --> path: 5950 (hit) [06-15-2023 14:25:00 UTC]
(lru) lookup md5 --> dirent: acad9e5e6523aeaa78d5d0788262ed0f (miss) [06-15-2023 14:25:00 UTC]
(catalog) looking up '/repo/sw/software/23.0/Athena' in catalog: '/repo/sw/software/23.0' [06-15-2023 14:25:00 UTC]
(catalog) found entry '/repo/sw/software/23.0/Athena' in catalog '/repo/sw/software/23.0' [06-15-2023 14:25:00 UTC]
(cvmfs) cvmfs_opendir on inode: 9203 [06-15-2023 14:25:00 UTC]
(lru) lookup inode --> path: 9203 (miss) [06-15-2023 14:25:00 UTC]
(cvmfs) MISS 9203 - looking in inode tracker [06-15-2023 14:25:00 UTC]
(lru) insert inode --> path 9203 -> '/repo/sw/software/23.0/Athena' [06-15-2023 14:25:00 UTC]
(lru) lookup inode --> dirent: 9203 (miss) [06-15-2023 14:25:00 UTC]
(catalog) looking up '/repo/sw/software/23.0/Athena' in catalog: '/repo/sw/software/23.0' [06-15-2023 14:25:00 UTC]
(catalog) found entry '/repo/sw/software/23.0/Athena' in catalog '/repo/sw/software/23.0' [06-15-2023 14:25:00 UTC]
(lru) insert inode --> dirent: 9203 -> 'Athena' [06-15-2023 14:25:00 UTC]
(cvmfs) cvmfs_opendir on inode: 9203, path /repo/sw/software/23.0/Athena [06-15-2023 14:25:00 UTC]
(cvmfs) Add to listing: ., inode 9203 [06-15-2023 14:25:00 UTC]
(lru) lookup md5 --> dirent: 39d14008a5c5b9c7e45809a51a014812 (miss) [06-15-2023 14:25:00 UTC]
(catalog) looking up '/repo/sw/software/23.0' in catalog: '/repo/sw/software/23.0' [06-15-2023 14:25:00 UTC]
(catalog) found entry '/repo/sw/software/23.0' in catalog '/repo/sw/software/23.0' [06-15-2023 14:25:00 UTC]
(lru) insert md5 --> dirent: 39d14008a5c5b9c7e45809a51a014812 -> '23.0' [06-15-2023 14:25:00 UTC]
(cvmfs) Add to listing: .., inode 5950 [06-15-2023 14:25:00 UTC]
(catalog) load nested catalog at /repo/sw/software/23.0/Athena [06-15-2023 14:25:00 UTC]
(cache) miss ./60/620c17940f07abb77add368960016c20c75d75 (-2) [06-15-2023 14:25:00 UTC]
(cache) miss ./60/620c17940f07abb77add368960016c20c75d75 (-2) [06-15-2023 14:25:00 UTC]
(cache) downloading file catalog at atlas.cern.ch:/repo/sw/software/23.0/Athena (60620c17940f07abb77add368960016c20c75d75) [06-15-2023 14:25:00 UTC]
(cache) start transaction on ./txn/fetchFj3QSc has result 55 [06-15-2023 14:25:00 UTC]
(cache) miss: file catalog at atlas.cern.ch:/repo/sw/software/23.0/Athena (60620c17940f07abb77add368960016c20c75d75) /data/60/620c17940f07abb77add368960016c20c75d75C [06-15-2023 14:25:00 UTC]
(download) escaped http://cvmfs-stratum-one.cern.ch/cvmfs/atlas.cern.ch/data/60/620c17940f07abb77add368960016c20c75d75C to http://cvmfs-stratum-one.cern.ch/cvmfs/atlas.cern.ch/data/60/620c17940f07abb77add368960016c20c75d75C [06-15-2023 14:25:00 UTC]
(download) Verify downloaded url /data/60/620c17940f07abb77add368960016c20c75d75C, proxy http://188.184.28.244:3128 (curl error 0) [06-15-2023 14:25:06 UTC]
(cache) finished downloading of /data/60/620c17940f07abb77add368960016c20c75d75C [06-15-2023 14:25:06 UTC]
(cache) commit ./60/620c17940f07abb77add368960016c20c75d75 ./txn/fetchFj3QSc [06-15-2023 14:25:06 UTC]
(quota) pin into lru 60620c17940f07abb77add368960016c20c75d75, path file catalog at atlas.cern.ch:/repo/sw/software/23.0/Athena (60620c17940f07abb77add368960016c20c75d75) [06-15-2023 14:25:06 UTC]
(quota) received command 2 [06-15-2023 14:25:06 UTC]
(quota) reserve 728374272 bytes for 60620c17940f07abb77add368960016c20c75d75 [06-15-2023 14:25:06 UTC]
(quota) failed to insert 60620c17940f07abb77add368960016c20c75d75 (pinned), no space [06-15-2023 14:25:06 UTC]
(cache) commit failed: cannot pin 60620c17940f07abb77add368960016c20c75d75 [06-15-2023 14:25:06 UTC]
(catalog) failed to load catalog '/repo/sw/software/23.0/Athena' (2 - not enough space to load catalog) [06-15-2023 14:25:06 UTC]
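For scale, the byte count the quota manager tries to reserve in the log above converts to MiB with a quick shell calculation (a sketch; the number is copied verbatim from the `(quota) reserve` line):

```shell
# Byte count copied from the "(quota) reserve" log line above.
bytes=728374272
# Integer division: bytes -> MiB.
echo $(( bytes / 1024 / 1024 ))   # prints 694
```

Pinning roughly 695 MiB for a single nested catalog easily exhausts a 1000 MiB cache that already holds other data.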
The default local cache limit is 1000 MiB, which is set in your cvmfs-csi-default-local ConfigMap. It is worth noting that the volume where the cache is stored is an emptyDir.
There are two ways to handle this:
1. Increase cache.local.cvmfsQuotaLimit to a higher value (and possibly set cache.local.volumeSpec to a hostPath volume, exposed via nodeplugin.extraMounts and nodeplugin.automount.extraVolumeMounts, with CVMFS_CACHE_BASE set accordingly). 2000 MiB seems to be enough, but with multiple users accessing other large repos this might still not be enough -- needs more testing.
2. Use cache.alien. Beware that there is currently no mechanism in place that would automatically clean up the cache volume when it becomes full.
By the way, you can get this log output by setting logLevelVerbosity to at least 5 in the chart, and setting the CVMFS_DEBUGLOG parameter in your client config (either in the global config via the cvmfs-csi-default-local ConfigMap, or in a separate client config with the clientConfig volume parameter).
Hi Robert, thanks for the excellent troubleshooting and workaround suggestions! It seems like on LXPLUS they're using a higher quota limit (default is 1000):
[lxplus924 ~]$ grep CVMFS_QUOTA_LIMIT /etc/cvmfs/default.local
CVMFS_QUOTA_LIMIT='20000'
What exactly is this quota limit? Does it apply to all CVMFS mounts or is it per repository?
Answering my own question:
What exactly is this quota limit? Does it apply to all CVMFS mounts or is it per repository?
CVMFS docs say:
Each repository can either have an exclusive cache or join the CernVM-FS shared cache. The shared cache enforces a common quota for all repositories used on the host. File duplicates across repositories are stored only once in the shared cache. The quota limit of the shared directory should be at least the maximum of the recommended limits of its participating repositories.
and also:
Once the quota limit is reached, CernVM-FS will automatically remove files from the cache according to the least recently used policy. Removal of files is performed bunch-wise until half of the maximum cache size has been freed. The quota limit can be set in Megabytes by CVMFS_QUOTA_LIMIT. For typical repositories, a few Gigabytes make a good quota limit.
https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#cache-settings
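Putting the two parameters from this thread together, a plain (non-Kubernetes) client config would look like the sketch below; it is written to /tmp here rather than /etc/cvmfs/default.local, and the quota value simply mirrors the LXPLUS setting above:

```shell
# Sketch of a CVMFS client config: quota in MB (per the docs quoted above),
# plus the debug log used to capture the output shown earlier in the thread.
cat > /tmp/default.local <<'EOF'
CVMFS_QUOTA_LIMIT=20000
CVMFS_DEBUGLOG=/tmp/cvmfs-debug.log
EOF
```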
Hi,
we have a case where a user tries to access a large directory (200+ GB), but it fails with a generic error message. Reproducer:
Logs from the nodeplugin:
Please advise how to troubleshoot the issue further.