Open ppettina opened 3 years ago
strace -f output with relevant filtering (-e signal=SIGCHLD -e trace=utimensat):
...
[pid 464743] utimensat(8, NULL, [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
[pid 464743] utimensat(8, NULL, [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
[pid 464743] utimensat(8, NULL, [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
[pid 464743] utimensat(8, NULL, [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
[pid 464743] utimensat(8, NULL, [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0 <unfinished ...>
[pid 464744] +++ exited with 0 +++
<... utimensat resumed>) = -1 EINTR (Interrupted system call)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=464744, si_uid=1000, si_status=0, si_utime=26, si_stime=3} ---
tar: ffmpeg/CA5B7F464AEA83D5018DE264A411CBDA0/ffmpeg.sym: Cannot utime: Interrupted system call
utimensat(7, "ffmpeg/CA5B7F464AEA83D5018DE264A411CBDA0", [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
utimensat(7, "ffmpeg", [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
utimensat(8, NULL, [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
utimensat(7, "nice/3647C7556D3C635621CA0395E129A0560", [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
utimensat(7, "nice", [UTIME_OMIT, {tv_sec=1605788182, tv_nsec=0} /* 2020-11-19T12:16:22+0000 */], 0) = 0
tar: Exiting with failure status due to previous errors
+++ exited with 2 +++
This is consistent with SIGCHLD, raised when the child process (pid 464744) exits, interrupting the utimensat syscall.
An obvious workaround (for those coming here for a solution) is to split the call:
gzip -dc /path/to/tarball.gz | tar -x -C /some/cifs/mount
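The pipe form above probably helps because tar no longer spawns the decompressor itself, so no child of tar exits and raises SIGCHLD while utimensat is in flight. That reading is an assumption drawn from the trace, not something confirmed in this thread. Below is a minimal sketch of the same workaround as a reusable bash function. The name extract_tgz is hypothetical, and -f - is added only to make reading from stdin explicit.

```shell
#!/usr/bin/env bash
# Sketch only: extract_tgz is a hypothetical helper, not part of tar.
# It decompresses with gzip in the shell and feeds tar through a pipe,
# so tar never forks a decompressor child of its own.
extract_tgz() {
    local archive=$1 dest=$2
    (
        # pipefail: report failure if gzip fails, not only if tar fails.
        # Set inside a subshell so the option does not leak to the caller.
        set -o pipefail
        gzip -dc -- "$archive" | tar -x -f - -C "$dest"
    )
}

# Example with the paths from the report:
# extract_tgz /path/to/tarball.gz /some/cifs/mount
```

Without pipefail the pipeline's exit status would be tar's alone, and a truncated or unreadable archive could go unnoticed.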
Thanks for the report. How reproducible is this issue that you are observing? And can you share some more details regarding the server-side which provides this SMB share, and how the CIFS mount is provisioned on the FCOS node?
Indeed it looks like you are hitting https://bugzilla.redhat.com/show_bug.cgi?id=1848178 (private, investigation ongoing). I don't have any timing insights to share at this point, but once it gets fixed upstream we can track it and make sure it quickly reaches FCOS too.
Thanks @lucab .
Issue happens about 1 in 5 times. Note that I'm rerunning the same command over and over, thus overwriting the files - not sure if it makes a difference.
Server is running Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-145-generic x86_64), CIFS mount is in /etc/fstab:
//server/path /var/mnt/path/to/mount cifs rw,exec,uid=1000,gid=1000,credentials=/etc/creds_file,vers=1.0 0 0
We use vers=1.0 because we were having issues writing to the mount. Can't remember the details off the top of my head though.
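Since the failure is intermittent and rerunning the same command simply overwrites the files, retrying the extraction is another possible stopgap. The following is a rough sketch, assuming a rerun is safe for the archive in question. retry_cmd is a hypothetical helper and the attempt count is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: retry_cmd is a hypothetical helper, not part of tar.
# Usage: retry_cmd MAX_ATTEMPTS COMMAND [ARGS...]
retry_cmd() {
    local max_attempts=$1
    shift
    local attempt=1 status=0
    while true; do
        status=0
        # Capture the exit status without tripping "set -e" in callers.
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts (last exit status $status)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed with exit status $status, retrying" >&2
        attempt=$((attempt + 1))
    done
}

# Example with the paths from the report:
# retry_cmd 3 tar -xf /path/to/tarball.tar.gz -C /some/cifs/mount
```

This only hides the symptom. It retries on any nonzero exit status, so a genuinely broken archive would fail the same way on every attempt.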
AFAICT looks exactly like https://jira.whamcloud.com/browse/LU-305, interestingly on RH. Points to a bug in libc, and/or something that can be worked around in tar.
Describe the bug
Running
tar -xf /path/to/tarball.gz -C /some/cifs/mount
fails sporadically. I believe this is an occurrence of https://access.redhat.com/solutions/5493691, which is relatively recent. That in turn looks like https://jira.whamcloud.com/browse/LU-305, although that one is not directly related.
Reproduction steps
Steps to reproduce the behavior:
tar -xf /path/to/tarball.tar.gz -C /some/cifs/mount
Expected behavior
Command succeeds, with the contents of the archive extracted into the correct folder.
Actual behavior
Command sporadically fails with:
System details
Ignition config
Probably not relevant?
Additional information
Not sure there's much we can do here; if RHEL fixes the issue, how long do we expect it to take to propagate down to FCOS?