Azure / azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10
MIT License

Incorrect handling of EINTR in syscalls #2489

Open ahoenselaar opened 9 months ago

ahoenselaar commented 9 months ago

Which version of AzCopy was used? 10.21.1

Note: The version is visible when running AzCopy without any argument

Which platform are you using? Linux

What command did you run?

Note: Please remove the SAS to avoid exposing your credentials. If you cannot remember the exact command, please retrieve it from the beginning of the log file.

cp [REMOTE_DIR] /tmp/tmpdir

What problem was encountered?

ERR: [P#0-T#0] DOWNLOADFAILED: [REMOTE_FILE] : 000 : File Creation Error interrupted system call
   Dst: /tmp/tmpdir/tmpfile

This is likely caused by incorrect handling of EINTR: a syscall that returns EINTR should be retried rather than reported as a failure, e.g. here.

How can we reproduce the problem in the simplest way?

Have you found a mitigation/solution?

No.

vibhansa-msft commented 9 months ago

Can you share more details on how you are able to create this scenario or why you feel fallocate is getting EINTR?

ahoenselaar commented 9 months ago

We encounter this sporadically in production with the posted error message in the azcopy logs. I am not aware of particular circumstances that increase the frequency of this issue.

Golang has a bit of a history with EINTR (see this thread, for example). I cannot point exactly at fallocate, but the error has to come from one of the functions that report "File Creation Error", and it has to be a function that makes a syscall that can return EINTR. fallocate is one candidate, but every syscall in the azcopy codebase should be reviewed.