We've recently found a few local files downloaded from S3 using `s4cmd get --sync-check` that were corrupt. Retrying the same download with a separate `s4cmd` invocation resolved the problem (and we have seen the problem on two completely separate, but similarly configured, EC2 instances). We were using version 2.0.1.

Since this command already leverages the MD5 hash saved in the S3 metadata (even, apparently, for multi-part S3 objects), it's amazing that the MD5 is not automatically validated against the local copy after the download completes. Although computing the MD5 of even a large local file is fairly quick (given a reasonably powerful system), you could always provide an option to skip such a check in the interest of performance. Ideally, a failed check would be logged and the download then retried (at least `--retry` times).
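For reference, the post-download check being requested could be sketched roughly as below. This is just an illustration, not s4cmd's actual code: `file_md5` and `verify_download` are hypothetical names, and the expected digest would come from the S3 ETag (plain MD5 for single-part objects) or from the MD5 that s4cmd stores in object metadata for multi-part uploads.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 of a local file in chunks, so large files
    never need to be held in memory all at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, expected_md5):
    """Return True if the local file's MD5 matches the expected digest.

    ETags come back quoted from S3, so strip quotes and normalize case
    before comparing.
    """
    return file_md5(path) == expected_md5.strip('"').lower()
```

A failed `verify_download` after a `get` would then be logged and the download re-attempted, up to the existing `--retry` limit.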