thompsonmj opened 9 months ago
Update: I downloaded the same file once more (this time using aria2 rather than wget, which shouldn't make a difference besides speed), and the archive seems to be in good shape.
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,40 CPUs Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (50654),ASM,AES-NI)
Scanning the drive for archives:
1 file, 938529016008 bytes (875 GiB)
Testing archive: /fs/scratch/PAS2136/gbif/data//2024-02-01/0003602-240130105604617.zip
--
Path = /fs/scratch/PAS2136/gbif/data//2024-02-01/0003602-240130105604617.zip
Type = zip
Physical Size = 938529016008
64-bit = +
Everything is Ok
Files: 61871
Size: 4846181236102
Compressed: 938529016008
ZIP file integrity test result: 0
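For anyone without 7-Zip at hand, a rough equivalent of the test above can be sketched with Python's standard `zipfile` module. This is only an illustration, not what was run here: the helper name `first_corrupt_member` is made up, and on an 875 GiB archive it would likely be much slower than `7z`.

```python
import zipfile


def first_corrupt_member(zip_path):
    """Return the name of the first member that fails its CRC check, or None.

    ZipFile.testzip() reads every member and verifies its CRC and header,
    which is roughly what an archive integrity test does.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return archive.testzip()


# Hypothetical usage on the archive from this issue:
# bad = first_corrupt_member("0003602-240130105604617.zip")
# print("archive looks fine" if bad is None else f"first bad member: {bad}")
```

A return value of `None` means no member failed its check; it says nothing about whether the file matches what the server intended to send, which is where a published checksum would help.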
FWIW, here is the MD5 for the second download attempt:
$ cat 0003602-240130105604617.zip_checksum.txt
16d5db9526b807050b799917c9336eaf
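In case it is useful, here is a minimal sketch of how an MD5 like the one above could be computed and compared against a reference value. The function names (`file_md5`, `matches_checksum`) and the chunk size are illustrative choices, and the reference value is assumed to come from wherever a trusted checksum is published (currently nowhere, as far as I can tell).

```python
import hashlib


def file_md5(path, chunk_size=8 * 1024 * 1024):
    """Return the hex MD5 of a file, reading it in fixed-size chunks.

    Streaming keeps memory use flat, which matters for a file this large.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's MD5 with an expected hex digest (case-insensitive)."""
    return file_md5(path) == expected_hex.strip().lower()


# Hypothetical usage, with the expected value read from a checksum file:
# with open("0003602-240130105604617.zip_checksum.txt") as fh:
#     print(matches_checksum("0003602-240130105604617.zip", fh.read()))
```

Comparing two of my own downloads this way only shows they agree with each other, not that either one matches the original on the server.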
To verify data integrity after downloading a dataset from a monthly snapshot, it would be helpful if the server provided a checksum that the downloaded ZIP file could be compared against.
We are currently looking at this, for instance: https://doi.org/10.15468/dl.xw682s
Additional context on the downloaded data: