Open mikkonie opened 4 years ago
I would suggest something along the lines of the following spec.
What is the addressed problem?
We need to ensure that the user's uploaded files are (a) correct, i.e., the uploaded content is identical to the source content, and (b) complete, i.e., no file is missing and no extra file has been uploaded.
What is the current status?
We allow for uploading $file.md5 files together with the actual $file. This addresses problem (a) and is very useful for incremental file uploads.
However, we do not address (b) yet.
What does your proposed solution look like?
Optionally, users should be able to upload a SODAR_MANIFEST file. This file would contain the MD5 sums and full paths of the uploaded files. In this case, all .md5 files become optional. When they are included, the checksums they contain will still be compared against the corresponding files.
When transferring a landing zone with a manifest file, the files in the landing zone will be compared to the files listed in the manifest, and the two sets of files have to be equal. Further, the MD5 checksums in the manifest file are compared to the corresponding MD5 checksums in iRODS. If everything checks out, the files are transferred and the $file.md5 files are created automatically based on the checksums in the manifest.
The manifest file itself is not included in the target iRODS storage.
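The two checks described above can be sketched locally as follows. This is only an illustration under stated assumptions: a local directory stands in for the landing zone, checksums are recomputed with GNU md5sum instead of being read from iRODS, manifest paths are relative and start with ./ as in the example below, and the manifest itself and any optional .md5 files are left out of the set comparison. The function name check_manifest is hypothetical.

```shell
# Sketch: validate a local directory against its SODAR_MANIFEST.
# Assumption: the directory mirrors the landing zone; a real implementation
# would compare against the checksums stored in iRODS instead.

# check_manifest DIR
# Exit status: 0 = OK, 1 = validation failed, 2 = directory or manifest missing.
check_manifest() (
    cd "$1" || exit 2
    [ -f SODAR_MANIFEST ] || { echo "SODAR_MANIFEST not found" >&2; exit 2; }

    # (b) Completeness: the set of listed paths must equal the set of files.
    # Strip the 32-digit checksum and the separator from each manifest line.
    expected=$(sed -E 's/^[0-9a-f]{32} [ *]//' SODAR_MANIFEST | LC_ALL=C sort)
    # Assumption: the manifest and optional .md5 files are not listed.
    actual=$(find . -type f ! -name SODAR_MANIFEST ! -name '*.md5' | LC_ALL=C sort)
    if [ "$expected" != "$actual" ]; then
        echo "file set differs from manifest" >&2
        exit 1
    fi

    # (a) Correctness: recompute the MD5 sums and compare with the manifest.
    md5sum --quiet -c SODAR_MANIFEST >&2 || exit 1
)
```

Calling check_manifest /path/to/zone would then fail on a missing file, an extra file, or a changed file. Comparing the contents of uploaded .md5 files is not covered by this sketch.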
Example of creating a manifest file:
# mkdir -p subdir
# head -c 100 /dev/random >subdir/foo
# head -c 100 /dev/random >subdir/bar
# find . -type f ! -name SODAR_MANIFEST -print0 | sort -z | xargs -0 -r md5sum >SODAR_MANIFEST
# cat SODAR_MANIFEST
d307d8a9635109d760c9e1dc90a54bad  ./subdir/bar
6a139e890ddcc8c2ae235c71ca772881  ./subdir/foo
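Given a manifest in this format, the automatic creation of the $file.md5 files could look roughly like the sketch below. The layout of a .md5 file is an assumption here (a single md5sum-style line with the checksum and the file's base name), and write_md5_files is a hypothetical name. Paths containing characters that md5sum escapes (backslashes, newlines) are not handled.

```shell
# Sketch: derive FILE.md5 for every FILE listed in SODAR_MANIFEST.
# Assumption: a .md5 file holds one line, "<checksum>  <base name>".

# write_md5_files DIR
write_md5_files() (
    cd "$1" || exit 2
    # Each manifest line is "<checksum>  <path>"; read splits off the checksum.
    while read -r sum path; do
        path=${path#\*}   # drop the binary-mode marker, if present
        printf '%s  %s\n' "$sum" "$(basename "$path")" > "$path.md5" || exit 1
    done < SODAR_MANIFEST
)
```

With this layout, running md5sum -c foo.md5 inside subdir would verify foo against the checksum taken from the manifest.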
From the original issue by @holtgrewe:
Further, it would be useful to have support for "MANIFEST.txt" files that list all files to be expected in the landing zone. This would allow for upload tools to append to this file first and then upload the files. Missing file+.md5 file uploads could thus be detected.
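The append-first workflow for upload tools could be sketched like this. It is a rough illustration, not an existing tool: upload_with_manifest is a hypothetical helper, cp is only a placeholder for the real transfer command, and the manifest is written directly into the target directory.

```shell
# Sketch: record each file in MANIFEST.txt before transferring it, so that
# an interrupted upload leaves a manifest entry without a matching file.
# Assumption: cp stands in for the actual upload command.

# upload_with_manifest SRC_DIR ZONE_DIR FILE...
# FILE arguments are paths relative to SRC_DIR.
upload_with_manifest() (
    src=$1
    zone=$2
    shift 2
    for f in "$@"; do
        # Append the expected checksum and path first ...
        (cd "$src" && md5sum "./$f") >> "$zone/MANIFEST.txt" || exit 1
        # ... then transfer the file itself.
        mkdir -p "$zone/$(dirname "$f")" || exit 1
        cp "$src/$f" "$zone/$f" || exit 1
    done
)
```

If a transfer is interrupted, running md5sum -c MANIFEST.txt in the target directory reports the listed file as missing, which is the detection described in the quote.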