datalad / datalad

Keep code, data, containers under control with git and git-annex
http://datalad.org

Upload of large files into S3 bucket fails - S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size." #5890

Closed m-petersen closed 3 years ago

m-petersen commented 3 years ago

What is the problem?

I am trying to set up an S3 special remote for our local S3 bucket in a dataset of Singularity containers, to make that dataset shareable across clients. When datalad tries to upload container images >5 GB, it fails with:

[ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "66YMxRsk-CPKLdiois3r7H3pQ_ewu2bDN_5CwpuTh9k", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

I contacted the sysadmin for our S3 storage and he said that the bucket is configured to allow files >5 GB. His take is that the S3 client datalad uses under the hood does not switch upload "strategies" the way, for example, the aws CLI does (a single PUT for objects <5 GB and a multipart upload, MPU, for larger objects).
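
For comparison only (the aws CLI is not part of this setup), that switch-over is a configurable threshold on the CLI side; a minimal sketch of what he is describing:

```
# Comparison sketch, aws CLI only -- not used by datalad/git-annex here.
# Objects above the threshold are uploaded in parts (MPU) instead of a single
# PUT, which is how the CLI gets past the 5 GB single-object PUT limit.
aws configure set default.s3.multipart_threshold 1GB
aws configure set default.s3.multipart_chunksize 1GB
```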

The full verbose output is:

```
$ datalad -l debug push --to github
[DEBUG ] Command line args 1st pass for DataLad 0.14.4. Parsed: Namespace() Unparsed: ['push', '--to', 'github']
[DEBUG ] Discovering plugins
[DEBUG ] Building doc for
[DEBUG ] Building doc for
[DEBUG ] Building doc for
[DEBUG ] Parsing known args among ['/work/fatx405/miniconda3/bin/datalad', '-l', 'debug', 'push', '--to', 'github']
[DEBUG ] Async run: | cwd=None | cmd=['git', '--git-dir=', 'config', '-z', '-l', '--show-origin']
[DEBUG ] Launching process ['git', '--git-dir=', 'config', '-z', '-l', '--show-origin']
[DEBUG ] Process 28681 started
[DEBUG ] Waiting for process 28681 to complete
[DEBUG ] Process 28681 exited with return code 0
[DEBUG ] Determined class of decorated function:
[DEBUG ] Resolved dataset for pushing: /work/fatx405/projects/envs
[DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', 'config', '-z', '-l', '--show-origin']
[DEBUG ] Launching process ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG ] Process 28715 started
[DEBUG ] Waiting for process 28715 to complete
[DEBUG ] Process 28715 exited with return code 0
[DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/envs/.datalad/config']
[DEBUG ] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/envs/.datalad/config']
[DEBUG ] Process 28765 started
[DEBUG ] Waiting for process 28765 to complete
[DEBUG ] Process 28765 exited with return code 0
[DEBUG ] Resolved dataset for difference reporting: /work/fatx405/projects/envs
[DEBUG ] Diff Dataset(/work/fatx405/projects/envs) from 'None' to 'HEAD'
[DEBUG ] AnnexRepo(/work/fatx405/projects/envs).get_content_info(...)
[DEBUG ] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Process 28810 started
[DEBUG ] Waiting for process 28810 to complete
[DEBUG ] Process 28810 exited with return code 0
[DEBUG ] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l']
[DEBUG ] Done AnnexRepo(/work/fatx405/projects/envs).get_content_info(...)
[DEBUG ] Attempt push of Dataset at /work/fatx405/projects/envs [DEBUG ] Discovered publication dependencies for 'github': ['s3']' [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'push', '--progress', '--porcelain', '--dry-run', 'github'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'push', '--progress', '--porcelain', '--dry-run', 'github'] [DEBUG ] Process 28917 started [DEBUG ] Waiting for process 28917 to complete [DEBUG ] Non-progress stderr: b'fatal: The current branch main has no upstream branch.\n' [DEBUG ] Non-progress stderr: b'To push the current branch and set the remote as upstream, use\n' [DEBUG ] Non-progress stderr: b'\n' [DEBUG ] Non-progress stderr: b' git push --set-upstream github main\n' [DEBUG ] Non-progress stderr: b'\n' [DEBUG ] Process 28917 exited with return code 128 [DEBUG ] Dry-run push to check push configuration failed, assume no configuration: CommandError: 'git -c diff.ignoreSubmodules=none push --progress --porcelain --dry-run github' failed with exitcode 128 under /work/fatx405/projects/envs [err: 'fatal: The current branch main has no upstream branch. To push the current branch and set the remote as upstream, use git push --set-upstream github main'] [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] [DEBUG ] Process 28948 started [DEBUG ] Waiting for process 28948 to complete [DEBUG ] Process 28948 exited with return code 0 [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] [DEBUG ] Process 28980 started [DEBUG ] Waiting for process 28980 to complete [DEBUG ] Process 28980 exited with return code 0 [DEBUG ] No sync necessary, no corresponding branch detected [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'push', '--progress', '--porcelain', '--dry-run', 's3'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'push', '--progress', '--porcelain', '--dry-run', 's3'] [DEBUG ] Process 29011 started [DEBUG ] Waiting for process 29011 to complete [DEBUG ] Non-progress stderr: b'fatal: The current branch main has no upstream branch.\n' [DEBUG ] Non-progress stderr: b'To push the current branch and set the remote as upstream, use\n' [DEBUG ] Non-progress stderr: b'\n' [DEBUG ] Non-progress stderr: b' git push --set-upstream s3 main\n' [DEBUG ] Non-progress stderr: b'\n' [DEBUG ] Process 29011 exited with return code 128 [DEBUG ] Dry-run push to check push configuration failed, assume no configuration: CommandError: 'git -c diff.ignoreSubmodules=none push --progress --porcelain --dry-run s3' failed with exitcode 128 under /work/fatx405/projects/envs [err: 'fatal: The current branch main has no upstream branch. 
To push the current branch and set the remote as upstream, use git push --set-upstream s3 main'] [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] [DEBUG ] Process 29042 started [DEBUG ] Waiting for process 29042 to complete [DEBUG ] Process 29042 exited with return code 0 [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] [DEBUG ] Process 29073 started [DEBUG ] Waiting for process 29073 to complete [DEBUG ] Process 29073 exited with return code 0 [DEBUG ] No sync necessary, no corresponding branch detected [DEBUG ] Launching process ['/work/fatx405/miniconda3/bin/python', '--version'] [DEBUG ] Process 29122 started [DEBUG ] Waiting for process 29122 to complete [DEBUG ] Process 29122 exited with return code 0 [DEBUG ] Async run: | cwd=None | cmd=['git', 'annex', 'version', '--raw'] [DEBUG ] Launching process ['git', 'annex', 'version', '--raw'] [DEBUG ] Process 29123 started [DEBUG ] Waiting for process 29123 to complete [DEBUG ] Process 29123 exited with return code 0 [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'findref', '--copies', '0', 'HEAD', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'findref', '--copies', '0', 'HEAD', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] [DEBUG ] Process 29173 started [DEBUG ] Waiting for process 29173 to complete [DEBUG ] Process 29173 exited with return code 0 [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'wanted', 's3', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'wanted', 's3', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] [DEBUG ] Process 29301 started [DEBUG ] Waiting for process 29301 to complete [DEBUG ] Process 29301 exited with return code 0 [DEBUG ] Push data from Dataset(/work/fatx405/projects/envs) to 's3' [DEBUG ] Counted 68394561599 bytes of annex data to transfer [DEBUG ] Async run: | cwd=/work/fatx405/projects/envs | cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'copy', '--batch', '-z', '--to', 's3', '--fast', '--json', '--json-error-messages', '--json-progress', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] [DEBUG ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'copy', '--batch', '-z', '--to', 's3', '--fast', '--json', '--json-error-messages', '--json-progress', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] [DEBUG ] Process 29381 started [DEBUG ] Waiting for process 29381 to complete [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "66YMxRsk-CPKLdiois3r7H3pQ_ewu2bDN_5CwpuTh9k", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status 
{statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "66YMxRsk-CPKLdiois3r7H3pQ_ewu2bDN_5CwpuTh9k", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "yGlb-8nYfqfH1xNWwHQkj5W4Og0iw6z-17689Q1KhL0", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "OJV0IMmVxyJVm_MwzrmIf2tqa5rPLBD5egD0lvHASME", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "OJV0IMmVxyJVm_MwzrmIf2tqa5rPLBD5egD0lvHASME", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "5ET_5H2r6x4COTTXPqblraROYCWrQpWrS3IuqZAIvzM", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "5m2unNOOp9Ki4rbYA-2rAJEgVi4q8tqJn4fUpJLWwq4", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "5m2unNOOp9Ki4rbYA-2rAJEgVi4q8tqJn4fUpJLWwq4", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "BQqIg0VfQmpKt1EK4pCjjfP7SwSxcQI3JFBUELbjz-8", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status 
{statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "TfULcyinRB-nGBaxZYt72-wkDIkXqsa-NKnNPYV1_k0", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "TfULcyinRB-nGBaxZYt72-wkDIkXqsa-NKnNPYV1_k0", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "4hZn6VrMX1At4bOE2wcIW-8OOGrb5s5iDNR6cj8TZtA", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "VJZqEp3y7vUmnMO7wW80i1sKDzV_CGMATkY2vpuI9wA", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "VJZqEp3y7vUmnMO7wW80i1sKDzV_CGMATkY2vpuI9wA", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "Gt9zeII8PsH5zMeUMprIrEwS9RVYeHudlBTtT7ji-6o", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "VJZqEp3y7vUmnMO7wW80i1sKDzV_CGMATkY2vpuI9wA", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "Gt9zeII8PsH5zMeUMprIrEwS9RVYeHudlBTtT7ji-6o", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing} [ERROR ] S3Error {s3StatusCode = Status 
{statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "e2aHald-a5r_AR0DMo-lymSIh--334mbVt3L4EDHTJc", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
[ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "e2aHald-a5r_AR0DMo-lymSIh--334mbVt3L4EDHTJc", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
[ERROR ] S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "EntityTooLarge", s3ErrorMessage = "Your proposed upload exceeds the maximum allowed object size.", s3ErrorResource = Nothing, s3ErrorHostId = Just "7aVHXU6xtforOxXfdNBVuuIt4NndfeAhzRuw1R5NLB0", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
[DEBUG ] Process 29381 exited with return code 1
Push to 'github': 25%|▎| 1.00/4.00 [01:23<04:11, 8CommandError: 'git -c diff.ignoreSubmodules=none annex copy --batch -z --to s3 --fast --json --json-error-messages --json-progress -c annex.dotfiles=true -c annex.retry=3' failed with exitcode 1 under /work/fatx405/projects/envs
git-annex: copy: 6 failed
```

What steps will reproduce the problem?

I initialized the S3 special remote as described in the handbook chapter (https://handbook.datalad.org/en/latest/basics/101-139-s3.html) with the following script.

    # Initialize S3 special remote in superds and create LZS S3 bucket as sibling
    echo "Please provide a repository name"
    read repo

    echo "Supply S3 bucket name to create or save to."
    echo "Please adhere to format uke-csi-<individual name>."
    echo "Naming the bucket according to the repository name is recommended. Like uke-csi-<repository name>."
    echo "Add '-test' if you don't want the LZS admins to establish a bucket mirror; e.g. uke-csi-dataset-test."
    read bucket

    echo "Enter AWS access key"
    read aws_access
    export AWS_ACCESS_KEY_ID=$aws_access
    echo $AWS_ACCESS_KEY_ID

    echo "Enter AWS secret access key"
    read aws_secret
    export AWS_SECRET_ACCESS_KEY=$aws_secret
    echo $AWS_SECRET_ACCESS_KEY

    # Initialize the S3 special remote on the local endpoint (unencrypted, auto-enabled)
    git annex initremote s3 type=S3 datacenter=s3-uhh encryption=none bucket=$bucket public=no autoenable=true host=s3-uhh.lzs.uni-hamburg.de

    # Record the bucket's public URL so annexed content can be retrieved from it
    git annex enableremote s3 publicurl=https://${bucket}.s3-uhh.lzs.uni-hamburg.de

    # Create the private GitHub sibling with the s3 remote as publication dependency
    datalad create-sibling-github -d . --publish-depends s3 --github-organization csi-hamburg --private -s github $proj
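
A quick way to double-check how the special remote ended up configured (remote type, host, and whether any chunking is in effect) is git annex info; just a pointer here, output not included:

```
# Prints the special remote's configuration as git-annex sees it
git annex info s3
```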

What version of DataLad are you using (run datalad --version)? On what operating system (consider running datalad wtf)?

WTF # WTF ## configuration ## credentials - keyring: - active_backends: - PlaintextKeyring with no encyption v.1.0 at /home/fatx405/.local/share/python_keyring/keyring_pass.cfg - config_file: /home/fatx405/.config/python_keyring/keyringrc.cfg - data_root: /home/fatx405/.local/share/python_keyring ## datalad - full_version: 0.14.4 - version: 0.14.4 ## dependencies - annexremote: 1.5.0 - appdirs: 1.4.4 - boto: 2.49.0 - cmd:7z: 16.02 - cmd:annex: 8.20201104-g13bab4f2c - cmd:bundled-git: 2.29.2 - cmd:git: 2.29.2 - cmd:system-git: 2.29.2 - cmd:system-ssh: 7.4p1 - exifread: 2.1.2 - humanize: 3.2.0 - iso8601: 0.1.14 - keyring: 22.0.1 - keyrings.alt: 4.0.2 - msgpack: 1.0.2 - mutagen: 1.41.1 - requests: 2.25.1 - wrapt: 1.12.1 ## environment - LANG: en_US.UTF-8 - PATH: /work/fatx405/miniconda3/bin:/sw/link/git/2.32.0/bin:/sw/env/system-gcc/singularity/3.5.2-overlayfix/bin:/sw/batch/slurm/19.05.6/bin:/sw/rrz/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin ## extensions - container: - description: Containerized environments - entrypoints: - datalad_container.containers_add.ContainersAdd: - class: ContainersAdd - load_error: None - module: datalad_container.containers_add - names: - containers-add - containers_add - datalad_container.containers_list.ContainersList: - class: ContainersList - load_error: None - module: datalad_container.containers_list - names: - containers-list - containers_list - datalad_container.containers_remove.ContainersRemove: - class: ContainersRemove - load_error: None - module: datalad_container.containers_remove - names: - containers-remove - containers_remove - datalad_container.containers_run.ContainersRun: - class: ContainersRun - load_error: None - module: datalad_container.containers_run - names: - containers-run - containers_run - load_error: None - module: datalad_container - version: 1.1.4 - hirni: - description: HIRNI workflows - entrypoints: - datalad_hirni.commands.dicom2spec.Dicom2Spec: - class: Dicom2Spec - load_error: None - module: datalad_hirni.commands.dicom2spec - names: - hirni-dicom2spec - hirni_dicom2spec - datalad_hirni.commands.import_dicoms.ImportDicoms: - class: ImportDicoms - load_error: None - module: datalad_hirni.commands.import_dicoms - names: - hirni-import-dcm - hirni_import_dcm - datalad_hirni.commands.spec2bids.Spec2Bids: - class: Spec2Bids - load_error: None - module: datalad_hirni.commands.spec2bids - names: - hirni-spec2bids - hirni_spec2bids - datalad_hirni.commands.spec4anything.Spec4Anything: - class: Spec4Anything - load_error: None - module: datalad_hirni.commands.spec4anything - names: - hirni-spec4anything - hirni_spec4anything - load_error: None - module: datalad_hirni - version: 0.0.8 - metalad: - description: DataLad semantic metadata command suite - entrypoints: - datalad_metalad.aggregate.Aggregate: - class: Aggregate - load_error: None - module: datalad_metalad.aggregate - names: - meta-aggregate - meta_aggregate - datalad_metalad.dump.Dump: - class: Dump - load_error: None - module: datalad_metalad.dump - names: - meta-dump - meta_dump - datalad_metalad.extract.Extract: - class: Extract - load_error: None - module: datalad_metalad.extract - names: - meta-extract - meta_extract - load_error: None - module: datalad_metalad - version: 0.2.1 - neuroimaging: - description: Neuroimaging tools - entrypoints: - datalad_neuroimaging.bids2scidata.BIDS2Scidata: - class: BIDS2Scidata - load_error: None - module: datalad_neuroimaging.bids2scidata - names: - bids2scidata - load_error: None - module: datalad_neuroimaging - version: 
0.3.1 - ukbiobank: - description: UKBiobank dataset support - entrypoints: - datalad_ukbiobank.init.Init: - class: Init - load_error: None - module: datalad_ukbiobank.init - names: - ukb-init - ukb_init - datalad_ukbiobank.update.Update: - class: Update - load_error: None - module: datalad_ukbiobank.update - names: - ukb-update - ukb_update - load_error: None - module: datalad_ukbiobank - version: 0.3.2 - webapp: - description: Generic web app support - entrypoints: - datalad_webapp.WebApp: - class: WebApp - load_error: None - module: datalad_webapp - names: - webapp - webapp - load_error: None - module: datalad_webapp - version: 0.3 ## git-annex - build flags: - Assistant - Webapp - Pairing - Inotify - DBus - DesktopNotify - TorrentParser - MagicMime - Feeds - Testsuite - S3 - WebDAV - dependency versions: - aws-0.22 - bloomfilter-2.0.1.0 - cryptonite-0.26 - DAV-1.3.4 - feed-1.3.0.1 - ghc-8.8.4 - http-client-0.6.4.1 - persistent-sqlite-2.10.6.2 - torrent-10000.1.1 - uuid-1.3.13 - yesod-1.6.1.0 - key/value backends: - SHA256E - SHA256 - SHA512E - SHA512 - SHA224E - SHA224 - SHA384E - SHA384 - SHA3_256E - SHA3_256 - SHA3_512E - SHA3_512 - SHA3_224E - SHA3_224 - SHA3_384E - SHA3_384 - SKEIN256E - SKEIN256 - SKEIN512E - SKEIN512 - BLAKE2B256E - BLAKE2B256 - BLAKE2B512E - BLAKE2B512 - BLAKE2B160E - BLAKE2B160 - BLAKE2B224E - BLAKE2B224 - BLAKE2B384E - BLAKE2B384 - BLAKE2BP512E - BLAKE2BP512 - BLAKE2S256E - BLAKE2S256 - BLAKE2S160E - BLAKE2S160 - BLAKE2S224E - BLAKE2S224 - BLAKE2SP256E - BLAKE2SP256 - BLAKE2SP224E - BLAKE2SP224 - SHA1E - SHA1 - MD5E - MD5 - WORM - URL - X* - operating system: linux x86_64 - remote types: - git - gcrypt - p2p - S3 - bup - directory - rsync - web - bittorrent - webdav - adb - tahoe - glacier - ddar - git-lfs - httpalso - hook - external - supported repository versions: - 8 - upgrade supported from repository versions: - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - version: 8.20201104-g13bab4f2c ## location - path: /home/fatx405 - type: directory ## metadata_extractors - annex (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: None - module: datalad.metadata.extractors.annex - version: None - audio (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: None - module: datalad.metadata.extractors.audio - version: None - bids (datalad-neuroimaging 0.3.1): - distribution: datalad-neuroimaging 0.3.1 - load_error: None - module: datalad_neuroimaging.extractors.bids - version: None - datacite (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: None - module: datalad.metadata.extractors.datacite - version: None - datalad_core (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: None - module: datalad.metadata.extractors.datalad_core - version: None - datalad_rfc822 (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: None - module: datalad.metadata.extractors.datalad_rfc822 - version: None - dicom (datalad-neuroimaging 0.3.1): - distribution: datalad-neuroimaging 0.3.1 - load_error: None - module: datalad_neuroimaging.extractors.dicom - version: None - exif (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: None - module: datalad.metadata.extractors.exif - version: None - frictionless_datapackage (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: None - module: datalad.metadata.extractors.frictionless_datapackage - version: None - image (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: None - module: datalad.metadata.extractors.image - version: None - metalad_annex (datalad-metalad 0.2.1): - 
distribution: datalad-metalad 0.2.1 - load_error: None - module: datalad_metalad.extractors.annex - version: None - metalad_core (datalad-metalad 0.2.1): - distribution: datalad-metalad 0.2.1 - load_error: None - module: datalad_metalad.extractors.core - version: None - metalad_custom (datalad-metalad 0.2.1): - distribution: datalad-metalad 0.2.1 - load_error: None - module: datalad_metalad.extractors.custom - version: None - metalad_runprov (datalad-metalad 0.2.1): - distribution: datalad-metalad 0.2.1 - load_error: None - module: datalad_metalad.extractors.runprov - version: None - nidm (datalad-neuroimaging 0.3.1): - distribution: datalad-neuroimaging 0.3.1 - load_error: None - module: datalad_neuroimaging.extractors.nidm - version: None - nifti1 (datalad-neuroimaging 0.3.1): - distribution: datalad-neuroimaging 0.3.1 - load_error: None - module: datalad_neuroimaging.extractors.nifti1 - version: None - xmp (datalad 0.14.4): - distribution: datalad 0.14.4 - load_error: Exempi library not found. [exempi.py:_load_exempi:60] - module: datalad.metadata.extractors.xmp ## metadata_indexers ## python - implementation: CPython - version: 3.7.9 ## system - distribution: CentOS Linux/7.9.2009/Core - encoding: - default: utf-8 - filesystem: utf-8 - locale.prefered: UTF-8 - max_path_length: 269 - name: Linux - release: 4.14.240-1.0.33.el7.rrz.x86_64 - type: posix - version: #1 SMP Thu Jul 22 18:29:43 CEST 2021

As always grateful for any input!

Cheers, Marvin

yarikoptic commented 3 years ago

Thanks for including the WTF output -- so it is happening at the git-annex level...


edit: FWIW, I use chunk=1GB for backing up to Dropbox via rclone, with many files way over the 1GB limit -- never had an issue.
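
Applied to the S3 special remote from the reproduction script, that would amount to adding chunk= at initremote time -- an untested sketch, where only the chunk=1GB part is new relative to the original command:

```
# Untested sketch: identical to the initremote call in the script above, plus
# chunk=1GB so each object sent to S3 stays well below the 5 GB single-PUT limit
git annex initremote s3 type=S3 datacenter=s3-uhh encryption=none bucket=$bucket public=no autoenable=true host=s3-uhh.lzs.uni-hamburg.de chunk=1GB
```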

yarikoptic commented 3 years ago

@m-petersen could you please confirm that you cannot upload files over 5GB to the s3 remote without chunking, using a fresh git-annex?
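
(To check which git-annex build is in use -- including the Haskell aws library it links against, listed in the WTF output above as aws-0.22 -- this is enough:)

```
# Prints the git-annex version along with build flags and dependency versions,
# including the aws library that backs the S3 special remote
git annex version
```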

m-petersen commented 3 years ago

Sorry for the delay.

I just tested your instructions, and using chunk=1GB when initializing the special remote resolved the issue.

I also appreciate your recommendation regarding encryption.