dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
22 stars 27 forks source link

zarr re-upload might still be suffering from "A header you provided implies functionality that is not implemented" #1408

Closed yarikoptic closed 5 months ago

yarikoptic commented 9 months ago

I was trying to produce 2nd version of a sample zarr under https://dandiarchive.org/dandiset/000029/draft/files?location=sub-randomzarrlike&page=1 https://dandi.centerforopenneuroscience.org/zarrs/fd6/ab3/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/ which is also freshly created https://github.com/dandizarrs/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/ .

Was created using python3 mktree.py /home/yoh/proj/dandi/dandisets/000029/sub-randomzarrlike/sub-randomzarrlike_junk.zarr layouts/zarr128-smallfiles.json and now "modified" using python3 mktree.py /home/yoh/proj/dandi/dandisets/000029/sub-randomzarrlike/sub-randomzarrlike_junk.zarr layouts/zarr64-smallfiles.json (after rm -rf prior existing subfolders) so that checksum went from af1b4b5849cd2b79a29762d716863379-37480--38379520 to 422bd4b0fa7a06d2655d3c1f67fd4a6f-263522--269846528.

During re-upload screen was showing

❯ dandi upload --validation ignore -J 5:200 ./sub-randomzarrlike_junk.zarr
2024-02-16 15:16:27,754 [    INFO] Found 2 files to consider
PATH                                            SIZE       ERRORS     UPLOAD STATUS                MESSAGE                  
dandiset.yaml                                   3.0 kB                       skipped               should be edited online  
sub-randomzarrlike/sub-randomzarrlike_junk.zarr               1              producing asset       exists - reuploading     

and that "producing asset" made me worry since AFAIK we should have just modified existing zarr, not generate a new one... but it talks about "asset", not "zarr", so hope remained...

Then dandi CLI was showing no progress or anything with 100% CPU in comparing against remote Zarr exists - reuploading state for a few minutes but then upload commenced. I have used -J 5:200 which resulted in dandi CLI somehow taking over 400% CPU according to top... I decided to look into the log file (on my laptop) 20240216201625Z-3182147.log and saw those errors search for which brought me to

2024-02-16T15:16:25-0500 [INFO    ] dandi 3182147:140109695737920 dandi v0.59.1+2.g7ca670b9, hdmf v3.12.0, pynwb v2.5.0, h5py v3.10.0
...
2024-02-16T15:37:17-0500 [WARNING ] dandi 3182147:140099501135552 Will retry: Error 501 while sending PUT request to https://dandiarchive.s3.amazonaws.com/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/0/0/d0/d2/d26/f1.dat?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=SENSORED240216%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240216T203627Z&X-Amz-Expires=600&X-Amz-SignedHeaders=content-md5%3Bhost%3Bx-amz-acl&X-Amz-Signature=SENSOREDabc18760fa95f5a1d68d81b26cde2092acc8d49: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NotImplemented</Code><Message>A header you provided implies functionality that is not implemented</Message><Header>Transfer-Encoding</Header><RequestId>ZJ4G67HNV35TE36A</RequestId><HostId>epCo3E+XFJuxZoX15VXH9bHprpU36w8U9GHzrNePQWtJ3CuOq+B9xABYDDfizeojW1+KSztpXU4=</HostId></Error>
2024-02-16T15:37:29-0500 [DEBUG   ] urllib3.connectionpool 3182147:140099501135552 Resetting dropped connection: dandiarchive.s3.amazonaws.com
2024-02-16T15:37:29-0500 [DEBUG   ] urllib3.connectionpool 3182147:140099501135552 https://dandiarchive.s3.amazonaws.com:443 "PUT /zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/0/0/d0/d2/d26/f1.dat?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=SENSORED240216%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240216T203627Z&X-Amz-Expires=600&X-Amz-SignedHeaders=content-md5%3Bhost%3Bx-amz-acl&X-Amz-Signature=SENSOREDabc18760fa95f5a1d68d81b26cde2092acc8d49 HTTP/1.1" 501 None
2024-02-16T15:37:29-0500 [DEBUG   ] charset_normalizer 3182147:140099501135552 Encoding detection: utf_8 is most likely the one.
2024-02-16T15:37:29-0500 [DEBUG   ] charset_normalizer 3182147:140099501135552 Encoding detection: utf_8 is most likely the one.
2024-02-16T15:37:29-0500 [ERROR   ] dandi 3182147:140099501135552 HTTP request failed repeatedly: Error 501 while sending PUT request to https://dandiarchive.s3.amazonaws.com/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/0/0/d0/d2/d26/f1.dat?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=SENSORED240216%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240216T203627Z&X-Amz-Expires=600&X-Amz-SignedHeaders=content-md5%3Bhost%3Bx-amz-acl&X-Amz-Signature=SENSOREDabc18760fa95f5a1d68d81b26cde2092acc8d49: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NotImplemented</Code><Message>A header you provided implies functionality that is not implemented</Message><Header>Transfer-Encoding</Header><RequestId>TMYXYY9Z0KK9MPP8</RequestId><HostId>TE19VW1d0HD+zXm8jUdRWkcpfi0UnUkOK8ml04mZGUo+t375JtA3ZzcX2xFdDLZWiuFDDxRPSME=</HostId></Error>
2024-02-16T15:37:29-0500 [DEBUG   ] dandi 3182147:140107868591808 Error uploading zarr: HTTPError: 501 Server Error: Not Implemented for url: https://dandiarchive.s3.amazonaws.com/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/0/0/d0/d2/d26/f1.dat?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=SENSORED240216%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240216T203627Z&X-Amz-Expires=600&X-Amz-SignedHeaders=content-md5%3Bhost%3Bx-amz-acl&X-Amz-Signature=SENSOREDabc18760fa95f5a1d68d81b26cde2092acc8d49
2024-02-16T15:37:29-0500 [DEBUG   ] dandi 3182147:140107868591808 Cancelling upload
2024-02-16T15:37:29-0500 [ERROR   ] dandi 3182147:140107868591808 Error uploading /home/yoh/proj/dandi/dandisets/000029/sub-randomzarrlike/sub-randomzarrlike_junk.zarr:
Traceback (most recent call last):
  File "/home/yoh/proj/dandi/dandi-cli-master/dandi/upload.py", line 357, in process_path
    for r in dfile.iter_upload(
  File "/home/yoh/proj/dandi/dandi-cli-master/dandi/files/zarr.py", line 486, in iter_upload
    size = fut.result()
           ^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-cli-master/dandi/files/zarr.py", line 561, in _upload_zarr_file
    storage_session.put(
  File "/home/yoh/proj/dandi/dandi-cli-master/dandi/dandiapi.py", line 315, in put
    return self.request("PUT", path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-cli-master/dandi/dandiapi.py", line 199, in request
    for i, attempt in enumerate(
  File "/home/yoh/proj/dandi/dandi-cli-master/venvs/dev3.11/lib/python3.11/site-packages/tenacity/__init__.py", line 347, in __iter__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-cli-master/venvs/dev3.11/lib/python3.11/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
          ^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-cli-master/venvs/dev3.11/lib/python3.11/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/yoh/proj/dandi/dandi-cli-master/dandi/dandiapi.py", line 235, in request
    result.raise_for_status()
  File "/home/yoh/proj/dandi/dandi-cli-master/venvs/dev3.11/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 501 Server Error: Not Implemented for url: https://dandiarchive.s3.amazonaws.com/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/0/0/d0/d2/d26/f1.dat?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=SENSORED240216%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240216T203627Z&X-Amz-Expires=600&X-Amz-SignedHeaders=content-md5%3Bhost%3Bx-amz-acl&X-Amz-Signature=SENSOREDabc18760fa95f5a1d68d81b26cde2092acc8d49

upload did after all fail with

❯ dandi upload --validation ignore -J 5:200 ./sub-randomzarrlike_junk.zarr
2024-02-16 15:16:27,754 [    INFO] Found 2 files to consider
PATH                                            SIZE        ERRORS          UPLOAD STATUS                          MESSAGE                                                                                                                                         
dandiset.yaml                                   3.0 kB                             skipped                         should be edited online                                                                                                                         
sub-randomzarrlike/sub-randomzarrlike_junk.zarr 269.8 MB       1               19% ERROR                           501 Server Error: Not Implemented for url: https://dandiarchive.s3.amazonaws.com/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/0/0/d0/d2/d26/f1.d...
Summary:                                        269.8 MB 1 with errors   42.0 kB/s 1 skipped                       1 should be edited online                                                                                                                       
                                                                                   1 ERROR                         1 501 Server Error: Not Implemented for url: https://dandiarchive.s3.amazonaws.com/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/0/0/d0/d2/d26/f1...
2024-02-16 15:37:30,390 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240216201625Z-3182147.log
Error: 501 Server Error: Not Implemented for url: https://dandiarchive.s3.amazonaws.com/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/0/0/d0/d2/d26/f1.dat?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUBRWC5GAEKH3223E%2F20240216%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240216T203627Z&X-Amz-Expires=600&X-Amz-SignedHeaders=content-md5%3Bhost%3Bx-amz-acl&X-Amz-Signature=249ff2f8f4a98480a09b11160abc18760fa95f5a1d68d81b26cde2092acc8d49
jwodder commented 9 months ago

@yarikoptic Do the logs contain any messages about super_len() or stat()?

yarikoptic commented 9 months ago

I don't think so:

❯ zgrep -e super_len  -e stat 20240216201625Z-3182147.log.gz
    do = self.iter(retry_state=retry_state)
    result.raise_for_status()
  File "/home/yoh/proj/dandi/dandi-cli-master/venvs/dev3.11/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
2024-02-16T15:37:30-0500 [INFO    ] dandi 3182147:140109695737920 Logs saved in /home/yoh/.local/state/dandi-cli/log/20240216201625Z-3182147.log

full log (compressed) now available from dandi@drogon:20240216201625Z-3182147.log.gz

jwodder commented 9 months ago

@yarikoptic The errors all seem to be associated with the file 0/0/d0/d2/d26/f1.dat. Is that file empty or somehow odd in any way?

jwodder commented 9 months ago

@yarikoptic Can you try redoing the upload with DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 set?

yarikoptic commented 9 months ago
looks "regular" to me... ```shell ❯ ls sub-randomzarrlike_junk.zarr/0/0/d0/d2/d26/f1.dat -l -rw------- 1 yoh yoh 1024 Feb 16 15:14 sub-randomzarrlike_junk.zarr/0/0/d0/d2/d26/f1.dat ❯ md5sum -l md5sum: invalid option -- 'l' Try 'md5sum --help' for more information. ❯ md5sum sub-randomzarrlike_junk.zarr/0/0/d0/d2/d26/f1.dat 4bd9951479a6a7b2b36b6c8b80f1e5f5 sub-randomzarrlike_junk.zarr/0/0/d0/d2/d26/f1.dat ❯ cat sub-randomzarrlike_junk.zarr/0/0/d0/d2/d26/f1.dat �� �Կ�[�\������#;r�1�>��G/1{���J�/�ճ]��,L-z����%AT�J��驇c�S��%� �7f�����,����$�,�<@��⍩&�P��ߑ�����... ```

seems to be a new file since there is no d26 under https://github.com/dandizarrs/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/tree/0.231017.2004%2Bzarr1/0/0/d0 which should reflect prior version .

jwodder commented 9 months ago

@yarikoptic What if you set the envvar DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 while uploading?

yarikoptic commented 9 months ago

that re-upload succeeded without error

❯ DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 dandi upload --validation ignore -J 5:200 ./sub-randomzarrlike_junk.zarr
2024-02-16 17:58:38,651 [    INFO] Found 2 files to consider
PATH                                            SIZE        ERRORS          UPLOAD STATUS                          MESSAGE                  
dandiset.yaml                                   3.0 kB                             skipped                         should be edited online  
sub-randomzarrlike/sub-randomzarrlike_junk.zarr 269.8 MB       1              100% done                            exists - reuploading     
Summary:                                        269.8 MB 1 with errors   41.5 kB/s 1 skipped                       1 should be edited online
                                                                                   1 done                          1 exists - reuploading   
2024-02-16 19:25:44,541 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240216225836Z-3262469.log
DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 dandi upload --validation ignore -  10097.05s user 294.08s system 198% cpu 1:27:08.90 total

log is at dandi@drogon.dartmouth.edu:20240216225836Z-3262469.log.gz .

yarikoptic commented 8 months ago

uploaded also 20240223192140Z-306046.log.gz from another successful upload. I now wonder if that instrumentation is what makes it pass. grep for anything diagnostic due to super_len seems to be not there:

grep -e 'super_len() report size' -e '- ' 20240216201625Z-3182147.log.gz 20240216225836Z-3262469.log.gz 20240223192140Z-306046.log.gz
yarikoptic commented 7 months ago

@jwodder - please try (script) a few uploads like that. May be against staging instance. Original description provides all the commands needed. Would be nice to catch this and figure out why AWS wasn't happy to not make users surprised later on.

jwodder commented 7 months ago

@yarikoptic As I have stated previous times this has come up, the "A header you provided implies functionality that is not implemented" error from AWS occurs when an upload is made using "chunked" transfer-encoding, and requests uses "chunked" whenever (a) the data being uploaded is zero-length or (b) a file is being uploaded and requests fails to determine its size. What exactly do you want me to do about this?

yarikoptic commented 7 months ago

Figure out either it is a. or b. or some other c. and in the end/accordingly make sure that we do not error out when we upload.

jwodder commented 6 months ago

@yarikoptic I'm currently uploading a randomly-generated Zarr to staging from my MacBook Pro with DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 set. No errors have occurred yet (aside from a "too many open files" error at the start which was apparently fixed by lowering the number of upload threads per asset). However, this is taking a long time, so I'm now running the following on my account on smaug:

meta-run.sh:

#!/bin/bash
set -x

cd "$(dirname "$0")"
mkdir -p logs
for i in {1..10}
do
    DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 \
        ./run.sh |& tee logs/"$(date -u +%Y.%m.%d.%H.%M.%SZ)"-spy.log
    ./run.sh |& tee logs/"$(date -u +%Y.%m.%d.%H.%M.%SZ)"-nospy.log
done

run.sh:

#!/bin/bash
export DANDI_API_KEY=---REDACTED---
export DANDI_DEVEL=1

set -ex
cd "$(dirname "$0")"
. venv/bin/activate

cd 214256
for layout in zarr128-smallfiles zarr64-smallfiles
do
    rm -rf random.zarr
    echo Generating random.zarr from $layout ...
    chronic python3 ../zarr-digest-timings/mktree.py \
        random.zarr \
        ../zarr-digest-timings/layouts/$layout.json
    echo "Now: $(date)"
    if ! dandi upload --devel-debug -i dandi-staging --validation ignore -J 5:200 random.zarr
    then echo "Failed at: $(date)"
         exit 1
    fi
    echo "Now: $(date)"
done
jwodder commented 6 months ago

Update: I just got the following errors on smaug:

According to df -T, the /home directory on smaug is btrfs.

EDIT: And now I've gotten "Error: requests.utils.super_len() reported size of 0 for '/Users/jwodder/dartmouth/tmp/dandi-1408/214255/random.zarr/1/0/d0/d1/d9/f34.dat', but os.stat() reported size 1024 bytes 1 tries later" on macOS (filesystem type: apfs).

yarikoptic commented 5 months ago

"fun!" .

jwodder commented 5 months ago

@yarikoptic I'm going to try editing the source code of super_len() to add logging statements to see exactly what's going on, and then I'll likely (hopefully) end up filing an issue with requests.

jwodder commented 5 months ago

@yarikoptic See #1444.

github-actions[bot] commented 5 months ago

:rocket: Issue was released in 0.62.1 :rocket: