Closed: yarikoptic closed this issue 5 months ago
@yarikoptic Do the logs contain any messages about `super_len()` or `stat()`?
I don't think so:
❯ zgrep -e super_len -e stat 20240216201625Z-3182147.log.gz
do = self.iter(retry_state=retry_state)
result.raise_for_status()
File "/home/yoh/proj/dandi/dandi-cli-master/venvs/dev3.11/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
2024-02-16T15:37:30-0500 [INFO ] dandi 3182147:140109695737920 Logs saved in /home/yoh/.local/state/dandi-cli/log/20240216201625Z-3182147.log
full log (compressed) now available from dandi@drogon:20240216201625Z-3182147.log.gz
@yarikoptic The errors all seem to be associated with the file `0/0/d0/d2/d26/f1.dat`. Is that file empty or somehow odd in any way?
@yarikoptic Can you try redoing the upload with `DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1` set?
It seems to be a new file, since there is no `d26` under https://github.com/dandizarrs/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/tree/0.231017.2004%2Bzarr1/0/0/d0 which should reflect the prior version.
@yarikoptic What if you set the envvar `DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1` while uploading?
That re-upload succeeded without error:
❯ DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 dandi upload --validation ignore -J 5:200 ./sub-randomzarrlike_junk.zarr
2024-02-16 17:58:38,651 [ INFO] Found 2 files to consider
PATH SIZE ERRORS UPLOAD STATUS MESSAGE
dandiset.yaml 3.0 kB skipped should be edited online
sub-randomzarrlike/sub-randomzarrlike_junk.zarr 269.8 MB 1 100% done exists - reuploading
Summary: 269.8 MB 1 with errors 41.5 kB/s 1 skipped 1 should be edited online
1 done 1 exists - reuploading
2024-02-16 19:25:44,541 [ INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240216225836Z-3262469.log
DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 dandi upload --validation ignore - 10097.05s user 294.08s system 198% cpu 1:27:08.90 total
The log is at dandi@drogon.dartmouth.edu:20240216225836Z-3262469.log.gz.
I also uploaded 20240223192140Z-306046.log.gz from another successful upload. I now wonder if that instrumentation itself is what makes it pass. A grep for anything diagnostic from `super_len` turns up nothing:
grep -e 'super_len() report size' -e '- ' 20240216201625Z-3182147.log.gz 20240216225836Z-3262469.log.gz 20240223192140Z-306046.log.gz
@jwodder - please try scripting a few uploads like that, perhaps against the staging instance. The original description provides all the commands needed. It would be nice to catch this and figure out why AWS wasn't happy, so that users aren't surprised later on.
@yarikoptic As I have stated the previous times this has come up, the "A header you provided implies functionality that is not implemented" error from AWS occurs when an upload is made using "chunked" transfer-encoding, and `requests` uses "chunked" whenever (a) the data being uploaded is zero-length or (b) a file is being uploaded and `requests` fails to determine its size. What exactly do you want me to do about this?
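That two-branch decision can be illustrated with a stdlib-only sketch. This mimics, rather than reproduces, the internals of `requests`; `body_length` and `uses_chunked` are hypothetical names standing in for `requests.utils.super_len()` and the body-preparation logic:

```python
import os
import tempfile

def body_length(body):
    """Rough stand-in for requests.utils.super_len(): try len(), then
    seek/tell on file-like objects; return None when the size is unknown."""
    try:
        return len(body)
    except TypeError:
        pass
    if hasattr(body, "seek") and hasattr(body, "tell"):
        pos = body.tell()
        body.seek(0, os.SEEK_END)
        end = body.tell()
        body.seek(pos)
        return end - pos
    return None

def uses_chunked(body):
    """Per the explanation above, requests falls back to chunked
    transfer-encoding when the body length is unknown or zero --
    exactly the case that S3 rejects."""
    n = body_length(body)
    return n is None or n == 0

with tempfile.TemporaryFile() as f:
    f.write(b"x" * 1024)
    f.seek(0)
    assert not uses_chunked(f)  # size known via seek/tell -> Content-Length sent
assert uses_chunked(b"")        # zero-length body -> chunked -> AWS error
```

So a file that genuinely has 1024 bytes should take the `Content-Length` path; the bug reports below are interesting precisely because the size probe returns 0 for non-empty files.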
Figure out whether it is (a), (b), or some other (c), and accordingly make sure that we do not error out when we upload.
@yarikoptic I'm currently uploading a randomly-generated Zarr to staging from my MacBook Pro with `DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1` set. No errors have occurred yet (aside from a "too many open files" error at the start, which was apparently fixed by lowering the number of upload threads per asset). However, this is taking a long time, so I'm now running the following on my account on smaug:
`meta-run.sh`:
#!/bin/bash
set -x
cd "$(dirname "$0")"
mkdir -p logs
for i in {1..10}
do
DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1 \
./run.sh |& tee logs/"$(date -u +%Y.%m.%d.%H.%M.%SZ)"-spy.log
./run.sh |& tee logs/"$(date -u +%Y.%m.%d.%H.%M.%SZ)"-nospy.log
done
`run.sh`:
#!/bin/bash
export DANDI_API_KEY=---REDACTED---
export DANDI_DEVEL=1
set -ex
cd "$(dirname "$0")"
. venv/bin/activate
cd 214256
for layout in zarr128-smallfiles zarr64-smallfiles
do
rm -rf random.zarr
echo Generating random.zarr from $layout ...
chronic python3 ../zarr-digest-timings/mktree.py \
random.zarr \
../zarr-digest-timings/layouts/$layout.json
echo "Now: $(date)"
if ! dandi upload --devel-debug -i dandi-staging --validation ignore -J 5:200 random.zarr
then echo "Failed at: $(date)"
exit 1
fi
echo "Now: $(date)"
done
Update: I just got the following errors on smaug:
In a run with `DANDI_DEVEL_INSTRUMENT_REQUESTS_SUPERLEN=1` set:
Error uploading zarr: RuntimeError: requests.utils.super_len() reported size of 0 for '/home/jwodder/dandi-1408/214256/random.zarr/0/0/d0/d10/d7/f3.dat', but os.stat() reported size 1024 bytes 1 tries later
In a run without the envvar set, trying to upload a certain Zarr entry failed with the "header implies functionality not implemented" message.
According to `df -T`, the `/home` directory on smaug is btrfs.
EDIT: And now I've gotten "Error: requests.utils.super_len() reported size of 0 for '/Users/jwodder/dartmouth/tmp/dandi-1408/214255/random.zarr/1/0/d0/d1/d9/f34.dat', but os.stat() reported size 1024 bytes 1 tries later" on macOS (filesystem type: apfs).
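Judging from the error text above, the instrumentation appears to cross-check the value `super_len()` returns against `os.stat()` and fail loudly on a mismatch. A hypothetical sketch of such a check (`check_super_len` is an illustrative name, not dandi-cli's actual code):

```python
import os
import tempfile

def check_super_len(super_len, fileobj, path):
    """Hypothetical cross-check: if the size probe claims 0 bytes for a
    file that os.stat() says is non-empty, raise immediately instead of
    letting requests fall back to chunked transfer-encoding (which S3
    rejects with the 'functionality not implemented' error)."""
    reported = super_len(fileobj)
    if reported == 0:
        actual = os.stat(path).st_size
        if actual != 0:
            raise RuntimeError(
                f"requests.utils.super_len() reported size of 0 for "
                f"{path!r}, but os.stat() reported size {actual} bytes"
            )
    return reported

# Demo with a deliberately broken size probe that always reports 0:
with tempfile.NamedTemporaryFile() as tf:
    tf.write(b"x" * 1024)
    tf.flush()
    try:
        check_super_len(lambda f: 0, tf, tf.name)
    except RuntimeError as e:
        print(e)  # mirrors the error seen in the runs above
```

The value of failing here is that the mismatch is reported at its source, rather than surfacing later as an opaque AWS error.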
"fun!" .
@yarikoptic I'm going to try editing the source code of `super_len()` to add logging statements to see exactly what's going on when it reports a value of 0, and then I'll likely (hopefully) end up filing an issue with `requests`.
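Instead of editing the installed package in place, the same effect can be had by wrapping the function at runtime. A minimal sketch, assuming `requests` is installed; `with_size_logging` is my name, and the note about patching both module references is an assumption based on `requests.models` importing `super_len` by name:

```python
import functools
import logging

def with_size_logging(size_fn, logger=None):
    """Wrap a size-probing callable so that every call and its result
    are logged at DEBUG level, without otherwise changing behavior."""
    log = logger or logging.getLogger("superlen-spy")

    @functools.wraps(size_fn)
    def wrapper(obj):
        n = size_fn(obj)
        log.debug("super_len(%r) -> %r", obj, n)
        return n

    return wrapper

# Intended use (assumption -- requests.models re-imports the name, so
# both references would need patching for the spy to see every call):
#   import requests.utils, requests.models
#   requests.utils.super_len = with_size_logging(requests.utils.super_len)
#   requests.models.super_len = requests.utils.super_len
```

A logging-only wrapper like this is low-risk to leave enabled during long uploads, since it cannot change what the wrapped function returns.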
@yarikoptic See #1444.
:rocket: Issue was released in 0.62.1 :rocket:
I was trying to produce a 2nd version of a sample zarr under https://dandiarchive.org/dandiset/000029/draft/files?location=sub-randomzarrlike&page=1 : https://dandi.centerforopenneuroscience.org/zarrs/fd6/ab3/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/ , which is also freshly created: https://github.com/dandizarrs/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/ .
It was created using

python3 mktree.py /home/yoh/proj/dandi/dandisets/000029/sub-randomzarrlike/sub-randomzarrlike_junk.zarr layouts/zarr128-smallfiles.json

and now "modified" using

python3 mktree.py /home/yoh/proj/dandi/dandisets/000029/sub-randomzarrlike/sub-randomzarrlike_junk.zarr layouts/zarr64-smallfiles.json

(after `rm -rf` of the prior existing subfolders), so that the checksum went from `af1b4b5849cd2b79a29762d716863379-37480--38379520` to `422bd4b0fa7a06d2655d3c1f67fd4a6f-263522--269846528`.
.During re-upload screen was showing
and that "producing asset" made me worry since AFAIK we should have just modified existing zarr, not generate a new one... but it talks about "asset", not "zarr", so hope remained...
Then dandi CLI was showing no progress or anything, at 100% CPU, while comparing against the remote Zarr in the "exists - reuploading" state for a few minutes, but then the upload commenced. I had used `-J 5:200`, which resulted in dandi CLI somehow taking over 400% CPU according to `top`... I decided to look into the log file (on my laptop) `20240216201625Z-3182147.log` and saw those errors, a search for which brought me to 1048, but it seems that we might also be getting them not quite handled, since the errors seem to bubble up all the way to the top?
The upload did after all fail with: