yarikoptic opened 9 months ago
Here are preliminary results for a simple "download a subtree of a zarr" benchmark using `aws s3` and `rclone` directly against S3, versus access through dandidav's `/dandisets/` and then `/zarrs/` endpoints: we see that the `/dandisets/` endpoint is quite slow, but the manifest-based, shorter-path `/zarrs/` one is quite performant (well -- still twice as slow, but that is just that -- twice! ;))
(dev3) yoh@typhon:~/proj/dandi/zarr-benchmarking/tools$ PART=0/0/0/0/ CONCUR=10 ./simple_parallel_get.sh
Downloading part 0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_aws_s3: /tmp/zarr-bm/get_aws_s3
22.49user 2.03system 0:29.91elapsed 82%CPU (0avgtext+0avgdata 119156maxresident)k
0inputs+24064outputs (0major+110906minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_s3: /tmp/zarr-bm/get_rclone_s3
3.64user 0.94system 0:16.16elapsed 28%CPU (0avgtext+0avgdata 85516maxresident)k
0inputs+24064outputs (0major+5286minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_dandisets: /tmp/zarr-bm/get_rclone_dandisets
4.63user 1.99system 3:16.25elapsed 3%CPU (0avgtext+0avgdata 74280maxresident)k
0inputs+24064outputs (0major+5663minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
3.47user 1.55system 0:32.48elapsed 15%CPU (0avgtext+0avgdata 74440maxresident)k
0inputs+24064outputs (0major+5382minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
On serial access we are on par with `aws` on `/zarrs/`, but rclone beats us:
(dev3) yoh@typhon:~/proj/dandi/zarr-benchmarking/tools$ PART=0/0/0/0/ CONCUR=1 ./simple_parallel_get.sh
Downloading part 0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_aws_s3: /tmp/zarr-bm/get_aws_s3
20.58user 1.82system 3:06.44elapsed 12%CPU (0avgtext+0avgdata 109096maxresident)k
0inputs+24064outputs (0major+97814minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_s3: /tmp/zarr-bm/get_rclone_s3
4.69user 1.75system 2:43.25elapsed 3%CPU (0avgtext+0avgdata 81728maxresident)k
0inputs+24064outputs (0major+6314minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_dandisets: /tmp/zarr-bm/get_rclone_dandisets
4.86user 2.44system 12:44.14elapsed 0%CPU (0avgtext+0avgdata 71256maxresident)k
0inputs+24064outputs (0major+3675minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
4.45user 2.20system 3:04.90elapsed 3%CPU (0avgtext+0avgdata 71740maxresident)k
0inputs+24064outputs (0major+4791minor)pagefaults 0swaps
checksum 742c8d77baf7240437d33117f1d063fb-3008--1684480
The `/dandisets/` path remains many times slower -- we should look into optimizing there. But I guess the first thing would be to establish an efficient dandi-archive API per the relevant issue.
@yarikoptic Yes, a dedicated dandi-archive endpoint would definitely help with efficiency.
Although there is no %wait, I wonder if logging (#69) is already somehow affecting it.
@yarikoptic Did you build the `dandidav` binary with `--release`? If not, you should.
@yarikoptic - just a note that testing throughput could be broken into two parts (reading, and reading + verifying). The former is what any zarr reader would do; it wouldn't try to check that it received the correct bytes. The latter is what dandi-cli would do to ensure that we do receive the correct bytes. The tools used above do various things depending on flags. Also, for parallel reads from S3, s5cmd is significantly faster than most other tools (https://github.com/peak/s5cmd).
@satra thanks for the note. In the script/output above I report the zarr checksum we compute for all downloads -- all good as long as they match. If it is not a partial download of a zarr (unlike those above), the script would explicitly compare against the target overall zarr checksum. Verification is not counted as part of the benchmark. If you have some Python / any other code you would like to use to benchmark zarrs -- please share.
Re s5cmd, I will add it then, probably in favor of it over `aws s3`, which consistently shows itself to be slow.
FWIW @jwodder said that he already looked into scalability a little but has not found a resolution yet. Given that rclone generally seems able to reach s5cmd performance when accessing S3, I think solving the scalability of the `/zarr/` endpoint should provide a comparable solution altogether.
FWIW I have pushed the current version of the helper to https://github.com/dandi/zarr-benchmarking/blob/master/tools/simple_parallel_get.sh
@yarikoptic To be clear: You're concluding that `dandidav` can only support a maximum of about 20 requests based on the fact that the total wallclock runtime to process concurrent requests stops decreasing at 20, correct? If so, that interpretation seems backwards: if `dandidav` can't handle more than 20 concurrent requests, then sending more requests would result in some not being handled immediately, causing the total runtime to increase.
Correct. I don't think so, though, since the overall number of requests is constant (the number of files in that zarr path) and the only thing changing is how many we submit in parallel. So it is really a question of throughput. My simplified thinking/explanation assumes the submitter does not wait for responses (unlike what we actually do), but the overall reasoning is the same: if dandidav can handle at most 20 requests per second, the time to process 100 requests would remain a constant 5 seconds, regardless of how many requests (above 20) I submit at once in parallel.
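To make that reasoning concrete, here is a toy sketch (the per-worker request rate and the 20 requests/second cap are illustrative assumptions, not measurements):

```rust
// Toy model of the reasoning above; all numbers are illustrative assumptions.
// Each client worker is assumed to issue roughly one request per second, and
// the server is assumed to complete at most `server_rps` requests per second.
fn wall_time_secs(total_requests: f64, workers: f64, server_rps: f64) -> f64 {
    let client_rps = workers; // ~1 request/second per worker (assumption)
    total_requests / client_rps.min(server_rps)
}

fn main() {
    for workers in [5.0, 10.0, 20.0, 30.0, 50.0] {
        println!(
            "{:>4} workers -> ~{:.1}s for 100 requests",
            workers,
            wall_time_secs(100.0, workers, 20.0)
        );
    }
}
```

Above the server's capacity the wall time stays flat, which is the "constant 5 seconds" in the reasoning above.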
@yarikoptic I think I misunderstood what you were doing with concurrent processes. I thought you were running, say, 20 copies of `rclone` against the same endpoint in parallel, but it's actually just one `rclone` process that internally divides its requests into concurrent batches of 20.
Correct! Indeed it might also be valuable to test with multiple clients in parallel bombarding the same/different zarrs.
@yarikoptic I'm going to ask about request limits on a Rust forum, and I need to know how many CPUs (not cores) the machine `dandidav` is running on has. I believe running `python3 -c "import os; print(os.cpu_count())"` should give the correct value.
@yarikoptic Also, could you run `for c in 1 5 10 20 30 40 50; do CONCUR=$c ...` using the `get_rclone_dandisets` method instead of `get_rclone_zarr_manifest`? I want to confirm that both endpoints exhibit the 20-request limit with a "release" build.
> I believe running `python3 -c "import os; print(os.cpu_count())"` should give the correct value.

32
> @yarikoptic Also, could you run `for c in 1 5 10 20 30 40 50; do CONCUR=$c ...` using the `get_rclone_dandisets` method instead of `get_rclone_zarr_manifest`?

Sure, doing it against LOCAL:
*$> for c in 1 5 10 20 30 40 50; do CONCUR=$c METHODS=get_rclone_zarr_manifest RCLONE_DANDI_WEBDAV=DANDI-WEBDAV-LOCAL tools/simple_parallel_get.sh; done
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 1 processes
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:13.37 real 0.29 user 0.22 sys 3% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 5 processes
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.47 real 0.25 user 0.17 sys 17% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 10 processes
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.15 real 0.36 user 0.10 sys 21% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 20 processes
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.10 real 0.30 user 0.18 sys 23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 30 processes
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.09 real 0.39 user 0.12 sys 24% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 40 processes
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.14 real 0.36 user 0.14 sys 23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
Downloading part 0/0/0/0/0/ within zarr 0d5b9be5-e626-4f6a-96da-b6b602954899 asking for up to 50 processes
---------------
get_rclone_zarr_manifest: /tmp/zarr-bm/get_rclone_zarr_manifest
TIME: 0:02.10 real 0.34 user 0.15 sys 23% CPU
checksum f494a7ab20c6906e4c176e7a0f08c29d-188--105280
shub@falkor:~$ dandidav-test/dandidav --version
dandidav 0.2.0 (commit: 4998bd8)
shub@falkor:~$ file dandidav-test/dandidav
dandidav-test/dandidav: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=825b59e00515dad24f6ae389e6a79c65765232dd, for GNU/Linux 3.2.0, with debug_info, not stripped
If needed next time -- to expedite, just run the script on e.g. drogon: it is now on GitHub. I can also make an account for you on either typhon (where I ran it originally) or falkor (the webserver), just let me know.
Hi @jwodder, anything from the Rust community on the scalability issue? What should we look for?
@yarikoptic No, my post got no attention. The only thing I've learned is that axum (the web framework `dandidav` uses) does not impose any request limits by default, so the bottleneck isn't coming from there.
But what is it then -- a limited number of threads?
@yarikoptic `tokio` (the async runtime) does limit the number of worker threads to the number of CPUs by default, but you said that's 32, which wouldn't explain why we're seeing bottlenecks at 20. Also, since the code is async, it's possible for more than 32 requests to be processed at once anyway.
Is it tunable? If we double it and it doesn't have an effect -- it must be something else. If it does -- we could figure out why 20 and not 32 ;-)
@yarikoptic Yes, it's tunable. If you set the environment variable `TOKIO_WORKER_THREADS` to an integer when running the server, that'll be how many worker threads are used.
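For reference, the same knob can also be set programmatically on tokio's runtime builder; a minimal sketch (not dandidav's actual startup code, and the thread count is just an example):

```rust
// Sketch: build a multi-threaded tokio runtime with an explicit worker-thread
// count instead of relying on the TOKIO_WORKER_THREADS environment variable.
fn main() {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(64) // example value; the default is the number of CPUs
        .enable_all()
        .build()
        .expect("failed to build tokio runtime");

    runtime.block_on(async {
        // the server / application code would run here
        println!("runtime started");
    });
}
```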
If you're really paranoid, you can check what Rust thinks the number of CPUs is as follows:

1. Create a new directory on the server machine and `cd` into it.

2. Create a `Cargo.toml` file with the following contents:

       [package]
       name = "cpu-count"
       version = "0.1.0"
       edition = "2021"

       [dependencies]
       num_cpus = "1.16.0"

3. Create a `src/main.rs` file with the following contents:

       fn main() {
           println!("Number of CPUs: {}", num_cpus::get());
       }

4. Run `cargo run`.
FWIW.
So there is indeed more to it.
@yarikoptic This is largely a shot in the dark, but on the server that `dandidav` is running on, are 20 of the CPU cores performance cores and the rest efficiency cores? (I don't know how you'd check this.) I don't know how core scheduling works with efficiency cores, but since all of `dandidav`'s work is relatively short-lived, maybe only the performance cores are being used?
I had no clue that there even are different kinds of cores. FWIW, to rule out a cores effect we could check scalability while running on some other server. Could you please run on typhon, which has 32 cores? ndoli has 40.
@yarikoptic

> I even had no clue that there are different kinds of cores.

It's a new thing that ARM came out with in 2011, and then Intel made its own version in 2022-ish. (Thus, if the server's CPUs are from 2021 or earlier, then they're just "normal" cores and my theory is wrong.) I only found out about the different types last week or the week before when I was looking at the specs for new MacBooks.

> Could you please run on typhon which has 32 cores?

I don't think you ever gave me access to typhon, and I don't know how to access it. (At the very least, typhon.datalad.org doesn't resolve for me.)
@yarikoptic I ran both `dandidav` and the timing script on typhon, and the 20-request limit was still present:

`get_rclone_dandisets`:
Processes | Time | User | Sys | CPU |
---|---|---|---|---|
1 | 0:38.22 | 0.35 | 0.17 | 1% |
5 | 0:10.30 | 0.42 | 0.16 | 5% |
10 | 0:09.48 | 0.39 | 0.12 | 5% |
15 | 0:08.83 | 0.37 | 0.22 | 6% |
20 | 0:09.43 | 0.38 | 0.17 | 6% |
30 | 0:09.24 | 0.42 | 0.16 | 6% |
40 | 0:10.11 | 0.53 | 0.19 | 7% |
50 | 0:09.68 | 0.53 | 0.20 | 7% |
`get_rclone_zarr_manifest`:
Processes | Time | User | Sys | CPU |
---|---|---|---|---|
1 | 0:10.67 | 0.35 | 0.22 | 5% |
5 | 0:02.28 | 0.32 | 0.10 | 18% |
10 | 0:02.14 | 0.35 | 0.17 | 24% |
15 | 0:02.10 | 0.42 | 0.10 | 25% |
20 | 0:02.10 | 0.27 | 0.10 | 18% |
30 | 0:02.12 | 0.57 | 0.11 | 32% |
40 | 0:02.10 | 0.42 | 0.14 | 27% |
50 | 0:02.09 | 0.37 | 0.13 | 24% |
@yarikoptic I wrote https://github.com/jwodder/batchdav for measuring WebDAV traversal time using a given number of workers. Unlike `rclone`, rather than spending time downloading files from S3, the only requests it makes for the actual files are `HEAD` requests that don't follow redirects.
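For illustration, a probe of that kind could look roughly like the following reqwest-based sketch (this is not batchdav's actual code; the URL is just the `/zarrs/` collection root as a placeholder):

```rust
// Sketch: a HEAD request with redirect following disabled, so the S3 redirect
// target is reported but never followed or downloaded.
use reqwest::{redirect::Policy, Client};

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = Client::builder()
        .redirect(Policy::none()) // do not follow the redirect to S3
        .build()?;

    // placeholder URL; the real file URLs would be discovered by traversing
    // the WebDAV collection listings
    let url = "https://webdav.dandiarchive.org/zarrs/";
    let response = client.head(url).send().await?;
    println!("{} -> {}", url, response.status());
    Ok(())
}
```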
I've run it at various points, and the times from different sessions have varied by about an order of magnitude, which I suspect is due to differing amounts of load on Heroku. Some of the fastest times I've ever gotten are:
Workers | Time (mean) | Time (stddev) |
---|---|---|
1 | 8.399 | 0.361 |
5 | 1.670 | 0.129 |
10 | 1.041 | 0.109 |
15 | 0.713 | 0.062 |
20 | 0.751 | 0.110 |
30 | 0.795 | 0.102 |
40 | 0.726 | 0.081 |
50 | 0.713 | 0.079 |
Make of that what you will.
Just to recall/summarize what we have so far:

- dandidav (the `/zarr` endpoints) seems to be (just) slightly over twice as slow as the fast s5cmd (or rclone) when actually accessing some sample zarr, per my much earlier investigation.
@yarikoptic I'm starting to think that the "20 workers ceiling" is just a statistical illusion resulting from the combination of two factors:

1. As the number of concurrent requests to an axum server grows, the average response time increases. (Based on my observations, this happens regardless of whether the number of concurrent requests is greater than or less than the number of threads used by the server.)
2. As the number of concurrent requests made by a client grows, assuming the response time remains constant, the overall runtime decreases.

It seems that the decrease from (2) is dominant until some point in the 20-30 concurrent requests range, after which the increase from (1) is dominant, thereby producing a minimum runtime around 20 workers.
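A toy model of that interaction (with invented constants, not fitted to the measurements) illustrates how such a minimum can appear:

```rust
// Two competing effects: per-request latency grows with concurrency (assumed
// here to grow faster than linearly, which is what produces a minimum rather
// than a plateau), while more workers split the total work further.
fn response_time_secs(workers: f64) -> f64 {
    0.10 + 0.0002 * workers * workers
}

fn total_runtime_secs(total_requests: f64, workers: f64) -> f64 {
    // each worker processes its share of the requests sequentially
    (total_requests / workers) * response_time_secs(workers)
}

fn main() {
    for w in [1.0, 5.0, 10.0, 20.0, 25.0, 30.0, 40.0, 50.0] {
        println!("{:>4} workers -> {:>6.2}s", w, total_runtime_secs(500.0, w));
    }
}
```

With these made-up constants the runtime bottoms out around 20-25 workers and slowly climbs afterwards.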
It could indeed be the case. But why, and by how much, does the response time increase for (1)?
@yarikoptic I'm not entirely sure why it happens, but you can see the increase in the following statistical summary of the individual request times per number of client workers when using batchdav from my MacBook Pro to traverse [1]:
The effect also happens when both the `dandidav` server and the `batchdav` client are run on smaug:
Interesting. So indeed the Avg pretty much doubles going from 20 to 40, but not before; i.e., before 20 it grows more slowly than the factor in the number of workers (e.g. it grows only twofold going from 1 to 10). Just for academic purposes -- could you produce those Avg times only for our dandidav and for smaug, but with step 1? Then I would like to look at an estimate of Avg[n]/Avg[n/k] (k==2 or some other value) -- is it jumping rapidly higher at some specific value, is that value the same for both, and where does it cross the horizontal line for k -- i.e. where increasing the number of workers would no longer help.

So indeed your observation was probably spot on that response time is what keeps us from "going faster", but the question remains: why such rapid response-time growth? It could again be that there is some hardcoded size of some queue somewhere that processes only N requests in parallel; that would increase the average time, since the N+1st request would need to wait its turn first, which would add to the response time.
@yarikoptic

> Just for academic purposes -- could you produce those Avg times only for our dandidav and smaug only but with step 1?

Over what range of worker values?
How academic do you want to go? ;) If not much -- up to 50; if more -- up to 100 ;)
@yarikoptic

> then I would like to look at estimate of Avg[n]/Avg[n/k] (k==2 or some other)

I'm not sure what formula you're trying to describe here. If `k` is a fixed number, then the given expression will always be equal to `k`, since `Avg[n/k] = Avg[n]/k`, so `Avg[n]/Avg[n/k] = Avg[n]/(Avg[n]/k) = k`.
The statistics for 1 through 50 workers against webdav.dandiarchive.org are:
and the statistics for 1 through 50 workers against dandidav run on smaug are:
Would you like me to provide the raw data, containing each of the individual request times?
Here is what I get for k=2:

and for smaug:

Assuming that excursion is noise (in particular for smaug at 15), the ratio remains under 2, so in principle we should still scale up, although very slowly; and indeed the ratio plateaus at around 12-15 in both cases, i.e. somewhere at 24-30 parallel requests the response time at N becomes 1.8x the response time at N/2. If we zoom in on the smaug one, we see that it grows pretty much linearly until then. So there is IMHO some inherent factor, possibly dependent on the instance (what is the # of processors/cores on Heroku?), but not clearly so, which just caps scalability there.
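For reference, the ratio plotted above can be computed with a small helper along these lines (hypothetical code, illustrative numbers only):

```rust
// Sketch of the ratio being discussed: given avg[n] = average per-request
// response time measured with n workers, compute avg[n] / avg[n/2].
// A ratio below 2 means doubling the workers still shortens the overall
// runtime; a ratio of 2 means it no longer helps.
use std::collections::BTreeMap;

fn doubling_ratios(avg: &BTreeMap<u32, f64>) -> BTreeMap<u32, f64> {
    avg.iter()
        .filter_map(|(&n, &t)| {
            let half = avg.get(&(n / 2))?;
            Some((n, t / *half))
        })
        .collect()
}

fn main() {
    // illustrative numbers only, not measurements
    let avg: BTreeMap<u32, f64> =
        [(2, 0.12), (4, 0.14), (8, 0.18), (16, 0.27), (32, 0.52)].into();
    for (n, ratio) in doubling_ratios(&avg) {
        println!("Avg[{}]/Avg[{}] = {:.2}", n, n / 2, ratio);
    }
}
```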
BTW -- while running on smaug, what do you see for the CPU load on dandidav? (Since no actual download is involved, I guess it could differ from what was stated before.)
@yarikoptic The load steadily increased as the number of workers increased, capping out at around 4.2 for the 5-second load average.
What about CPU%? I.e., is that task becoming somehow IO-bound vs CPU-bound?
@yarikoptic I ran `watch -n1 'ps -q "$(pgrep -d, dandidav)" -o cuu,args'` while running `batchdav` against `dandidav` on smaug, and within a few seconds the CPU percentage was at 99.999%, where it remained for the rest of `batchdav`'s run.
So maybe the ceiling is due to a bottleneck in the implementation stack somewhere :-/ Are there convenient profiling tools in Rust land to see what it is mostly "thinking about"?
@yarikoptic There are a number of profilers. I'm starting out with flamegraph, but I'm getting the error:
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 3:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
failed to sample program
For the record, flamegraph's docs suggest setting `perf_event_paranoid` to -1.
Should be all set (but not for every user, and not persistent across reboots): you were added to the `perf_users` group, so either re-login or start a new shell using `newgrp perf_users`.
@yarikoptic Side note: From the flamegraph, I've discovered that the functionality for logging how much memory `dandidav` uses (added for #118) is taking up a lot of time; when I remove the functionality, requests to `dandidav` speed up by an order of magnitude. However, response times still increase with the number of concurrent requests.
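As a generic illustration (not the actual dandidav code) of why per-request memory measurement is costlier than periodic sampling -- reading /proc/self/statm here is just one Linux-specific way to get the resident set size:

```rust
// Hypothetical illustration: querying resident memory on every request adds
// filesystem I/O and parsing to the hot path, whereas a background task that
// samples it on a fixed interval keeps that cost off the request path.
use std::{fs, thread, time::Duration};

fn resident_bytes() -> Option<u64> {
    // /proc/self/statm: the second field is resident pages (Linux-specific)
    let statm = fs::read_to_string("/proc/self/statm").ok()?;
    let pages: u64 = statm.split_whitespace().nth(1)?.parse().ok()?;
    Some(pages * 4096) // assumes 4 KiB pages
}

fn main() {
    // background sampler instead of per-request measurement
    thread::spawn(|| loop {
        if let Some(rss) = resident_bytes() {
            println!("resident memory: {rss} bytes");
        }
        thread::sleep(Duration::from_secs(10));
    });
    thread::sleep(Duration::from_secs(30)); // stand-in for the server's lifetime
}
```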
@yarikoptic I ended up posting a question about this on axum's repository: https://github.com/tokio-rs/axum/discussions/2772
@satra did just a quick & dirty test using a zarr viewer, comparing access directly via the S3 URL against our testbed dandi.centerforopenneuroscience.org instance (on a zarr in 000108 under `/dandisets/`, not the manifest-based endpoint, since no manifest for it was there yet), so there are no quantified characteristics available. We need to come up with some benchmarks and compare the different access paths to identify how much each component contributes and whether we could improve any aspect (e.g. parallel access etc.). I hope that it is not all simply due to redirects.

Later we need to create some nice "versioned zarr" with a few versions to use for benchmarks, and also to benchmark on the `/zarr` endpoints.