HENNGE / aiodynamo

Asynchronous, fast, pythonic DynamoDB Client
https://aiodynamo.readthedocs.io/

Test against Scylla alternator #27

Open dimaqq opened 4 years ago

dimaqq commented 4 years ago

https://github.com/scylladb/scylla/issues/5796#issuecomment-606975027

ScyllaDB is a fast reimplementation of Cassandra, and it has a DynamoDB compatibility layer called Alternator.

I've run some basic tests against their Docker image. It would be awesome to run a performance test now that aiodynamo is so much faster :)

ojii commented 4 years ago

enjoy https://github.com/HENNGE/aiodynamo/tree/master/benchmarks/query

dimaqq commented 3 years ago

I came. I saw. I failed. https://github.com/scylladb/scylla/issues/9240

dimaqq commented 3 years ago

It's possible to run ScyllaDB Alternator in a container if one uses the standard (non-nightly) build. However, running Scylla in a container is painful on macOS, for several reasons.

What can be done?

  1. get a developer or trial account with hosted ScyllaDB and Alternator, and thus test against a prod / fast DB
  2. run the Scylla container on a custom Docker network and run the benchmarks from another container on the same network (see the sketch below)
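
For option 2, a rough sketch of what that could look like (the network name, benchmark image and DB_URL variable below are placeholders, not an actual setup):

$ docker network create scylla-net
$ docker run --name scylla -d --network scylla-net scylladb/scylla:latest \
    --alternator-port=8000 --alternator-write-isolation=always --smp 1
$ docker run --rm --network scylla-net -e DB_URL=http://scylla:8000 aiodynamo-benchmarks
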
kittyandrew commented 3 years ago

I've run some basic tests against their Docker image. It would be awesome to run a performance test now that aiodynamo is so much faster :)

@dimaqq Have you run tests with aiodynamo? Can you elaborate on the versions (of both) you've been using?

I discovered aiodynamo recently and enjoyed moving code over from slow boto3; it has been great so far. But now I've tried to test ScyllaDB through its DynamoDB API, and the aiodynamo code just doesn't work (while my boto3 code works fine).

Example of the issue I have. First, the boto3 script from the ScyllaDB docs (works fine):

import boto3
dynamodb = boto3.resource('dynamodb',endpoint_url='http://localhost:8000',
                  region_name='None', aws_access_key_id='None', aws_secret_access_key='None')

dynamodb.batch_write_item(RequestItems={
    'usertable': [{'PutRequest': {
        'Item': { 'key': 'test', 'x' : {'hello': 'world'} }
    }}]
})

And now the aiodynamo code replicating the example above (doesn't work):

import asyncio
from aiohttp import ClientSession

from aiodynamo.client import Client, URL
from aiodynamo.credentials import Credentials
from aiodynamo.http.aiohttp import AIOHTTP
from aiodynamo.expressions import HashKey

async def main():
    async with ClientSession() as session:
        client = Client(AIOHTTP(session), Credentials.auto(), region="None", endpoint=URL("http://localhost:8000"))

        await client.put_item("usertable", item={"key": "test", "x": {"hello": "world"}})

asyncio.run(main())

It hangs for a while and gives this output:

Traceback (most recent call last):
  File "/tmp/anjfssa/.venv/lib/python3.9/site-packages/aiodynamo/client.py", line 871, in send_request
    async for _ in self.throttle_config.attempts():
  File "/tmp/anjfssa/.venv/lib/python3.9/site-packages/aiodynamo/models.py", line 238, in attempts
    raise Throttled()
aiodynamo.errors.Throttled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/anjfssa/aiodynamo_write.py", line 16, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/tmp/anjfssa/aiodynamo_write.py", line 14, in main
    await client.put_item("usertable", item={"key": "test", "x": {"hello": "world"}})
  File "/tmp/anjfssa/.venv/lib/python3.9/site-packages/aiodynamo/client.py", line 598, in put_item
    resp = await self.send_request(action="PutItem", payload=payload)
  File "/tmp/anjfssa/.venv/lib/python3.9/site-packages/aiodynamo/client.py", line 906, in send_request
    raise failed
  File "/tmp/anjfssa/.venv/lib/python3.9/site-packages/aiodynamo/client.py", line 886, in send_request
    return await self.http.post(
  File "/tmp/anjfssa/.venv/lib/python3.9/site-packages/aiodynamo/http/aiohttp.py", line 54, in post
    return cast(
  File "/usr/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/tmp/anjfssa/.venv/lib/python3.9/site-packages/aiodynamo/http/aiohttp.py", line 22, in wrap_errors
    raise RequestFailed()
aiodynamo.http.base.RequestFailed

My system is Ubuntu 21.04. I ran ScyllaDB via docker-compose and tried scylladb/scylla-nightly:latest, scylladb/scylla:latest, scylladb/scylla:4.4.4 and scylladb/scylla:4.3.6. All of them work with boto3, but none work with aiodynamo. I couldn't see anything meaningful in the ScyllaDB logs during the request.

I just installed a fresh version of boto3 and aiodynamo[aiohttp] from PyPI for this repro.

Any ideas?

dimaqq commented 3 years ago

Credentials.auto() tries to pick up your credentials from the environment and, failing that, from the magical AWS metadata URL, etc. You probably don't want that here. The boto3 sample has aws_access_key_id='None', aws_secret_access_key='None'; note these are strings, not Python None. I would not be surprised if you have to replicate these settings for aiodynamo.
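
Something along these lines might do it (an untested sketch, reusing the session/client setup from your repro above; double-check the exact names in aiodynamo.credentials):

from aiodynamo.credentials import Key, StaticCredentials

# dummy static credentials for a local Alternator endpoint; the request is still signed,
# the values just don't have to be real
credentials = StaticCredentials(Key(id="None", secret="None"))
client = Client(AIOHTTP(session), credentials, region="None", endpoint=URL("http://localhost:8000"))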

I think Throttled is a red herring; see #102, the actual error is not shown 🙈

kittyandrew commented 3 years ago

Hm, I'm not sure about that. On the one hand, you are right: in the repro case I don't have any environment configured. But in my actual application I do have dummy AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values loaded into the environment, and it still didn't work.

I'll add them to the repro case to make sure that's not the issue.

ojii commented 3 years ago

Scylla returns the wrong (or at least a different) MIME type in JSON responses, and the aiohttp adaptor fails because of that. Either use the httpx adaptor or change the aiohttp one to ignore MIME types.

With scylladb/scylla-nightly:latest, tests pass except for two:

kittyandrew commented 3 years ago

... change the aiohttp one to ignore MIME types.

When you say that, do you mean something like https://github.com/HENNGE/aiodynamo/blob/master/src/aiodynamo/http/aiohttp.py#L57, or something else? I'm trying to understand how easy that is to tweak, and whether it's worth it for me.

ojii commented 3 years ago

When you say that, do you mean something like https://github.com/HENNGE/aiodynamo/blob/master/src/aiodynamo/http/aiohttp.py#L57, or something else? I'm trying to understand how easy that is to tweak, and whether it's worth it for me.

Change that to content_type=None.
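
i.e. the json call in the adaptor becomes something like:

    await response.json(content_type=None, encoding="utf-8")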

kittyandrew commented 3 years ago

Eh, that still didn't make it work for me. Running scylladb/scylla-nightly:latest with --alternator-port=8000 --alternator-write-isolation=always --smp=1.

ojii commented 3 years ago

I ran it with docker run --name scylla -p 8087:8000 scylladb/scylla-nightly:latest --alternator-port=8000 --alternator-write-isolation=always (I use port 8087 for my test DB) and then just ran the test suite against it (after changing the aiohttp adaptor).

kittyandrew commented 3 years ago

Ohh, whoops. I think it was my fault. At some point while tweaking the repro code I changed something in the URL, and the throttling error didn't help me spot it.

I can confirm that both httpx and the tweaked aiohttp adaptor work for me. Thank you again, by the way. I really didn't expect to get this resolved so quickly :)

I guess I will stick with httpx, though, if the aiohttp fix won't land in the library, since I really have no desire to maintain a fork just for this.

kittyandrew commented 3 years ago

...or not. After plugging httpx into my application, I've seen request time go from 0.03s with the aiohttp adapter to 0.7s with httpx.

All I changed was from

from aiodynamo.http.aiohttp import AIOHTTP  # @nocheckin: fork required for this to work.
from aiohttp import ClientSession

self.aioclient = ClientSession()
self.aiodynamo = Client(AIOHTTP(self.aioclient), Credentials.auto(), self.region, endpoint=URL(self.db_url))

to

from aiodynamo.http.httpx import HTTPX
from httpx import AsyncClient

self.aioclient = AsyncClient()
self.aiodynamo = Client(HTTPX(self.aioclient), Credentials.auto(), self.region, endpoint=URL(self.db_url))

And these numbers (0.03s and 0.7-1s) are similar for both ScyllaDB and dynamodb-local, so I guess that's a separate issue. I just haven't used httpx before. Am I doing something terribly wrong here?

dimaqq commented 3 years ago

I can't see anything obviously wrong here.

~1s response times are pretty bad 😱

Off the top of my head, I'd consider two aspects:

* `httpx` may have different connection pool defaults than `aiohttp` (though the defaults seem sane in the docs 🤔)

* `httpx` can also do HTTP/2, which `aiohttp` cannot; however, HTTP/2 is hard/rare without TLS, and if you are running the server in Docker, I don't think you have certs, so probably no HTTP/2 either.

nyh commented 3 years ago

I can't see anything obviously wrong here.

~1s response times are pretty bad 😱

Off the top of my head, I'd consider two aspects:

* `httpx` may have different connection pool defaults than `aiohttp` (though the defaults seem sane in the docs 🤔)

* `httpx` can also do HTTP/2, which `aiohttp` cannot; however, HTTP/2 is hard/rare without TLS, and if you are running the server in Docker, I don't think you have certs, so probably no HTTP/2 either.

Alternator does not support HTTP/2 (neither do DynamoDB or DynamoDB Local, by the way), so I doubt that's related.

This is a wild guess, but unexplained sub-second delays could be a bad interaction between Nagle's algorithm and delayed ACK:

  1. For some reason, the client sends the request in two write() system calls (e.g., perhaps it sends the headers and the body in two separate system calls). The first system call immediately generates a packet, which is sent. The second packet is then delayed by Nagle's algorithm until the first packet is acknowledged (TCP wrongly hopes that, until then, the client will have sent even more data, which can all be combined into a single packet).
  2. However, the server's TCP stack has the "delayed ACK" feature: it does not ACK the first packet until some time has passed, hoping it can combine multiple ACKs or even piggyback an ACK on the response. Because the server doesn't send an ACK, the client can't send the second packet, i.e., the end of the request.

You can verify in Wireshark whether the timing makes sense for this explanation. If it is this problem, you can try setting the TCP_NODELAY option on the client's socket to disable Nagle's algorithm. Even more efficient is TCP_CORK, which explicitly tells the kernel to send only one packet after several write system calls. I don't know whether httpx can do any of that; or maybe it already does, and it's not this problem at all.
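
If you want to poke at this from Python, here is a minimal sketch of the socket option itself (it only demonstrates TCP_NODELAY on a raw connection to the local Alternator port; whether httpx sets it on its own sockets is a separate question):

import socket

# connect to the local Alternator endpoint and disable Nagle's algorithm
sock = socket.create_connection(("localhost", 8000))
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # 1 = Nagle disabled
sock.close()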

dimaqq commented 3 years ago

Actually... I have used aiodynamo+httpx in the past, and the performance was fine. The dev use was against dynalite and the prod use was against AWS DynamoDB. I'll re-test the combo with the newest library versions.

kittyandrew commented 3 years ago

Okay, I think the slowness is my fault (my profiler's).

I was using cProfile for my code, and when I switched from sync boto3 to async aiodynamo, I didn't think much about asyncio causing issues with the profiler. So basically something inside the profiler is slowing down everything async by an order of magnitude, and even more so for httpx.

(I haven't re-tested with httpx though, since I have no need for it now that the aiohttp adaptor works with the fixed Scylla.)
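
For what it's worth, here's a profiler-free way to time a single request, reusing the setup from my repro above (a sketch; same table and endpoint as before):

import asyncio
import time

from aiohttp import ClientSession

from aiodynamo.client import Client, URL
from aiodynamo.credentials import Credentials
from aiodynamo.http.aiohttp import AIOHTTP

async def main():
    async with ClientSession() as session:
        client = Client(AIOHTTP(session), Credentials.auto(), region="None", endpoint=URL("http://localhost:8000"))
        start = time.perf_counter()
        await client.put_item("usertable", item={"key": "timing-test", "x": {"hello": "world"}})
        print(f"put_item took {time.perf_counter() - start:.3f}s")  # no cProfile in the loop

asyncio.run(main())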

dimaqq commented 2 years ago

I gave it a go, again... but I'm a bit stuck:

dimaqq commented 2 years ago

DEBUG:aiodynamo:sending request Request(url=URL('http://localhost:8000'), body=b'{"TableName":"test","Item":{"test":{"S":"test"},"quux":{"S":"sample-0"},"field-0":{"S":"value-0"},"field-1":{"S":"value-1"},"field-2":{"S":"value-2"},"field-3":{"S":"value-3"},"field-4":{"S":"value-4"},"field-5":{"S":"value-5"},"field-6":{"S":"value-6"},"field-7":{"S":"value-7"},"field-8":{"S":"value-8"},"field-9":{"S":"value-9"},"field-10":{"S":"value-10"},"field-11":{"S":"value-11"},"field-12":{"S":"value-12"},"field-13":{"S":"value-13"},"field-14":{"S":"value-14"},"field-15":{"S":"value-15"},"field-16":{"S":"value-16"},"field-17":{"S":"value-17"},"field-18":{"S":"value-18"},"field-19":{"S":"value-19"},"field-20":{"S":"value-20"},"field-21":{"S":"value-21"},"field-22":{"S":"value-22"},"field-23":{"S":"value-23"},"field-24":{"S":"value-24"},"field-25":{"S":"value-25"},"field-26":{"S":"value-26"},"field-27":{"S":"value-27"},"field-28":{"S":"value-28"},"field-29":{"S":"value-29"},"field-30":{"S":"value-30"},"field-31":{"S":"value-31"},"field-32":{"S":"value-32"},"field-33":{"S":"value-33"},"field-34":{"S":"value-34"},"field-35":{"S":"value-35"},"field-36":{"S":"value-36"},"field-37":{"S":"value-37"},"field-38":{"S":"value-38"},"field-39":{"S":"value-39"},"field-40":{"S":"value-40"},"field-41":{"S":"value-41"},"field-42":{"S":"value-42"},"field-43":{"S":"value-43"},"field-44":{"S":"value-44"},"field-45":{"S":"value-45"},"field-46":{"S":"value-46"},"field-47":{"S":"value-47"},"field-48":{"S":"value-48"},"field-49":{"S":"value-49"},"field-50":{"S":"value-50"},"field-51":{"S":"value-51"},"field-52":{"S":"value-52"},"field-53":{"S":"value-53"},"field-54":{"S":"value-54"},"field-55":{"S":"value-55"},"field-56":{"S":"value-56"},"field-57":{"S":"value-57"},"field-58":{"S":"value-58"},"field-59":{"S":"value-59"},"field-60":{"S":"value-60"},"field-61":{"S":"value-61"},"field-62":{"S":"value-62"},"field-63":{"S":"value-63"},"field-64":{"S":"value-64"},"field-65":{"S":"value-65"},"field-66":{"S":"value-66"},"field-67":{"S":"value-67"},"field-68":{"S":"value-68"},"field-69":{"S":"value-69"},"field-70":{"S":"value-70"},"field-71":{"S":"value-71"},"field-72":{"S":"value-72"},"field-73":{"S":"value-73"},"field-74":{"S":"value-74"},"field-75":{"S":"value-75"},"field-76":{"S":"value-76"},"field-77":{"S":"value-77"},"field-78":{"S":"value-78"},"field-79":{"S":"value-79"},"field-80":{"S":"value-80"},"field-81":{"S":"value-81"},"field-82":{"S":"value-82"},"field-83":{"S":"value-83"},"field-84":{"S":"value-84"},"field-85":{"S":"value-85"},"field-86":{"S":"value-86"},"field-87":{"S":"value-87"},"field-88":{"S":"value-88"},"field-89":{"S":"value-89"},"field-90":{"S":"value-90"},"field-91":{"S":"value-91"},"field-92":{"S":"value-92"},"field-93":{"S":"value-93"},"field-94":{"S":"value-94"},"field-95":{"S":"value-95"},"field-96":{"S":"value-96"},"field-97":{"S":"value-97"},"field-98":{"S":"value-98"},"field-99":{"S":"value-99"}},"ReturnValues":"NONE"}')

and then

DEBUG:aiodynamo:request failed

nyh commented 2 years ago

@dimaqq what does "request failed" mean? Was there an HTTP error? What was the content of the HTTP reply?

dimaqq commented 2 years ago

Right, here's a quick fix to get the benchmarks running:

                     await response.json(
-                        content_type="application/x-amz-json-1.0", encoding="utf-8"
+                        content_type=None, encoding="utf-8"
                     ),

Looks like ScyllaDB Alternator returns a different MIME type than Amazon DynamoDB. cc @nyh

nyh commented 2 years ago

@dimaqq I thought I already fixed the MIME type (https://github.com/scylladb/scylla/issues/9554). Which version of Scylla are you using? Can you please verify with "docker pull" that you are using a recent version, not some version that was called "latest" a year ago?

dimaqq commented 2 years ago

Scylla version 4.5.3-0.20211223.c8f14886d with build-id 9a5b504c51cbe8feb1217517d6977f7793b2971e starting ...

That's what gets pulled today with docker pull scylladb/scylla:latest, though docker images reports the image is 5 weeks old.

dimaqq commented 2 years ago

Query performance (against a single node, running in Docker, backed by a host bind-mounted volume on a Linux laptop SSD):

row/s: 9066.190242153862
MB/s: 20.90234811938829

So, officially faster than DynamoDB (global tables, max provisioning), which topped out at ~5K rows/s IIRC.
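
(Back-of-the-envelope check: 20.9 MB/s ÷ 9066 rows/s ≈ 2.3 KB per row, which roughly matches the ~100 short string fields per item in the payload logged above.)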

nyh commented 2 years ago

My mime-type fix reached 4.5 only 4 weeks ago (https://github.com/scylladb/scylla/commit/5d7064e00e83a205881dcc3dd0354b8830b6cef8) so this version is not recent enough for this fix. Can you please try the nightly version? docker pull scylladb/scylla-nightly:latest

dimaqq commented 2 years ago

The nightly cannot start up at the moment, with the same arguments as the normal build:

Scylla version 5.0.dev-0.20220201.00a9326ae with build-id 13ed134794204a18277ef3aacd075fb9070c81b7 starting ...
command used: "/usr/bin/scylla --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --developer-mode=1 --smp 1 --overprovisioned --listen-address 172.27.0.2 --rpc-address 172.27.0.2 --seed-provider-parameters seeds=172.27.0.2 --alternator-address 172.27.0.2 --alternator-port 8000 --alternator-write-isolation always --blocked-reactor-notify-ms 999999999 --skip-wait-for-gossip-to-settle 0"
parsed command line options: [log-to-syslog, (positional) 1, log-to-stdout, (positional) 0, default-log-level, (positional) info, network-stack, (positional) posix, developer-mode: 1, smp, (positional) 1, overprovisioned, listen-address: 172.27.0.2, rpc-address: 172.27.0.2, seed-provider-parameters: seeds=172.27.0.2, alternator-address: 172.27.0.2, alternator-port: 8000, alternator-write-isolation: always, blocked-reactor-notify-ms, (positional) 999999999, skip-wait-for-gossip-to-settle: 0]
2022-02-03 00:04:01,874 INFO exited: scylla-server (exit status 1; not expected)
Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 196, in <module>
    args.func(args)
  File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 122, in check_version
    current_version = sanitize_version(get_api('/storage_service/scylla_release_version'))
  File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 80, in get_api
    return get_json_from_url("http://" + api_address + path)
  File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 75, in get_json_from_url
    raise RuntimeError(f'Failed to get "{path}" due to the following error: {retval}')
RuntimeError: Failed to get "http://localhost:10000/storage_service/scylla_release_version" due to the following error: <urlopen error [Errno 99] Cannot assign requested address>
2022-02-03 00:04:05,736 INFO spawned: 'scylla-server' with pid 117
Scylla version 5.0.dev-0.20220201.00a9326ae with build-id 13ed134794204a18277ef3aacd075fb9070c81b7 starting ...
command used: "/usr/bin/scylla --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --developer-mode=1 --smp 1 --overprovisioned --listen-address 172.27.0.2 --rpc-address 172.27.0.2 --seed-provider-parameters seeds=172.27.0.2 --alternator-address 172.27.0.2 --alternator-port 8000 --alternator-write-isolation always --blocked-reactor-notify-ms 999999999 --skip-wait-for-gossip-to-settle 0"
parsed command line options: [log-to-syslog, (positional) 1, log-to-stdout, (positional) 0, default-log-level, (positional) info, network-stack, (positional) posix, developer-mode: 1, smp, (positional) 1, overprovisioned, listen-address: 172.27.0.2, rpc-address: 172.27.0.2, seed-provider-parameters: seeds=172.27.0.2, alternator-address: 172.27.0.2, alternator-port: 8000, alternator-write-isolation: always, blocked-reactor-notify-ms, (positional) 999999999, skip-wait-for-gossip-to-settle: 0]
2022-02-03 00:04:06,143 INFO exited: scylla-server (exit status 1; not expected)
2022-02-03 00:04:07,144 INFO gave up: scylla-server entered FATAL state, too many start retries too quickly

I think I've seen this before; IIRC it's due to Scylla refusing to listen on [::] or 0.0.0.0, because then the node doesn't know its own name, which would be bad in a cluster. Too bad it hurts devex plenty 😢

nyh commented 2 years ago

I can't reproduce the above failure. I got a slightly newer nightly, but it worked:

$ docker run --name scylla -d -p 8000:8000 scylladb/scylla-nightly:latest --alternator-port=8000 --alternator-write-isolation=always
$ docker logs scylla |& less
...
Scylla version 5.0.dev-0.20220203.d309a8670 with build-id d3f2c9395a10f04bb599979dc3cb18f2b0ac4648 starting ...
...
$ curl http://localhost:8000/
healthy: localhost:8000

@syuu1228 does this scylla-housekeeping error seem familiar? Could it explain why scylla-server is not coming up?

I don't think Scylla is listening on 0.0.0.0; why/where would it do that? A different problem might be that Scylla insists on listening on port 10000 (the REST API) on 127.0.0.1 by default, not on the address you give it. Maybe that's a problem in your Docker setup somehow (it works on mine...). You can try overriding the REST API address with the "--api-address" option and see if that changes anything.
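
e.g. something like this on top of the command line above (a sketch; the right address to bind depends on your compose network):

$ docker run --name scylla -d -p 8000:8000 scylladb/scylla-nightly:latest \
    --alternator-port=8000 --alternator-write-isolation=always --api-address 0.0.0.0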