RedisLabs / redis-cluster-proxy

A proxy for Redis clusters.
GNU Affero General Public License v3.0
993 stars 132 forks source link

Error processing commands streams using pipelines -> triggering error: Failed to parse multibulk query: '$' not found! #21

Closed bsergean closed 4 years ago

bsergean commented 4 years ago

I have a command that write (indirectly) into redis and trigger this error, which trigger a hang. When running against 'stock redis' I don't experience the hang.

I will try to enable debug logs and capture those logs.

bsergean commented 4 years ago

Here's a log that contain the error:

        - --disable-multiplexing
        - always
        - --threads
        - "1"
        - --log-level
        - debug
        - --dump-queries

Log

[2020-01-17 00:49:28] Multiple commands 1, splitting request 0:2:150...
[2020-01-17 00:49:28] Created Request 0:2:151
[2020-01-17 00:49:28] Processing request 0:2:150
[2020-01-17 00:49:28] Request 0:2:150 written to node 172.25.207.90:6379, adding it to pending requests
[2020-01-17 00:49:28] Still have 0 request(s) to send to node 172.25.207.90:6379 on private connection owned by 0:2
[2020-01-17 00:49:28] Parsing request 0:2:151, status: -1
[2020-01-17 00:49:28] Req. 0:2:151 ARGV[0]: 'XADD'
[2020-01-17 00:49:28] Req. 0:2:151 ARGV[1]: 'ADdc208AB26B911F4cB05bFeedD8EbBA::niso_engine_rml_navigation_id'
[2020-01-17 00:49:28] Req. 0:2:151 ARGV[2]: 'MAXLEN'
[2020-01-17 00:49:28] Req. 0:2:151 ARGV[3]: '~'
[2020-01-17 00:49:28] Req. 0:2:151 ARGV[4]: '1000'
[2020-01-17 00:49:28] Req. 0:2:151 ARGV[5]: '*'
[2020-01-17 00:49:28] Req. 0:2:151 ARGV[6]: 'json'
[2020-01-17 00:49:28] Req. 0:2:151 ARGV[7]: '{"action":"rtm/publish","body": EDITED...BIG BLOB OF DATA
[2020-01-17 00:49:28] Multiple commands 1, splitting request 0:2:151...
[2020-01-17 00:49:28] Created Request 0:2:152
[2020-01-17 00:49:28] Processing request 0:2:151
[2020-01-17 00:49:28] Request 0:2:151 written to node 172.30.217.227:6379, adding it to pending requests
[2020-01-17 00:49:28] Still have 0 request(s) to send to node 172.30.217.227:6379 on private connection owned by 0:2
[2020-01-17 00:49:28] Parsing request 0:2:152, status: -1
[2020-01-17 00:49:28] Req. 0:2:152 ARGV[0]: 'XADD'
[2020-01-17 00:49:28] Req. 0:2:152 ARGV[1]: 'ADdc208AB26B911F4cB05bFeedD8EbBA::subscriber_republished_v1_neo'
[2020-01-17 00:49:28] Req. 0:2:152 ARGV[2]: 'MAXLEN'
[2020-01-17 00:49:28] Req. 0:2:152 ARGV[3]: '~'
[2020-01-17 00:49:28] Req. 0:2:152 ARGV[4]: '1000'
[2020-01-17 00:49:28] Reading reply from 172.25.207.90:6379 on thread 0...
[2020-01-17 00:49:28] Failed to parse multibulk query: '$' not found!
artix75 commented 4 years ago

@bsergean Could you send me the query you're writing to the proxy?

bsergean commented 4 years ago

It's a python script which is running the 'queries', using many different connections. I could try to run this against a single instance and capture all the queries using redis-cli MONITOR maybe.

I'll try to find a way. Maybe I'll start with writing up steps using the python thing (cobra).

(python) Server talks to redis (python) Client talks to the python server.

On Jan 17, 2020, at 8:33 AM, artix notifications@github.com wrote:

@bsergean https://github.com/bsergean Could you send me the query you're writing to the proxy?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/artix75/redis-cluster-proxy/issues/21?email_source=notifications&email_token=AC2O6UNBYAKQOQKF23GD3KTQ6HMUVA5CNFSM4KH6H7L2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJIHRBA#issuecomment-575699076, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC2O6UILZDEUBBBCXRMZDS3Q6HMUVANCNFSM4KH6H7LQ.

artix75 commented 4 years ago

@bsergean On the proxy, you can use the --dump-buffer options to log the raw query string in the proxy's log.

bsergean commented 4 years ago

Trying to capture the buffer now with --dump-buffer. The log file is very large and it's quite slow ... I gave up as it looks like it's not terminating, and re-ran without logging to see if it would terminate, but it doesn't.

As a note redis pipelines are used.

This is what strace thinks the proxy is doing. The client still try to publish things and does not terminate.

~ $ strace -f -p 1
strace: Process 1 attached with 9 threads
[pid    13] epoll_pwait(321,  <unfinished ...>
[pid    12] epoll_pwait(270,  <unfinished ...>
[pid    11] epoll_pwait(219,  <unfinished ...>
[pid    10] epoll_pwait(168,  <unfinished ...>
[pid     9] epoll_pwait(117,  <unfinished ...>
[pid     1] epoll_pwait(4,  <unfinished ...>
[pid    14] epoll_pwait(372,  <unfinished ...>
[pid     8] epoll_pwait(66, 
artix75 commented 4 years ago

@bsergean There's was a bug in the query parser, it has been fixed in commit 8a9ed40. Try to update the branch and let me know if the problem persists.

bsergean commented 4 years ago

Awesome, I'll give it a try today.

bsergean commented 4 years ago

So it worked 'the first time' / then I re-ran the python script, cancelled it, and now it hangs again ... will test more, but there might be a bug left.

BTW something unreleated, is it normal that PROXY INFO reads broken when I type it in a redis-cli interpreter ?

127.0.0.1:7777> PROXY INFO
"#Proxy\r\nproxy_version:999.999.999\r\nproxy_git_sha1:00000000\r\nproxy_git_dirty:0\r\nos:Linux 3.10.0-957.10.1.el7.x86_64 x86_64\r\ngcc_version:9.2.0\r\nprocess_id:1\r\nthreads:8\ntcp_port:7777\r\nuptime_in_seconds:19012\r\nuptime_in_days:0\r\nconfig_file:\r\nacl_user:default\r\n\r\n#Clients\r\nconnected_clients:7\r\n\r\n#Cluster\r\naddress:172.16.18.181\r\nentry_node:172.16.18.181:6379"
artix75 commented 4 years ago

So you had again the 'Failed to parse multibulk query: '$' not found'? How do I replicate this bug?

bsergean commented 4 years ago

Hi artix75 / thanks for keeping on trying ... here are steps to setup the server/client I'm using. It won't necessarily repro the bugs, but it's a start. Some of the problems I see are when I run this code in our openshift/kubernet server, where there is more latency than when I run on localhost. I also try to run those in docker-compose.

Setup / I'm on a mac but this should work on Linux too (and Windows)

  1. Install latest python (3.8.1) / then

git clone https://github.com/machinezone/cobra.git cd cobra python3 -m venv venv source venv/bin/activate make dev

Terminal one

cobra init # this will create a ~/.cobra.yaml file. cobra --verbose run --redis_urls redis://localhost:7777 --no_stats

Terminal 2 (this is the command that does the one XREAD then XADD. I have changed it to work the other way around

cobra health

Another terminal if you want

cobra publish (send one message, which translates into a XADD)

cobra publish --batch is the command I type to send lots of messages, but it uses a file containing some (a bit) private data which I prefer not to put on GitHub. I should make a public version of it, or maybe I can send it to you privately somehow.

If you want to make a docker container you can type

export DOCKER_REPO=whoami make docker'


Note that I've been working for the last few days in a branch, which uses another python client library that support redis-cluster natively, but which has some bugs that have not been merged, and is very recent code. I'm trying this approach too as I'd like to choose the best solution for the job.

the branch is feature/aredis

On Jan 23, 2020, at 11:37 PM, artix notifications@github.com wrote:

So you had again the 'Failed to parse multibulk query: '$' not found'? How do I replicate this bug?

As for the PROXY INFO: redis-cli apply special formatting to the output of the standard INFO command of Redis. But, since it doesn't know PROXY INFO, the output is just a raw string. I'll make a PR to redis-cli about it in the next days.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/artix75/redis-cluster-proxy/issues/21?email_source=notifications&email_token=AC2O6ULHGOLD6KJRDRHL7S3Q7KLCBA5CNFSM4KH6H7L2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJZ7GDY#issuecomment-578024207, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC2O6UJKOLMC3Z2NAKFWIMDQ7KLCBANCNFSM4KH6H7LQ.

artix75 commented 4 years ago

@bsergean Ok, thank you, but let me understand: are you getting the "hang" because of the Failed to parse multibulk query: '$' not found error you had before updating the branch, or because of the blocking XREAD before XADD?

Because in the first case it's about the parsing bug that should have been fixed by the commit 8a9ed40. Otherwise, in the second case, we're talking about another kind of issue (the one we were talking about in issue #18).

bsergean commented 4 years ago

I couldn't see the multi bulk query error again, so maybe it's another problem, and the parsing is all good now.

The thing that I was reproducible ONLY with redis-cli, recently, was:

  1. start a cluster
  2. start a proxy
  3. start redis-cli against the proxy
  4. kill the cluster, let the proxy running
  5. in the redis-cli editor, try to do a set foo bar / that was hanging for me, while I would have expected an error.

On Jan 24, 2020, at 10:45 AM, artix notifications@github.com wrote:

@bsergean https://github.com/bsergean Ok, thank you, but let me understand: are you getting the "hang" because of the Failed to parse multibulk query: '$' not found error you had before updating the branch, or because of the blocking XREAD before XADD?

Because in the first case it's about the parsing bug that should have been fixed by the commit 8a9ed40 https://github.com/artix75/redis-cluster-proxy/commit/8a9ed40df02862103de7c81e6f3f618dc3b39911. Otherwise, in the second case, we're talking about another kind of issue (the one we were talking about in issue #18 https://github.com/artix75/redis-cluster-proxy/issues/18).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/artix75/redis-cluster-proxy/issues/21?email_source=notifications&email_token=AC2O6UPNG4PH7TPDLNG7M4DQ7MZLTA5CNFSM4KH6H7L2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ3W3IA#issuecomment-578252192, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC2O6UOR3FMHGII6JTFOWADQ7MZLTANCNFSM4KH6H7LQ.

bsergean commented 4 years ago

I was trying to sent you a 'here is all the thing I've been testing' kind of page, that was confusing.

In my use case:

cobra health -> would trigger the XREAD / XADD thing cobra publish --batch -> would trigger the multibulk query pb.

FYI below here is an example config file which configure one app / appkey to use batching, which will translate into having the server use redis pipelines.

batch_publish_size: 50 apps: _pubsub: batch_publish: true roles: pubsub: permissions:

On Jan 24, 2020, at 10:45 AM, artix notifications@github.com wrote:

@bsergean https://github.com/bsergean Ok, thank you, but let me understand: are you getting the "hang" because of the Failed to parse multibulk query: '$' not found error you had before updating the branch, or because of the blocking XREAD before XADD?

Because in the first case it's about the parsing bug that should have been fixed by the commit 8a9ed40 https://github.com/artix75/redis-cluster-proxy/commit/8a9ed40df02862103de7c81e6f3f618dc3b39911. Otherwise, in the second case, we're talking about another kind of issue (the one we were talking about in issue #18 https://github.com/artix75/redis-cluster-proxy/issues/18).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/artix75/redis-cluster-proxy/issues/21?email_source=notifications&email_token=AC2O6UPNG4PH7TPDLNG7M4DQ7MZLTA5CNFSM4KH6H7L2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ3W3IA#issuecomment-578252192, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC2O6UOR3FMHGII6JTFOWADQ7MZLTANCNFSM4KH6H7LQ.

artix75 commented 4 years ago

@bsergean Does cobra use pubsub? It's not supported by the proxy.

bsergean commented 4 years ago

No it's using redis streams now. It used to use PubSub but not anymore.

On Jan 24, 2020, at 4:28 PM, artix notifications@github.com wrote:

@bsergean https://github.com/bsergean Does cobra use pubsub? It's not supported by the proxy.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/artix75/redis-cluster-proxy/issues/21?email_source=notifications&email_token=AC2O6UO3MN3HK64LP65GJH3Q7OBTTA5CNFSM4KH6H7L2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ4PSIQ#issuecomment-578353442, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC2O6UND2JDJVFBQEWSDIWDQ7OBTTANCNFSM4KH6H7LQ.

artix75 commented 4 years ago

@bsergean Ok, since the issue is no more related to the parsing bug (Failed to parse multibulk query: '$' not found), I'm closing this issue. Since it seems that the new issue you're talking about could be related to the XREAD/XADD sequence, please let's keep talking about it in issue #18.

bsergean commented 4 years ago

Sure sounds good.

On Jan 26, 2020, at 3:44 AM, artix notifications@github.com wrote:

Closed #21 https://github.com/artix75/redis-cluster-proxy/issues/21.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/artix75/redis-cluster-proxy/issues/21?email_source=notifications&email_token=AC2O6UO46VD3RPOCK3YSGOTQ7VZQHA5CNFSM4KH6H7L2YY3PNVWWK3TUL52HS4DFWZEXG43VMVCXMZLOORHG65DJMZUWGYLUNFXW5KTDN5WW2ZLOORPWSZGOWGOQOPY#event-2979858239, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC2O6UPAFQIVWBPI3M3ZFRLQ7VZQHANCNFSM4KH6H7LQ.