Joseph-R opened this issue 7 years ago
Do you have the dynomite logs? I see you have only Redis logs. Dynomite should log it if there were errors connecting to its peers.
Hey @shailesh33 - I posted some sample dynomite logs above, at log level 9. They weren't super clearly marked, which I have gone back and fixed.
But basically there's not much that's helpful in the dynomite logs that I'm seeing. Just that -ERR Peer: Unknown error -1.
At log level 9 you should have much more detailed logs. For example, take this:
Try to trace the message id 1387871 and check its history in the logs. Try to find other related messages. Best of all, if you have a particular request that is failing, try to get the logs starting from that request through to its failure.
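A minimal sketch of that tracing workflow, assuming the msg_dump log format that appears later in this thread (the sample log contents and path below are fabricated for illustration):

```shell
# Trace one message id through a dynomite log at level 9.
# The log lines below are fabricated, modeled on the msg_dump format
# seen in this thread; point MSG_ID and the path at your real log.
cat > /tmp/dynomite.sample.log <<'EOF'
[2017-10-02 21:53:40.267] msg_dump:645 msg dump id 1387871 request 1 len 14 type 87 done 0 error 0 (err 0)
[2017-10-02 21:53:40.267] msg_dump:655 mbuf with 14 bytes of data
[2017-10-02 21:53:40.268] msg_dump:645 msg dump id 1387872 request 0 len 7 type 130 done 0 error 0 (err 0)
EOF

MSG_ID=1387871
# Show every line mentioning the id, with two lines of context around each hit.
grep -n -C2 "id ${MSG_ID} " /tmp/dynomite.sample.log
```

With a real log, widening the context window (e.g. -C8) usually pulls in the mbuf hex dump and the connection lines around the message as well.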
Hey @shailesh33 - Appreciate the questions, and sorry for the late response. I have been OOO this week.
Pretty sure that was log level 9, but I will wipe the logs and try to provide an example here tracing one specific message id through.
Wiped log files and set up a new session, bumping the log level up to 11.
root@ip-10-10-2-87:/mnt/log# ls
core dynomite.log dynomite-manager.log redis-sentinel.log redis-server.log redis-server.log.1.gz redis-server.log.2.gz
# Search all redis, dynomite, and dynomite-manager logs for lines referencing message ID 552459 +/- four lines before/after.
root@ip-10-10-2-87:/mnt/log# ag -C4 552459
dynomite.log
11203-[2017-10-02 21:53:40.267] msg_dump:645 msg dump id 552458 request 1 len 14 type 87 done 0 error 0 (err 0)
11204-[2017-10-02 21:53:40.267] msg_dump:655 mbuf with 14 bytes of data
11205-00000000 2a 31 0d 0a 24 34 0d 0a 50 49 4e 47 0d 0a |*1..$4..PING..|
11206-[2017-10-02 21:53:40.267] msg_send_chain:1133 About to dump out the content of msg
11207:[2017-10-02 21:53:40.267] msg_dump:645 msg dump id 552459 request 0 len 7 type 130 done 0 error 0 (err 0)
11208-[2017-10-02 21:53:40.267] msg_dump:655 mbuf with 7 bytes of data
11209-00000000 2b 50 4f 4e 47 0d 0a |+PONG..|
11210-[2017-10-02 21:53:40.267] client_unref_internal_try_put:95 <CLIENT 0x1ad3550 -1 from '127.0.0.1:49914'> unref owner <POOL 0x1ac2080 'dyn_o_mite'>
11211-[2017-10-02 21:53:46.782] client_unref_internal_try_put:95 <CLIENT 0x1ad3550 -1 from '10.10.2.236:14537'> unref owner <POOL 0x1ac2080 'dyn_o_mite'>
Entire log file from new session (one host): dynomite_log_export.txt
@Joseph-R are you able to reproduce this problem locally? Can you provide us all 4 of the dynomite.yml configuration files?
@KowalczykBartek - Please see the 4 dynomite.yml config files, attached.
10-10-7-135.dynomite.yml.txt 10-10-2-187.dynomite.yml.txt 10-10-2-87.dynomite.yml.txt 10-10-5-233.dynomite.yml.txt
are you able to reproduce this problem locally?
Not sure what you mean by that? I can reproduce it both while calling the cluster from a local client through the ELB, and with commands executed directly on one host against the ports dynomite is listening on.
So I just went through a full upgrade to 0.5.9 to see if it would solve the proxy close issue for me; however, it created this issue. After a bunch of troubleshooting, I determined I had two issues: one, replication was broken, and two, if any node was not using 0 as the starting token, it gave me
(error) ERR Peer: Unknown error -1
After I reverted back, the error went away. Maybe that will help.
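For anyone cross-checking their own cluster against this report: the starting token is set per node in dynomite.yml. A hedged sketch of the relevant fields, with illustrative hostnames, ports, and token values (not taken from the attached configs):

```yaml
dyn_o_mite:
  datacenter: dc1
  rack: rack1
  # Token owned by this node. Per the report above, a node whose
  # starting token was not 0 triggered "ERR Peer: Unknown error -1"
  # on 0.5.9 until the version was reverted.
  tokens: '2147483647'
  dyn_listen: 0.0.0.0:8101     # peer-to-peer port
  listen: 0.0.0.0:8102         # client-facing port
  # Peers, formatted as host:dyn_port:rack:dc:token (illustrative values).
  dyn_seeds:
  - 10.10.2.87:8101:rack1:dc1:0
  servers:
  - 127.0.0.1:6379:1           # local Redis backend, weight 1
  data_store: 0                # 0 = Redis
```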
and two, if any node was not using 0 as the starting token, it gave me (error) ERR Peer: Unknown error -1
after i reverted back, the error went away. maybe that will help.
@drewbeer - So you fixed the issue by reverting back to an earlier version of Dynomite?
Still seeing this issue after wiping out the original test cluster and deploying a new cluster with dynomite-v0.5.9-4_DecryptionCrash.
Versions:
Dynomite: dynomite-v0.5.9-2_ProxyCloseFix
Redis server: 3.0.6
Redis tools: 3.0.6
Redis-cli output (4-node cluster, behind ELB on AWS):
~$ /usr/local/Cellar/redis/4.0.2/bin/redis-cli -p 8102 -h internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get test
(error) ERR Peer: Unknown error -1
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get test
(error) ERR Peer: Unknown error -1
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get test
(nil)
(0.51s)
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get test
(nil)
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get test
(nil)
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get test
(nil)
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> set test testy
OK
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> set booga boo
OK
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get test
"testy"
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get booga
"boo"
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> set poke mon
OK
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get poke
"mon"
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> set bowl poke
(error) ERR Peer: Unknown error -1
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> get bowl
(error) ERR Peer: Unknown error -1
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> set bowl poke
(error) ERR Peer: Unknown error -1
internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com:8102> set bowl poke
(error) ERR Peer: Unknown error -1
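To quantify how often the error occurs, the manual session above can be turned into a loop. A sketch, assuming redis-cli on PATH and the same ELB endpoint; since the loop needs a live cluster, it is shown commented out and the counting step runs on a fabricated transcript standing in for its output:

```shell
# Capture a batch of SET results and count how many hit the peer error.
# The repro loop (requires a live cluster) would be:
#   for i in $(seq 1 100); do
#     redis-cli -h internal-jreid-redis-dev-949117426.us-east-1.elb.amazonaws.com \
#       -p 8102 set "probe:$i" "$i"
#   done > /tmp/probe.out
# Fabricated transcript standing in for that captured output:
printf '%s\n' OK "(error) ERR Peer: Unknown error -1" OK OK \
  "(error) ERR Peer: Unknown error -1" > /tmp/probe.out

total=$(wc -l < /tmp/probe.out)
errors=$(grep -c 'ERR Peer' /tmp/probe.out)
echo "peer errors: ${errors}/${total}"
```

Running the real loop from a client inside the VPC as well as through the ELB would also help separate a load-balancer problem from a peer-replication one.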
Further documentation:
@Joseph-R So to confirm, this worked with dynomite-v0.5.9-2_ProxyCloseFix and is not working with dynomite-v0.5.9-4_DecryptionCrash?
@shailesh33 - I wish. Unfortunately no, we haven't been able to get either dynomite-v0.5.9-2_ProxyCloseFix or dynomite-v0.5.9-4_DecryptionCrash to work for us.
I tried to reproduce the issue, but with no luck. I created a single-rack cluster with 2 nodes using the two tokens 0 & 2147483647. The setup seems straightforward to reproduce. I am using the dev branch.
Can you do this: take one of the dynomite nodes, keep the log level at 9, and then use redis-cli to run the commands that you saw failing. You should see logs like these:
[2017-11-02 14:26:26.124] req_forward:860 >>>>>>>>>>>>>>>>>>>>>>> <CONN_CLIENT 0x1d4e080 16 from '127.0.0.1:59996'> RECEIVED <REQ 0x1d4e2b0 1:0::0 REQ_REDIS_GET, len:23> key 'bowl' tagged key 'bowl'
[2017-11-02 14:26:26.125] _msg_get:319 alloc_msg_count: 2 caller: rsp_get <CONN_SERVER 0x1d3fd00 9 to '127.0.0.1:22122:1'>
[2017-11-02 14:26:26.126] server_rsp_forward:834 <CONN_CLIENT 0x1d4e080 16 from '127.0.0.1:59996'> <REQ 0x1d4e2b0 1:0::0 REQ_REDIS_GET, len:23> RECEIVED <RSP 0x1d4e540 2:0 RSP_REDIS_BULK len:10>
[2017-11-02 14:26:26.126] msg_local_one_rsp_handler:1007 <REQ 0x1d4e2b0 1:0::0 REQ_REDIS_GET, len:23> SELECTED <RSP 0x1d4e540 2:0 RSP_REDIS_BULK len:10>
Here's how I interpret the logs: when the request enters the coordinator node, the first log line it has contains >>>>>>. That line says which client connection received which request. In the example above, <REQ 0x1d4e2b0 1:0::0 REQ_REDIS_GET, len:23>, the 1:0::0 says the request id is 1, the parent id is 0, and the fragment id is 0. If this message creates a clone to forward to a peer, that cloned message will have a new request id and its parent id will be 1 (the request id of its parent). If you follow this, you will get a flow of the messages for a given request.
The logs that you have uploaded so far do not have any of these log lines, so I am wondering what the issue is.
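Following that interpretation, walking a clone chain can be done mechanically by grepping on the id:parent::fragment field. A sketch using a fabricated two-line log modeled on the req_forward lines above (the "forwarding clone" line and its ids are invented for illustration):

```shell
# Walk a request's clone chain: find the original request, then any
# cloned messages whose parent id equals the original's request id.
# Log lines are fabricated, following the <id>:<parent>::<fragment>
# field format described above.
cat > /tmp/dyn.trace.log <<'EOF'
[2017-11-02 14:26:26.124] req_forward:860 <CONN_CLIENT 0x1d4e080 16 from '127.0.0.1:59996'> RECEIVED <REQ 0x1d4e2b0 1:0::0 REQ_REDIS_GET, len:23> key 'bowl'
[2017-11-02 14:26:26.125] peer_req_forward:512 forwarding clone <REQ 0x1d4e9a0 2:1::0 REQ_REDIS_GET, len:23> key 'bowl'
EOF

REQ_ID=1
# Lines for the request itself (the field starts with "<id>:").
grep " ${REQ_ID}:" /tmp/dyn.trace.log
# Lines for clones of that request (the parent field equals the request id).
grep -E " [0-9]+:${REQ_ID}::" /tmp/dyn.trace.log
```

Repeating the second grep with each clone's own request id follows the chain hop by hop until the failing peer shows up.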
Issue: I see -ERR Peer: Unknown error -1 when writing to a Dynomite cluster.
Test method: Manual reads and writes via CLI to the Dynomite ports, or test series via redis-benchmark with random k:v pairs.
Versions:
Redis-cli output (4-node cluster behind an AWS ELB):
Example logs - dynomite.log (log level 9):
dynomite.yml:
Additional notes:
Running curl -s http://localhost:22222/cluster_describe on each node shows the other three nodes configured.