basho / riak-erlang-client

The Riak client for Erlang.
Apache License 2.0

riakc_pb_socket.get_index causes process crash [JIRA: CLIENTS-1022] #325

Closed mikrofusion closed 7 years ago

mikrofusion commented 7 years ago

Hi,

First off, thanks for all the great work you all are doing.

I've been spiking on Riak / Erlang / Elixir for work, as we are hoping to use these technologies for a large upcoming project. I seem to have hit a blocker that I need help with.

When attempting to use this client to search secondary indexes, I end up getting a process crash. I can run the exact same query via HTTP and get the correct results. I'm looking for any insight into what might be causing this.

I am running Riak in a docker container locally based off this image:

https://hub.docker.com/r/basho/riak-kv/

The crash occurs with both the develop branch and version 2.4.1 of the riak-erlang-client.


HTTP working example:

curl localhost:28098/buckets/bucket/index/name_bin/foobar
{"keys":[]}

Failing protobuf client example:

Erlang/OTP 19 [erts-8.0.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Eshell V8.0.2  (abort with ^G)
1> {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 28087).
{ok,<0.59.0>}
2> riakc_pb_socket:ping(Pid).
pong
3> riakc_pb_socket:get_index(Pid, <<"bucket">>, {binary_index, "name"}, <<"foobar">>).
{error,<<"Error processing incoming message: error:{case_clause,\n                                          {rpbindexre"...>>}
4>
=ERROR REPORT==== 25-Oct-2016::19:48:35 ===
** Generic server <0.59.0> terminating
** Last message in was {tcp_closed,#Port<0.673>}
** When Server state == {state,"127.0.0.1",28087,false,false,undefined,false,
                               gen_tcp,undefined,
                               {[],[]},
                               1,[],infinity,undefined,undefined,undefined,
                               undefined,[],100}
** Reason for termination ==
** disconnected
** exception error: disconnected
4>

Crash log from my Riak docker container: http://localhost:28098/admin/#/cluster/default/ops/nodes/riak@172.17.0.2/logs/crash.log

Offender:   [{pid,<0.1806.1>},{name,undefined},{mfargs,{riak_api_pb_server,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
Reason:     {error,{case_clause,{rpbindexreq,<<"bucket">>,<<"name_bin">>,eq,<<"foobar">>,undefined,undefined,undefined,false,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined}},[{riak_kv_pb_index,decode,2,[{file,"src/riak_kv_pb_index.erl"},{line,62}]},{riak_api_pb_server,connected,2,[{file,"src/riak_api_pb_server.erl"},{line,219}]},{riak_api_pb_server,decode_buffer,2,[{file,"src/riak_api_pb_server.erl"},{line,364}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
Context:    child_terminated
Supervisor: {local,riak_api_pb_sup}
2016-10-26 02:48:35 =SUPERVISOR REPORT====
neighbours:
reductions: 6874
stack_size: 27
heap_size: 987
status: running
trap_exit: false
dictionary: []
links: [<0.312.0>,#Port<0.414523>]
messages: []
ancestors: [riak_api_pb_sup,riak_api_sup,<0.305.0>]
exception exit: {{error,{case_clause,{rpbindexreq,<<"bucket">>,<<"name_bin">>,eq,<<"foobar">>,undefined,undefined,undefined,false,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined}},[{riak_kv_pb_index,decode,2,[{file,"src/riak_kv_pb_index.erl"},{line,62}]},{riak_api_pb_server,connected,2,[{file,"src/riak_api_pb_server.erl"},{line,219}]},{riak_api_pb_server,decode_buffer,2,[{file,"src/riak_api_pb_server.erl"},{line,364}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,622}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
registered_name: []
pid: <0.1806.1>
initial call: riak_api_pb_server:init/1
crasher:
2016-10-26 02:48:35 =CRASH REPORT====
** {error,{case_clause,{rpbindexreq,<<"bucket">>,<<"name_bin">>,eq,<<"foobar">>,undefined,undefined,undefined,false,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined}},[{riak_kv_pb_index,decode,2,[{file,"src/riak_kv_pb_index.erl"},{line,62}]},{riak_api_pb_server,connected,2,[{file,"src/riak_api_pb_server.erl"},{line,219}]},{riak_api_pb_server,decode_buffer,2,[{file,"src/riak_api_pb_server.erl"},{line,364}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
** Reason for termination =
**      Data  == {state,{gen_tcp,inet},#Port<0.414523>,undefined,[{riak_api_basic_pb_service,undefined},{riak_core_pb_bucket,undefined},{riak_core_pb_bucket_type,undefined},{riak_kv_pb_bucket,{state,{riak_client,['riak@172.17.0.2',undefined]},undefined,undefined}},{riak_kv_pb_counter,{state,{riak_client,['riak@172.17.0.2',undefined]}}},{riak_kv_pb_crdt,{state,{riak_client,['riak@172.17.0.2',undefined]},undefined,undefined,undefined,undefined,undefined,undefined,undefined}},{riak_kv_pb_csbucket,{state,{riak_client,['riak@172.17.0.2',undefined]},undefined,undefined,undefined,0}},{riak_kv_pb_index,{state,{riak_client,['riak@172.17.0.2',undefined]},undefined,undefined,undefined,0}},{riak_kv_pb_mapred,{state,undefined,undefined}},{riak_kv_pb_object,{state,{riak_client,['riak@172.17.0.2',undefined]},undefined,undefined,<<0,0,0,0>>}},{yz_pb_admin,no_state},{yz_pb_search,no_state}],{{172,17,0,1},48510},undefined,undefined,3,<<0,0,0,31,25,10,6,98,117,99,107,101,116,18,8,110,97,109,101,95,98,105,110,24,0,34,6,102,111,111,98,97,114,64,0>>,{buffer,[],0,1024}}
** When State == connected
** Last message in was {tcp,#Port<0.414523>,<<0,0,0,31,25,10,6,98,117,99,107,101,116,18,8,110,97,109,101,95,98,105,110,24,0,34,6,102,111,111,98,97,114,64,0>>}
** State machine <0.1806.1> terminating
2016-10-26 02:48:35 =ERROR REPORT====
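
For anyone reading the log: the byte list on the "Last message in" line can be decoded by hand to check whether the client sent a well-formed request. The sketch below is a minimal decoder, not Riak's own code. It assumes the usual Riak protocol buffers framing (a 4-byte big-endian length, then a 1-byte message code, then the protobuf body) and handles only the two protobuf wire types that occur in this frame. The names read_varint and decode_frame are illustrative.

```python
# Bytes copied verbatim from the "Last message in" line of the crash log.
FRAME = bytes([0, 0, 0, 31, 25, 10, 6, 98, 117, 99, 107, 101, 116,
               18, 8, 110, 97, 109, 101, 95, 98, 105, 110,
               24, 0, 34, 6, 102, 111, 111, 98, 97, 114, 64, 0])


def read_varint(buf, pos):
    """Decode one protobuf varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last varint byte
            return value, pos
        shift += 7


def decode_frame(frame):
    """Split a frame into (length, message_code, [(field_no, value), ...]).

    Assumption: the length prefix counts the message code plus the body.
    """
    length = int.from_bytes(frame[:4], "big")
    code = frame[4]
    body = frame[5:4 + length]
    fields = []
    pos = 0
    while pos < len(body):
        tag, pos = read_varint(body, pos)
        field_no, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint (integers, enums, booleans)
            value, pos = read_varint(body, pos)
        elif wire_type == 2:  # length-delimited (bytes, strings)
            size, pos = read_varint(body, pos)
            value = body[pos:pos + size]
            pos += size
        else:
            raise ValueError(f"unhandled wire type {wire_type}")
        fields.append((field_no, value))
    return length, code, fields


if __name__ == "__main__":
    print(decode_frame(FRAME))
```

On this frame the decoder yields length 31, message code 25, and fields 1, 2, 3, 4 and 8 carrying bucket, name_bin, 0, foobar and 0. Those values line up with the rpbindexreq tuple in the case_clause error, which suggests the request arrived intact and that the failure is in the server-side decode rather than in what the client sent.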

Additionally, I confirmed that I'm running leveldb, as I know that is needed for 2i.

Configuration:

storage_backend     leveldb
strong_consistency  off

Thanks in advance for any information you can provide to help lead me in the right direction on this.

lukebakken commented 7 years ago

I can't reproduce this using a from-source build of Riak 2.1.4:

$ erl -pa ./ebin -pa ./deps/*/ebin
Eshell V5.10.3  (abort with ^G)
1> {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
{ok,<0.34.0>}
2> riakc_pb_socket:ping(Pid).
pong
3> riakc_pb_socket:get_index(Pid, <<"bucket">>, {binary_index, "name"}, <<"foobar">>).
{ok,{index_results_v1,[],undefined,undefined}}

I suspect the issue is with the docker image. I'll try that next.

lukebakken commented 7 years ago

I can't reproduce using the provided docker image, either. I'm using the ports as configured on the riak-kv docker hub page. Here's the output:

$ erl -pa ./ebin -pa ./deps/*/ebin
Eshell V5.10.3  (abort with ^G)
1> {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
{ok,<0.34.0>}
2> riakc_pb_socket:ping(Pid).
pong
3> riakc_pb_socket:get_index(Pid, <<"bucket">>, {binary_index, "name"}, <<"foobar">>).
{ok,{index_results_v1,[],undefined,undefined}}

I see you're using OTP 19 for the erlang client, I will try that next.

lukebakken commented 7 years ago

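Everything works fine using OTP 19.1, too. Some questions:

What docker command did you use to pull the riak-kv container?
What is your exact erl command line?
What changes did you make to the suggested configuration here? Are you using docker-compose?
How did you switch the configured backend to leveldb?
Can you please run md5sum /usr/lib/riak/lib/basho-patches/* in the container, redirect the output to a file, and attach the output here?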

mikrofusion commented 7 years ago

Thanks @lukebakken. Updates below.

For more background, I have pushed my code (Dockerfile, Makefile, etc) up here: https://github.com/mikrofusion/elixir_riak

Although it is an Elixir project, to rule out Elixir as a variable I have been running the commands in erl as well, and I get the same results.

What docker command did you use to pull the riak-kv container?

I used docker-compose so that I can run a Riak cluster locally.

The commands used to start the container are in my Makefile: https://github.com/mikrofusion/elixir_riak/blob/master/Makefile

I am using this command from the Makefile:

test-start:
    docker-compose -f docker-compose.yml -f docker-compose.test.yml -p riaktest up -d 
    sleep 5
    make test-init

test-init:
    docker-compose -p riaktest exec coordinator riak-admin bucket-type create maps '{"props":{"datatype":"map"}}'
    docker-compose -p riaktest exec coordinator riak-admin bucket-type activate maps
    docker-compose -p riaktest exec coordinator riak-admin bucket-type create sets '{"props":{"datatype":"set"}}'
    docker-compose -p riaktest exec coordinator riak-admin bucket-type activate sets
    docker-compose -p riaktest exec coordinator riak-admin bucket-type create counters '{"props":{"datatype":"counter"}}'
    docker-compose -p riaktest exec coordinator riak-admin bucket-type activate counters

The above composes with these two files: https://github.com/mikrofusion/elixir_riak/blob/master/docker-compose.yml https://github.com/mikrofusion/elixir_riak/blob/master/docker-compose.test.yml

What is your exact erl command line?

These were the commands run in erl, though it fails in my Elixir specs as well (which is where I initially found the issue).

Erlang/OTP 19 [erts-8.0.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Eshell V8.0.2  (abort with ^G)
1> {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 28087).
{ok,<0.59.0>}
2> riakc_pb_socket:ping(Pid).
pong
3> riakc_pb_socket:get_index(Pid, <<"bucket">>, {binary_index, "name"}, <<"foobar">>).
{error,<<"Error processing incoming message: error:{case_clause,\n                                          {rpbindexre"...>>}
4>
=ERROR REPORT==== 25-Oct-2016::19:48:35 ===
** Generic server <0.59.0> terminating
** Last message in was {tcp_closed,#Port<0.673>}
** When Server state == {state,"127.0.0.1",28087,false,false,undefined,false,
                               gen_tcp,undefined,
                               {[],[]},
                               1,[],infinity,undefined,undefined,undefined,
                               undefined,[],100}
** Reason for termination ==
** disconnected
** exception error: disconnected
4>

What changes did you make to the suggested configuration here? Are you using docker-compose?

Yes, using docker-compose. Only change to the config is the riak.conf change to set the backend to leveldb.

How did you switch the configured backend to leveldb?

I'm using the following Dockerfile with docker-compose: https://github.com/mikrofusion/elixir_riak/blob/master/Dockerfile

It is the basho/riak-kv image with the following riak.conf mounted.

Dockerfile:

FROM basho/riak-kv
MAINTAINER Mike Groseclose <mike.groseclose@gmail.com>

ADD riak/riak.conf /etc/riak/riak.conf

Riak.conf: https://github.com/mikrofusion/elixir_riak/blob/master/riak/riak.conf

As far as I am aware, this is the riak.conf that came with basho/riak-kv, the only change being leveldb as the backend.

Can you please run md5sum /usr/lib/riak/lib/basho-patches/* in the container, redirect the output to a file, and attach the output here?

md5sum_info.txt


Thanks again for the support.

mikrofusion commented 7 years ago

In case it helps, here's a list of my local docker images:

docker images                                                                   Wed Oct 26 08:20:31 2016
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
riak-kv-test        latest              02666bf6870a        9 days ago          579 MB
riak-kv             latest              02666bf6870a        9 days ago          579 MB
<none>              <none>              4ed219d7b36b        9 days ago          579 MB
basho/riak-kv       latest              4a4fefa14060        3 weeks ago         579 MB

lukebakken commented 7 years ago

I'm interested in how you start erl. For instance, this is how I start up erl to test the erlang client, after running make in the riak-erlang-client clone dir:

erl -pa ./ebin -pa ./deps/*/ebin

lukebakken commented 7 years ago

@mikrofusion -

The MD5s for that directory don't match mine (attached). You should clear out this container and update to the latest one. @jbrisbin said that there had been issues with .beam versions in older riak-kv container images.

I did not specify a tag when running docker pull so I assume I got the latest version from here:

https://hub.docker.com/r/basho/riak-kv/tags/

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
basho/riak-kv       latest              24536af1c97c        2 weeks ago         581.7 MB

If updating your environment doesn't resolve your issue, please open an issue in this repository and link back to basho/riak-erlang-client#325 in the new issue's description.

Thanks!

lukebakken commented 7 years ago

If you'd like to double-check your new environment, here are the md5 sums from my container:

basho-patches-md5.txt
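
One way to compare the two attachments is to parse both md5sum listings and report the files whose digests differ. The following is a rough sketch, assuming both files contain standard md5sum output (one digest and one path per line). parse_md5sum and diff_md5 are hypothetical helper names, and the digests in the usage example are placeholders, not values from either container.

```python
import os


def parse_md5sum(text):
    """Map file name -> digest from md5sum output ("<digest>  <path>" per line)."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, path = line.split(None, 1)
        # md5sum prefixes the path with "*" in binary mode; drop it.
        name = os.path.basename(path.strip().lstrip("*"))
        sums[name] = digest.lower()
    return sums


def diff_md5(mine, reference):
    """Return names that are missing, unexpected, or have a different digest."""
    return {
        "missing": sorted(reference.keys() - mine.keys()),
        "extra": sorted(mine.keys() - reference.keys()),
        "changed": sorted(name for name in mine.keys() & reference.keys()
                          if mine[name] != reference[name]),
    }


if __name__ == "__main__":
    # Placeholder listings. Real input would be the contents of
    # md5sum_info.txt and basho-patches-md5.txt from this thread.
    mine = parse_md5sum("aaaa  /patches/one.beam\nbbbb  /patches/two.beam\n")
    reference = parse_md5sum("aaaa  /patches/one.beam\ncccc  /patches/two.beam\n")
    print(diff_md5(mine, reference))
```

Files are keyed by base name so that the comparison still works if the two listings were produced from different directories. Any entry under "changed" would point at a .beam file that differs between the two containers.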

mikrofusion commented 7 years ago

@lukebakken I deleted my docker images and pulled latest. It works now. Thank you.

Basho-JIRA commented 7 years ago

Fixed, or closed via GitHub issues.

[posted via JIRA by Alexander Moore]