basho / riak

Riak is a decentralized datastore from Basho Technologies.
http://docs.basho.com
Apache License 2.0

HTTP Bucket list returns 500 error: bad_utf8_character_code [JIRA: RIAK-2205] #415

Open TJC opened 10 years ago

TJC commented 10 years ago

If I run curl -i "http://bld-riak/buckets?buckets=true" I receive this error about a bad utf8 character code, and nothing else. (That hostname runs haproxy on port 80. The same error occurs if I go direct to a single node on port 9098)

This is Riak version 1.4.2-1 (from the official apt repo)

HTTP/1.1 500 Internal Server Error
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
Date: Thu, 17 Oct 2013 00:51:03 GMT
Content-Type: text/html
Content-Length: 1103

<html><head><title>500 Internal Server Error</title></head><body><h1>Internal Server Error</h1>The server encountered an error while processing this request:<br><pre>{error,{exit,{ucs,{bad_utf8_character_code}},
             [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},
              {mochijson2,json_encode_string,2,
                          [{file,"src/mochijson2.erl"},{line,186}]},
              {mochijson2,'-json_encode_array/2-fun-0-',3,
                          [{file,"src/mochijson2.erl"},{line,157}]},
              {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
              {mochijson2,json_encode_array,2,
                          [{file,"src/mochijson2.erl"},{line,159}]},
              {mochijson2,'-json_encode_proplist/2-fun-0-',3,
                          [{file,"src/mochijson2.erl"},{line,167}]},
              {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
              {mochijson2,json_encode_proplist,2,
                          [{file,"src/mochijson2.erl"},{line,170}]}]}}</pre><P><HR><ADDRESS>mochiweb+webmachine web server</ADDRESS></body></html>
roncemer commented 10 years ago

This same problem also occurs if you use the new Riak counters functionality. Set up a bucket with allow_mult=true. Increment some counters in that bucket. Then try to do a map/reduce query on them. You get this error.

If you eliminate the JavaScript map and reduce functions, you can get the results, but the results are just bucket,key pairs without the actual counter values. Useless.

This bug needs to be fixed, like NOW, and a new release issued. It's a major show-stopper for counters.

Every storage engine should be binary-safe for both keys and values; Riak is no exception.

People are getting sick of Riak's second-rate JavaScript support. JavaScript should be the PRIMARY language for map/reduce, and should be easier to use and more efficient than Erlang. Erlang is an obscure language which nobody wants to learn.

Let's get this fixed right away, so people can use counters in Riak.

Micka33 commented 10 years ago

I enabled Riak Search; after some failures using the search function, I wanted to list all my buckets to be sure I was querying the right one. This is what I got:

irb(main):004:0> client = Riak::Client.new :solr => "/solr"
=> #<Riak::Client [#<Node 127.0.0.1:8098:8087>]>
irb(main):005:0> client.search "user", "email:micka3@email.com"
=> {"num_found"=>0, "max_score"=>0.0, "docs"=>[]}
irb(main):006:0> client.search "user", "username:micka3"
=> {"num_found"=>0, "max_score"=>0.0, "docs"=>[]}
....
irb(main):010:0> client.buckets
Riak::Client#buckets is an expensive operation that should not be used in production.
    (irb):10:in `irb_binding'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb/workspace.rb:80:in `eval'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb/workspace.rb:80:in `evaluate'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb/context.rb:254:in `evaluate'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb.rb:159:in `block (2 levels) in eval_input'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb.rb:273:in `signal_status'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb.rb:156:in `block in eval_input'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb/ruby-lex.rb:243:in `block (2 levels) in each_top_level_statement'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `loop'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `block in each_top_level_statement'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `catch'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `each_top_level_statement'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb.rb:155:in `eval_input'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb.rb:70:in `block in start'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb.rb:69:in `catch'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/irb.rb:69:in `start'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/railties-3.2.14/lib/rails/commands/console.rb:47:in `start'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/railties-3.2.14/lib/rails/commands/console.rb:8:in `start'
    /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/railties-3.2.14/lib/rails/commands.rb:41:in `<top (required)>'
    script/rails:6:in `require'
    script/rails:6:in `<main>'
Riak::HTTPFailedRequest: Expected 200 from Riak but received 500. <html><head><title>500 Internal Server Error</title></head><body><h1>Internal Server Error</h1>The server encountered an error while processing this request:<br><pre>{error,{exit,{ucs,{bad_utf8_character_code}},
             [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},
              {mochijson2,json_encode_string,2,
                          [{file,"src/mochijson2.erl"},{line,186}]},
              {mochijson2,'-json_encode_array/2-fun-0-',3,
                          [{file,"src/mochijson2.erl"},{line,157}]},
              {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
              {mochijson2,json_encode_array,2,
                          [{file,"src/mochijson2.erl"},{line,159}]},
              {mochijson2,'-json_encode_proplist/2-fun-0-',3,
                          [{file,"src/mochijson2.erl"},{line,167}]},
              {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
              {mochijson2,json_encode_proplist,2,
                          [{file,"src/mochijson2.erl"},{line,170}]}]}}</pre><P><HR><ADDRESS>mochiweb+webmachine web server</ADDRESS></body></html>
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client/net_http_backend.rb:58:in `block (2 levels) in perform'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/net/http.rb:1323:in `block (2 levels) in transport_request'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/net/http.rb:2672:in `reading_body'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/net/http.rb:1322:in `block in transport_request'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/net/http.rb:1317:in `catch'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/net/http.rb:1317:in `transport_request'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/1.9.1/net/http.rb:1294:in `request'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client/net_http_backend.rb:56:in `block in perform'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client/net_http_backend.rb:54:in `tap'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client/net_http_backend.rb:54:in `perform'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client/http_backend/transport_methods.rb:44:in `get'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client/http_backend.rb:213:in `list_buckets'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client.rb:179:in `block in buckets'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client.rb:470:in `block in recover_from'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/innertube-1.0.2/lib/innertube.rb:127:in `take'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client.rb:468:in `recover_from'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client.rb:321:in `http'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client.rb:138:in `backend'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/riak-client-1.4.2/lib/riak/client.rb:178:in `buckets'
    from (irb):10
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/railties-3.2.14/lib/rails/commands/console.rb:47:in `start'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/railties-3.2.14/lib/rails/commands/console.rb:8:in `start'
    from /usr/local/Cellar/ruby193/1.9.3-p448/lib/ruby/gems/1.9.1/gems/railties-3.2.14/lib/rails/commands.rb:41:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'
irb(main):011:0>
russelldb commented 10 years ago

@roncemer is this any use to you https://github.com/basho/riak_crdt_cookbook/blob/master/counters/README.md?

roncemer commented 10 years ago

Actually, it turns out that Riak counters seem to suffer from the same problems as other vector clock-based counter implementations. Simultaneous increments through different nodes on the same counter result in under-counts or over-counts (under-counts in the case that I've tested).

Start up a 6-node Riak cluster and set allow_mult=true on a bucket. Pick a new counter key which doesn't exist yet, and write a multi-threaded or multi-process CLI app in which each worker connects to a random node in the cluster and increments the counter for that key 100 times. Do it in, say, 50 threads or 50 processes. 100 x 50 = 5,000. All of the threads/processes start at the same time and run until they've each done 100 increments on the key. After they finish, wait a while for the dust to settle while eventual consistency kicks in. The final count will not be 5,000.
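For reference, the shape of such a harness can be sketched locally. This is a hypothetical stand-in, not a Riak test: the `Counter` class below replaces the Riak client, so against a real cluster you would swap `increment()` for a client call to one of the nodes (and, unlike this in-process version, the final count is what's in dispute).

```python
import threading

THREADS, INCRS = 50, 100

class Counter:
    """In-process stand-in for a Riak counter (hypothetical)."""
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # A real harness would issue a counter increment to a random node here.
        with self.lock:
            self.value += 1

counter = Counter()

def worker():
    for _ in range(INCRS):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 50 workers x 100 increments = 5000
```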

Actually, the way counters are implemented is unreliable by nature. A better approach would be for each node to keep track of how many times a given key was incremented ON THAT NODE ONLY. The total count would then be the sum of the counts for all nodes in the cluster. The nodes would regularly share with each other their per-node counts for each recently incremented counter. Once a node's per-node count has been broadcast to the other nodes, it could zero that count back out and start accumulating again. Whichever node is primarily responsible for the vnode on which the counter lives would be responsible for regularly fetching and summing the per-node counts. You'd never lose or duplicate an increment that way, unless a node went down before that increment got copied to the other replicas for that key.
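The per-node scheme described above is close in spirit to a grow-only counter (G-counter) CRDT, where the total is the sum of per-node counts. A minimal illustrative sketch (not Riak's implementation) follows; note that merging here takes a per-node maximum instead of zeroing counts after broadcast, which sidesteps the exactly-once-delivery requirement the zero-out variant would need:

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot.
    Merging takes the per-node max, so replaying a stale broadcast
    never double-counts an increment."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {node_id: 0}

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Take the elementwise max across all known node slots.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

# Two replicas accept increments independently, then exchange state.
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # both converge to 5
```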

But I digress.

The issue which appears to be causing the bad_utf8_character_code error must have something to do with either Erlang's inability to manage binary strings, or perhaps the Riak devs accidentally treating binary data, such as that stored for counters, as UTF-8 strings. That would be a big mistake. Since Riak does not dictate that the data for a key must be in a particular format, it doesn't make sense for Riak's mapreduce to throw errors when it encounters a key which contains binary data.

In reality, Riak would have been a much better product had it avoided Erlang altogether, given first-rate support for JavaScript mapreduce and queries the way Mongo does, and stored buckets in separate directories in order to make it quick and easy to drop a bucket. Nearly everything about Riak's design makes it much more painful to use than it should be. The big benefits with Riak are the ring topology, no single point of failure, and automatic/tuneable replication.

russelldb commented 10 years ago

@roncemer We should talk about the incorrectness you're seeing in counters; if you could provide anything that helps reproduce the bug, I'd appreciate it. We tested counters pretty extensively, and apart from their non-idempotence in the face of partial failures, we're confident that they're accurate under heavy concurrent use. We never lose an increment we say has been written. We only ever possibly duplicate an increment if we say we failed, but the failure was partial and the client retried. If you can demonstrate this to be untrue, I'd be very grateful.

The JS MapReduce issue is because we chose an efficient binary format for counters that JS cannot work with. I chose to sacrifice JS MR support in favour of a better binary format. Computering is trade-offs, and I thought that one acceptable.

TJC commented 10 years ago

In the case of my original error, the cause was Erlang code creating buckets and keys with binary characters in them that cannot be converted to UTF-8 in any valid way.

I later discovered it's also a problem with the PBC interface in Java, as well as the HTTP interface in any language.

russelldb commented 10 years ago

@TJC is it OK to close this issue then? Did you open an issue against the Java Client?

@roncemer Please can you open an issue against riak_kv with details of the counter issue you've experienced? Please provide steps to reproduce as running multiple concurrent workers against riak counters is something we've done many times in testing and we do not see the behaviour you describe.

TJC commented 10 years ago

@russelldb --- no, it is not OK to close this issue, because the issue is present on BOTH the java client AND the HTTP interface. They're two separate bugs. The HTTP interface throws a 500 Error regardless of client for this bug.

russelldb commented 10 years ago

@TJC ah, OK. Did you raise an issue against the Java Client that I can cross-ref here, then?

russelldb commented 10 years ago

@roncemer I'm loath to continue this discussion on this ticket, but I re-ran all our counter tests this morning, including writing a simple reproduction as you described above (see https://gist.github.com/russelldb/a83f5cd5e430c20cc33a) and couldn't provoke either under- or over-counting.

TJC commented 10 years ago

@russelldb - There was some discussion on the mailing list around the java client issue, and apparently the issue is already fixed in the 2.0 branch, so I didn't create an issue on github for it. I am happy to do so if that helps though?

russelldb commented 10 years ago

@TJC Narp, no need if it is fixed. Thanks!

jaredmorrow commented 10 years ago

@TJC is this now fixed in the Java client and can be closed here? @seancribbs opinions on it also being a bug in the HTTP client?

TJC commented 10 years ago

Re bug in HTTP API -- being unable to list your buckets surely counts as a bug? (Even if being unable to list or interact with keys isn't)

jrwest commented 10 years ago

Not 100% sure, but this may be related re: HTTP and UTF-8: https://github.com/basho/riak-erlang-http-client/pull/44 cc/ @macintux

TJC commented 10 years ago

Jared - I haven't tested binary handling with the RJC 2.0 client yet, but the use of "BinaryValue" types to represent buckets and keys looks promising. I'll try and get time to setup a cluster in VMs and try it out later today.

kuenishi commented 10 years ago

Listing buckets or keys whose names are not valid UTF-8 always fails because there is no way to represent them as a JSON string. It would be possible to encode them in some way (say, base64 or whatever), but that's not right either. This is why the HTTP requests always fail, so I don't think this is a bug.

The RJC PB interface should support that use case (non-valid-UTF-8 names) in RJC 2.0, but RJC 1.x seems to have only String keys? And Strings in Java are also for valid Unicode only.

c.f. https://github.com/basho/riak_kv/issues/468
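To make the point above concrete: a byte sequence that is a perfectly legal Riak binary name cannot necessarily be decoded as UTF-8, so any JSON encoder must fail on it. A quick, self-contained Python check (base64 is the kind of lossless-but-awkward workaround mentioned above, not anything Riak does):

```python
import base64
import json

raw_key = b"\x00\x01\x02\xff"  # a legal binary key, but not valid UTF-8

# Any JSON encoding must first decode the bytes to text, which fails here --
# the same class of failure mochijson2 hits server-side.
try:
    json.dumps(raw_key.decode("utf-8"))
except UnicodeDecodeError as e:
    print("cannot JSON-encode:", e.reason)

# A reversible workaround would be to transport an encoding instead:
encoded = base64.b64encode(raw_key).decode("ascii")
print(encoded)  # "AAEC/w=="
assert base64.b64decode(encoded) == raw_key
```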

binarytemple commented 8 years ago

So the solution, in the presence of buckets or keys whose names are not valid UTF-8, is to use the protocol buffers interface, since those names can't be represented in valid JSON. This sounds reasonable to me. JavaScript Map/Reduce is also deprecated. Can this issue be closed now?

binarytemple commented 8 years ago

Interestingly, the python client doesn't seem able to list them either.

c=riak.RiakClient(protocol='pbc',nodes=[{'host':'127.0.0.1', 'http_port':'10018','pb_port':10017}])
c.get_buckets()

In [29]: b=c.get_buckets()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-5860d35cae91> in <module>()
----> 1 b=c.get_buckets()

/usr/local/lib/python2.7/site-packages/riak/client/transport.pyc in wrapper(self, *args, **kwargs)
    194             return fn(self, transport, *args, **kwargs)
    195
--> 196         return self._with_retries(pool, thunk)
    197
    198     wrapper.__doc__ = fn.__doc__

/usr/local/lib/python2.7/site-packages/riak/client/transport.pyc in _with_retries(self, pool, fn)
    136                 with pool.transaction(_filter=_skip_bad_nodes) as transport:
    137                     try:
---> 63             bucketfn = lambda name: self.bucket(name)
     64
     65         return [bucketfn(bytes_to_str(name)) for name in

/usr/local/lib/python2.7/site-packages/riak/client/__init__.pyc in bucket(self, name, bucket_type)
    269
    270         return self._buckets.setdefault((bucket_type, name),
--> 271                                         RiakBucket(self, name, bucket_type))
    272
    273     def bucket_type(self, name):

/usr/local/lib/python2.7/site-packages/riak/bucket.pyc in __init__(self, client, name, bucket_type)
     59                     raise TypeError('Bucket name must be a string')
     60             except UnicodeError:
---> 61                 raise TypeError('Unicode bucket names are not supported.')
     62
     63         if not isinstance(bucket_type, BucketType):

TypeError: Unicode bucket names are not supported.

In [30]: c=riak.RiakClient(protocol='pbc',nodes=[{'host':'127.0.0.1', 'http_port':'10018','pb_port':10017}])
KeyboardInterrupt

It might be worth raising an issue against the Python client if unicode bucket names are indeed meant to be supported. Thoughts?

TJC commented 8 years ago

@binarytemple, you wrote "So the solution, in the presence of non-valid-utf8-named buckets or non-valid-utf8-named keys is to use the protocol buffers interface - as they can't be represented in valid JSON. This sounds reasonable to me."

In the two years since I created this ticket, I started using the PBC interface via the RJC from Scala, and can confirm that it does work with non-unicode buckets.

If you have multiple applications at play, some using PBC and some using HTTP, then a mistake in the PBC code (inserting non-unicode keys or buckets) will cause the HTTP-based apps to suddenly fail completely. I see that sort of surprise failure as very undesirable, as it's the sort that could reasonably make it through testing and into production, and then bring your whole system down. (It didn't happen to us, but I could see how it could.)

I think you have a choice of which of these you consider the bug:
a) the HTTP client cannot access non-unicode-safe keys and buckets;
b) the PBC client is allowed to create non-unicode-safe keys and buckets;
c) the Riak server allows non-unicode-safe keys and buckets to be created.

mengzyou commented 7 years ago

What about this issue? The HTTP API list-buckets and list-keys requests still respond with a 500 error as of version 2.2.3:

<html><head><title>500 Internal Server Error</title></head><body><h1>Internal Server Error</h1>The server encountered an error while processing this request:<br><pre>{error,{exit,{ucs,{bad_utf8_character_code}},
             [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},
              {mochijson2,json_encode_string,2,
                          [{file,"src/mochijson2.erl"},{line,200}]},
              {mochijson2,'-json_encode_array/2-fun-0-',3,
                          [{file,"src/mochijson2.erl"},{line,171}]},
              {lists,foldl,3,[{file,"lists.erl"},{line,1248}]},
              {mochijson2,json_encode_array,2,
                          [{file,"src/mochijson2.erl"},{line,173}]},
              {mochijson2,'-json_encode_proplist/2-fun-0-',3,
                          [{file,"src/mochijson2.erl"},{line,181}]},
              {lists,foldl,3,[{file,"lists.erl"},{line,1248}]},
              {mochijson2,json_encode_proplist,2,
                          [{file,"src/mochijson2.erl"},{line,184}]}]}}</pre><P><HR><ADDRESS>mochiweb+webmachine web server</ADDRESS></body></html>