Closed: lucasallan closed this issue 13 years ago.
Can you paste the original input string here after calling .inspect on it, so all the binary data gets escaped?
The original string is "Residencial Gaudí":
ruby-1.9.2-p180 :001 > s = "Residencial Gaudí"
 => "Residencial Gaudí"
ruby-1.9.2-p180 :003 > s.inspect
 => "\"Residencial Gaudí\""
I don't see an accented character in there. Are you sure that's the exact original string?
The string is "Residencial Gaudí"; the last letter has an accent ("í").
The same problem happens with any other accented letter (^, ~, ´) or with letters like "ç".
Could you try calling .bytes.to_a on the string and paste the output?
ree-1.8.7-2011.03 :002 > "Residencial Gaudí".bytes.to_a
 => [82, 101, 115, 105, 100, 101, 110, 99, 105, 97, 108, 32, 71, 97, 117, 100, 195, 173]
and
ree-1.8.7-2011.03 :003 > "caça praça êé".bytes.to_a
 => [99, 97, 195, 167, 97, 32, 112, 114, 97, 195, 167, 97, 32, 195, 170, 195, 169]
That string arrives as JSON in an HTTP request: an Android app sends a POST with a JSON body to a Rails controller, and that's when the error happens.
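Both byte arrays above are well-formed UTF-8: 195 173 is the two-byte encoding of U+00ED ("í"), and 195 167 is "ç". A minimal sketch to confirm that, assuming Ruby 1.9 or later (Ruby 1.8.7, used in the session above, has no String#force_encoding); the helper name utf8_from_bytes is hypothetical:

```ruby
# Byte values copied from the .bytes.to_a output above.
gaudi_bytes = [82, 101, 115, 105, 100, 101, 110, 99, 105, 97, 108, 32, 71, 97, 117, 100, 195, 173]
caca_bytes  = [99, 97, 195, 167, 97, 32, 112, 114, 97, 195, 167, 97, 32, 195, 170, 195, 169]

# Hypothetical helper: rebuild a string from raw bytes and label it as UTF-8.
def utf8_from_bytes(bytes)
  bytes.pack("C*").force_encoding("UTF-8")
end

gaudi = utf8_from_bytes(gaudi_bytes)
caca  = utf8_from_bytes(caca_bytes)

puts gaudi.valid_encoding?   # => true
puts caca.valid_encoding?    # => true
puts gaudi[-1].ord           # => 237 (U+00ED, "í")
puts caca                    # => caça praça êé
```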
I'm able to parse the strings without error:
ree-1.8.7-2011.03 :010 > require 'yajl'
=> false
ree-1.8.7-2011.03 :011 > str = "\"#{[82, 101, 115, 105, 100, 101, 110, 99, 105, 97, 108, 32, 71, 97, 117, 100, 195, 173].map{|c| c.chr}.join}\""
=> "\"Residencial Gaud\303\255\""
ree-1.8.7-2011.03 :012 > puts str
"Residencial Gaudí"
=> nil
ree-1.8.7-2011.03 :013 > Yajl.load str
=> "Residencial Gaud\303\255"
ree-1.8.7-2011.03 :014 > puts Yajl.load str
Residencial Gaudí
=> nil
ree-1.8.7-2011.03 :015 > str2 = "\"#{[99, 97, 195, 167, 97, 32, 112, 114, 97, 195, 167, 97, 32, 195, 170, 195, 169].map{|c| c.chr}.join}\""
=> "\"ca\303\247a pra\303\247a \303\252\303\251\""
ree-1.8.7-2011.03 :016 > puts str2
"caça praça êé"
=> nil
ree-1.8.7-2011.03 :017 > Yajl.load str2
=> "ca\303\247a pra\303\247a \303\252\303\251"
ree-1.8.7-2011.03 :018 > puts Yajl.load str2
caça praça êé
=> nil
Can you paste the bytes for the original (entire) JSON string itself, not just the part where the error was?
I think this is a problem in Rails, because it only happens when the string is sent as a parameter in a request. It's very hard to debug, because the exception is raised before the create action is even entered:
Started POST "/locations" for 10.0.0.2 at Wed Jun 08 13:12:04 -0300 2011
Error occurred while parsing request parameters.
Contents:
Yajl::ParseError (lexical error: invalid bytes in UTF8 string.
aa","latitude":-7,"name":"caça"}}
(right here) ------^
):
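Because the error is raised while the request parameters are being parsed, one way to see what the client actually sent is to inspect the raw body earlier in the middleware stack. A minimal sketch, assuming a Rack-based app; the class Utf8BodyGuard and its 400 response are hypothetical, not part of Rails, Rack or yajl-ruby:

```ruby
# Hypothetical Rack middleware (not part of Rails, Rack or yajl-ruby): checks that a
# JSON request body is valid UTF-8 before any params parser further down sees it.
class Utf8BodyGuard
  def initialize(app)
    @app = app
  end

  def call(env)
    input = env["rack.input"]
    if input && env["CONTENT_TYPE"].to_s.start_with?("application/json")
      body = input.read.to_s
      input.rewind if input.respond_to?(:rewind)   # leave the body readable downstream

      unless body.dup.force_encoding("UTF-8").valid_encoding?
        # Log the raw bytes so they can be compared with what the client meant to send.
        warn "Non-UTF-8 JSON body: #{body.bytes.inspect}"
        return [400, { "Content-Type" => "text/plain" }, ["Request body must be UTF-8 encoded JSON"]]
      end
    end

    @app.call(env)
  end
end
```

In a Rails 3.x app it could presumably be registered ahead of the params parser with config.middleware.insert_before ActionDispatch::ParamsParser, Utf8BodyGuard (untested here); the logged byte array can then be compared with the .bytes.to_a output above.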
I have a similar problem:
ruby-1.9.2-p180 :002 > Yajl::Parser.parse '{"żółty": "foo"}', :symbolize_keys => true
EncodingError: invalid encoding symbol
from /home/stan/.rvm/gems/ruby-1.9.2-p180/gems/yajl-ruby-0.8.2/lib/yajl.rb:37:in `parse'
from /home/stan/.rvm/gems/ruby-1.9.2-p180/gems/yajl-ruby-0.8.2/lib/yajl.rb:37:in `parse'
from (irb):4
from /home/stan/.rvm/rubies/ruby-1.9.2-p180/bin/irb:16:in `<main>'
ruby-1.9.2-p180 :003 > "żółty".bytes.to_a
=> [197, 188, 195, 179, 197, 130, 116, 121]
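The bytes above are valid UTF-8 for "żółty", so the input itself looks fine and the failure appears to come from turning the key into a symbol. One possible workaround is to parse with String keys and symbolize afterwards. A minimal sketch: Ruby's stdlib JSON stands in for Yajl so it runs without the gem (with yajl-ruby you would call Yajl::Parser.parse(json) without :symbolize_keys), and deep_symbolize is a hypothetical helper:

```ruby
require "json"

# Hypothetical helper: recursively convert Hash keys from String to Symbol.
def deep_symbolize(obj)
  case obj
  when Hash  then obj.each_with_object({}) { |(k, v), h| h[k.to_sym] = deep_symbolize(v) }
  when Array then obj.map { |v| deep_symbolize(v) }
  else obj
  end
end

# Rebuild "żółty" from the bytes reported above and embed it in a JSON object.
key  = [197, 188, 195, 179, 197, 130, 116, 121].pack("C*").force_encoding("UTF-8")
json = %({"#{key}": "foo"})

# Parse with String keys first, then symbolize them in Ruby.
parsed = deep_symbolize(JSON.parse(json))
p parsed.keys.first.encoding   # => #<Encoding:UTF-8>
```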
I'm seeing a similar error on Rails 3.1.0.rc5 / yajl-ruby 0.8.2 / Ruby 1.9.2p180.
I'm posting a JSON body with UTF-8 characters.
JSON::ParserError (lexical error: invalid bytes in UTF8 string.
of French writers such as St?phane Mallarm? and Joseph Joube
(right here) ------^
):
translations of French writers such as St\xE9phane Mallarm\xE9 and Joseph Joubert.
I've got a similar issue with yajl-ruby 0.8.3
At least some of these problems should be solved in 0.8.3; see https://github.com/brianmario/yajl-ruby/pull/71
@tc that string looks like it is encoded in ISO-8859-1, and JSON requires UTF-8. Can you transcode it to UTF-8 before handing it to yajl-ruby?
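Applied to the string @tc posted, the transcoding step might look like the sketch below. It assumes the client really sends ISO-8859-1 (where byte 0xE9 is "é"); if the source encoding is something else, the force_encoding argument has to change:

```ruby
# Bytes as in the body above: 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
raw = "St\xE9phane Mallarm\xE9".b

# Validity check: false for the raw bytes.
puts raw.dup.force_encoding("UTF-8").valid_encoding?   # => false

# Assumption: the client sent ISO-8859-1. Label the bytes with that encoding, then transcode.
utf8 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts utf8.valid_encoding?   # => true
puts utf8                   # => Stéphane Mallarmé
```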
A quick way to check if a string is valid UTF-8 in 1.9.2 is to do this:
"some string".force_encoding('UTF-8').valid_encoding?
@larsgt - what is the string you're having trouble with?
Python's json tools also barf on this string. I ended up cleaning up our database; 226 was the code of the bad character. This is what we ran to fix the string: [66, 97, 114, 226, 109, 44].pack("U*")
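Array#pack("U*") treats each integer as a Unicode code point, so 226 becomes U+00E2 ("â"), stored as the two UTF-8 bytes 195 162. For values up to 255 that is the same as reading each byte as ISO-8859-1, so this repair is only correct if the bad data really was Latin-1. A sketch comparing the two approaches:

```ruby
# The fix from the comment above: each integer is treated as a Unicode code point.
fixed = [66, 97, 114, 226, 109, 44].pack("U*")

puts fixed         # => Barâm,
p fixed.encoding   # => #<Encoding:UTF-8>
p fixed.bytes      # => [66, 97, 114, 195, 162, 109, 44]

# Equivalent here: reinterpret the raw bytes as ISO-8859-1 and transcode to UTF-8.
latin1 = [66, 97, 114, 226, 109, 44].pack("C*").force_encoding("ISO-8859-1").encode("UTF-8")
p latin1 == fixed  # => true
```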
Closing this for now. Basically, the input must be valid UTF-8 for yajl-ruby to parse it correctly.
Running into the same issue, except my data source is GitHub's timeline. :-)
shas":[["652951d929f014eeaa6f3f01f5106d40ad97ea41","lukasz.milewski@gmail.com","Added JSON support","?ukasz Milewski",true]]
Results in:
Processing exception: lexical error: invalid bytes in UTF8 string.
l.com","Added JSON support","?ukasz Milewski",true]],"ref":"
(right here) ------^
1) Any suggestions for how to deal with this, short of dumping the entire input stream?
2) Looks like an encoding bug on GitHub? /cc @tmm1
The commit event in the bug above: https://github.com/IGED-UFPB/IGED/compare/96883dfb92...b5b6835788
In fact, fetching events from that repo shows plenty of the same problems: https://api.github.com/repos/IGED-UFPB/IGED/events, e.g.:
message: "Corre??o na compara??o de duas Lias pela camada de Abstra??o."
Damn, I REALLY need to finish up 2.0. Unfortunately there isn't much I can do, since yajl 1.x (which is what we're using) doesn't do any Unicode validation at all. We use charlock_holmes to try to guess the encoding and transcode stuff into UTF-8 before encoding, but sometimes there isn't enough data for an accurate detection. Anyway, this is definitely an issue we (GitHub) need to deal with. Would you mind emailing support@github and mentioning me? I'll do what needs to happen ;)
(In reply to Ilya Grigorik's comment of Mar 11, 2012, 9:24 AM: https://github.com/brianmario/yajl-ruby/issues/64#issuecomment-4440034)
Fired off an email to support - thanks Brian!
Just now running into the same issue. @igrigorik - I'm trying to use yajl-ruby to parse through your archive events too. :) What was the resolution?
@kitplummer that's odd. I'm serializing it into those archives with Yajl, so it should have bombed one step earlier on my end. If the archive is newline-delimited (depends on the date range), you can do the "read one line, parse one line" trick: rescue the exception and skip that line.
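The "read one line, parse one line" trick might be sketched as follows. The stdlib JSON parser and JSON::ParserError stand in for Yajl here so the sketch runs without the gem, and the method name each_valid_record is hypothetical:

```ruby
require "json"

# Hypothetical helper: yields each parseable record from newline-delimited JSON,
# skipping lines with invalid UTF-8 bytes or malformed JSON instead of aborting.
def each_valid_record(io)
  io.each_line do |raw_line|
    line = raw_line.dup.force_encoding("UTF-8")
    next unless line.valid_encoding?   # skip lines with invalid bytes
    next if line.strip.empty?          # ignore blank lines

    begin
      record = JSON.parse(line)
    rescue JSON::ParserError
      next                             # skip malformed records
    end
    yield record
  end
end
```

With yajl-ruby, Yajl::Parser.parse(line) and Yajl::ParseError would take the place of JSON.parse and JSON::ParserError.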
Yajl doesn't accept accented characters. Any ideas?
Yajl::ParseError (lexical error: invalid bytes in UTF8 string.
724,"name":"Residencial Gaudí"}}
(right here) ------^
):