flori / json

JSON implementation for Ruby
https://flori.github.io/json

Loading UTF-8 file throws Encoding::CompatibilityError #232

Open HiroakiMachida opened 9 years ago

HiroakiMachida commented 9 years ago

Copied from Stack Overflow: http://stackoverflow.com/questions/27673655/json-load-throws-encodingcompatibilityerror

Loading a UTF-8 file:

[ec2-user@ip-XXX-XXX-XXX-XXX vfs]$ file data/E03124/data.json 
data/E03124/data.json: UTF-8 Unicode text, with very long lines, with no line terminators

Error message

Caught Encoding::CompatibilityError at '"{\"資産の部\":{': incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)

Backtrace

json (1.8.1) lib/json/pure/parser.rb:242:in `rescue in parse_string'
json (1.8.1) lib/json/pure/parser.rb:213:in `parse_string'
json (1.8.1) lib/json/pure/parser.rb:257:in `parse_value'
json (1.8.1) lib/json/pure/parser.rb:121:in `parse'
json (1.8.1) lib/json/common.rb:155:in `parse'
json (1.8.1) lib/json/common.rb:334:in `load'
app/controllers/statements_controller.rb:13:in `block in getData'
app/controllers/statements_controller.rb:12:in `open'
app/controllers/statements_controller.rb:12:in `getData'

Rails code

def getData
  json_data = open("data/#{params[:code]}/data.json") do |io|
    JSON.load(io)
  end
  render :json => json_data
end

Ruby version is 2.0.0.

Rails version is 4.1.4.

According to AJcodez, the problem is in the json/pure parser.

The regex for matching a string uses the n option, meaning the pattern is in ASCII-8BIT encoding. From the Ruby regex docs:

A regexp can be matched against a string when they either share an encoding, or the regexp’s encoding is US-ASCII and the string’s encoding is ASCII-compatible.

If a match between incompatible encodings is attempted an Encoding::CompatibilityError exception is raised.
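
The rule quoted above can be reproduced outside the parser. The following is a minimal sketch, not the json gem's code: `BINARY_PATTERN` is a cut-down pattern that keeps only the `\xff` escape and the n option, and `try_gsub` is a hypothetical helper introduced here for illustration. It assumes the source file is read as UTF-8 (Ruby's default), so the string literals are UTF-8.

```ruby
# Cut-down pattern (an assumption, not the parser's full regexp): the n option
# plus a non-ASCII escape such as \xff fixes the pattern to the binary encoding.
BINARY_PATTERN = /\\[\x20-\xff]/n

# Hypothetical helper: upcases every backslash escape matched by the pattern and
# reports whether the match was allowed or raised a compatibility error.
def try_gsub(string)
  [:ok, string.gsub(BINARY_PATTERN) { |m| m.upcase }]
rescue Encoding::CompatibilityError => e
  [:error, e.message]
end

ascii_only = 'a\\nb'      # UTF-8 string containing only ASCII characters
non_ascii  = "資産\\n"     # UTF-8 string containing non-ASCII characters

p BINARY_PATTERN.encoding
p try_gsub(ascii_only)    # ASCII-only strings are compatible with the pattern
p try_gsub(non_ascii)     # non-ASCII UTF-8 text is not, so this reports :error

# Reinterpreting the same bytes as binary makes the string share the pattern's encoding.
binary_copy = non_ascii.dup.force_encoding(Encoding::BINARY)
status, result = try_gsub(binary_copy)
p status
p result.force_encoding(Encoding::UTF_8)
```

The sketch only illustrates the encoding rule; it does not say which code path in json 1.8.1 hands a UTF-8 string to the pattern.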

json/pure/parser.rb, line 215:

    string = self[1].gsub(%r((?:\\[\\bfnrt"/]|(?:\\u(?:[A-Fa-f\d]{4}))+|\\[\x20-\xff]))n) do |c|

samuraraujo commented 9 years ago

Try using MultiJson.