brianmario / charlock_holmes

Character encoding detection, brought to you by ICU
MIT License

return `ruby_encoding` & `encoding` fields for binary data #113

Closed mhasbini closed 2 months ago

mhasbini commented 7 years ago

Currently, `detect` doesn't return the `:encoding` and `:ruby_encoding` fields when the detected type is binary.

Current version:

> CharlockHolmes::EncodingDetector.detect("\0")
=> {:type=>:binary, :confidence=>100}
> CharlockHolmes::EncodingDetector.detect("a")
=> {:type=>:text, :encoding=>"UTF-8", :ruby_encoding=>"UTF-8", :confidence=>15}

Updated:

> CharlockHolmes::EncodingDetector.detect("\0")
=> {:type=>:binary, :encoding=>"BINARY", :ruby_encoding=>"ASCII-8BIT", :confidence=>100}

`BINARY` and `ASCII-8BIT` are aliases for the same Ruby encoding.
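
As a rough illustration of why uniform fields help callers, here is a minimal sketch using only Ruby's standard library. It does not call charlock_holmes: the two hashes are copied from the example output above, and `tag_with_detected_encoding` is a hypothetical helper, not part of the gem. With `:ruby_encoding` always present, a caller can treat binary and text results the same way.

```ruby
# Detection results copied from the examples above. These are plain hash
# literals (an assumption for illustration); the gem is not loaded here.
binary_detection = { type: :binary, encoding: "BINARY", ruby_encoding: "ASCII-8BIT", confidence: 100 }
text_detection   = { type: :text, encoding: "UTF-8", ruby_encoding: "UTF-8", confidence: 15 }

# Hypothetical helper, not part of charlock_holmes: return a copy of the data
# tagged with the Ruby encoding named in the detection result.
def tag_with_detected_encoding(data, detection)
  data.dup.force_encoding(detection.fetch(:ruby_encoding))
end

# Both names resolve to the same Encoding object in Ruby.
same_encoding = Encoding.find("BINARY") == Encoding.find("ASCII-8BIT")

tagged_binary = tag_with_detected_encoding("\0", binary_detection)
tagged_text   = tag_with_detected_encoding("a", text_detection)
```

Without the `:ruby_encoding` key on binary results, the `fetch` call in this sketch would raise `KeyError`, so callers would need a separate branch for `:type => :binary`.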