ankane / blazer

Business intelligence made simple
MIT License
4.5k stars 471 forks source link

Fix encoding error on columns when using Trilogy #466

Open D-system opened 7 months ago

D-system commented 7 months ago

It happens when 2 columns are selected, one of which is a JSON type with non-ASCII characteres (eg: Japanese). Blazer will try render the entire template in UTF-8 and that field in ASCII-8BIT.

Error:

ActionView::Template::Error
incompatible character encodings: UTF-8 and ASCII-8BIT

Tested on MySQL 5.7.

My current workaround that I deployed is:

# config/initializers/blazer_utf8.rb

Blazer::Adapters::SqlAdapter.class_eval do
  alias_method :original_alias_method, :run_statement

  def run_statement(...)
    columns, rows, error = original_alias_method(...)

    if adapter_name == "Trilogy"
      rows.each do |row|
        row.each do |column|
          column.force_encoding(Encoding::UTF_8) if column.is_a?(String)
        end
      end
    end

    [columns, rows, error]
  end
end
ankane commented 6 months ago

Hi @D-system, thanks for the PR. However, I'm not able to reproduce the error (without it).

SELECT CAST('["え"]' AS JSON) AS a, 'test' AS b

Can you share a simple query that causes the error?

D-system commented 6 months ago

Hi @ankane Thank you for the reply and your patience.

I known it was to bit me that I didn't include the code to reproduce the bug. Finally, I was able to reproduce the bug reliably.

Table setup and value:


CREATE TABLE `blazer_466` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `string_value` varchar(255) DEFAULT NULL,
  `json_value` json DEFAULT NULL,

  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5018188 DEFAULT CHARSET=utf8mb4;

INSERT INTO `blazer_466` (`string_value`, `json_value`, `created_at`, `updated_at`)
VALUES ('あ', '\"[{\\\"name\\\":\\\"あ\\\"}]\"', '2024-01-26 10:10:12', '2024-01-26 10:10:12');

Both string_value and json_value needs a non-ascii character. Using a in string_value and no error occurred.


In Blazer (working query):

select id,
-- string_value,
json_value
from blazer_466 
limit 2
SCR-20240301-jnzc

In Blazer (not working query):

select id,
string_value,
json_value
from blazer_466 
limit 2
SCR-20240301-jocj

Stack

   Rendering /Users/thomas/.rbenv/versions/3.2.3/lib/ruby/gems/3.2.0/gems/blazer-3.0.3/app/views/blazer/queries/run.html.erb
   Rendered /Users/thomas/.rbenv/versions/3.2.3/lib/ruby/gems/3.2.0/gems/blazer-3.0.3/app/views/blazer/queries/_caching.html.erb (Duration: 0.6ms | Allocations: 50)
   Rendered /Users/thomas/.rbenv/versions/3.2.3/lib/ruby/gems/3.2.0/gems/blazer-3.0.3/app/views/blazer/queries/run.html.erb (Duration: 4.4ms | Allocations: 1508)
 Completed 500 Internal Server Error in 16ms (ActiveRecord: 5.9ms | SQL Queries: 7 (0 cached) | Allocations: 9655)

 ActionView::Template::Error (incompatible character encodings: UTF-8 and ASCII-8BIT):
     161:                         <% elsif @linked_columns[k] %>
     162:                           <%= link_to blazer_format_value(k, v), @linked_columns[k].gsub("{value}", u(v.to_s)), target: "_blank" %>
     163:                         <% else %>
     164:                           <%= blazer_format_value(k, v) %>
     165:                         <% end %>
     166:                       <% end %>
     167: 

 activesupport (6.1.7.7) lib/active_support/core_ext/string/output_safety.rb:215:in `concat'
 activesupport (6.1.7.7) lib/active_support/core_ext/string/output_safety.rb:215:in `concat'
 actionview (6.1.7.7) lib/action_view/buffers.rb:29:in `<<'
 blazer (3.0.3) app/views/blazer/queries/run.html.erb:164
 blazer (3.0.3) app/views/blazer/queries/run.html.erb:151:in `each'
 blazer (3.0.3) app/views/blazer/queries/run.html.erb:151:in `each_with_index'
 blazer (3.0.3) app/views/blazer/queries/run.html.erb:151
 blazer (3.0.3) app/views/blazer/queries/run.html.erb:149:in `each'
 blazer (3.0.3) app/views/blazer/queries/run.html.erb:149
 actionview (6.1.7.7) lib/action_view/base.rb:247:in `public_send'
 actionview (6.1.7.7) lib/action_view/base.rb:247:in `_run'
 actionview (6.1.7.7) lib/action_view/template.rb:154:in `block in render'
 activesupport (6.1.7.7) lib/active_support/notifications.rb:203:in `block in instrument'
 activesupport (6.1.7.7) lib/active_support/notifications/instrumenter.rb:24:in `instrument'
 activesupport (6.1.7.7) lib/active_support/notifications.rb:203:in `instrument'
 actionview (6.1.7.7) lib/action_view/template.rb:345:in `instrument_render_template'
 actionview (6.1.7.7) lib/action_view/template.rb:152:in `render'
 actionview (6.1.7.7) lib/action_view/renderer/template_renderer.rb:61:in `block (2 levels) in render_template'
 activesupport (6.1.7.7) lib/active_support/notifications.rb:203:in `block in instrument'
D-system commented 6 months ago

string_value with a: no error

SCR-20240301-jpsc
D-system commented 6 months ago

Regarding low details implementation, for ActiveRecord::Base.connection.execute(sql).entries:

I'm on Rails 6.1, so I use the activerecord-trilogy-adapter gem

ankane commented 6 months ago

Thanks @D-system, was able to reproduce it with that example.

Spent some time on this today, but don't have a great solution. Converting all strings to UTF-8 will fail with invalid byte sequence in UTF-8 for binary data.

SELECT b'11111111';

We could convert strings with Encoding::ASCII_8BIT to hex ("0x#{v.unpack1("H*").upcase}"), but this will obscure the JSON value. Also, ActiveRecord::Result doesn't get the column types for MySQL, so it's not possible to use that.

D-system commented 6 months ago

I reproduced the binary issue.

SCR-20240319-sqxa

Even the column_type is empty. I'm a bit puzzled on how to fix it in a correct way. In #468, it is mentioned that the issue still exists even with mysql2. If feels that the error might not be entirely related to trilogy.

ankane commented 6 months ago

Pushed a fix for the error in the commit above, but don't have a good way to show the JSON value.

Edit: One way would be to manually cast in queries - CAST(json_value AS CHAR)

D-system commented 5 months ago

I feel that json fields are more used then binary fields. Json is readable by default, binary is not. If there's a manual effort to do, personally I would put the effort on the non-readable fields (the binary).

ankane commented 5 months ago

That would be ideal, but don't see a good way to do that and fix the errors. May revert and leave things as is (since it only affects some queries).