s137 opened 2 years ago
I'm pretty sure this pull request was the breaking change here: https://github.com/elastic/logstash/pull/13523
Several other people also appear to be experiencing this, or at least a similar, issue.
Just tested and confirmed that this issue still exists in Logstash 8.5.0.
Tested with Logstash 8.5.1 on Windows 10, with a Filebeat (filestream) --> Logstash --> Elasticsearch pipeline. The problem seems unrelated to SQL Server or JDBC.
Filebeat config:

```yaml
filebeat.inputs:
- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
    - ..\example.log
  encoding: utf-8
```
Logstash config:

```
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => "https://localhost:9200"
    user => elastic
    password => "..."
    ssl_certificate_verification => false
  }
  file {
    path => "out.txt"
  }
}
```
The file example.log is UTF-8 encoded (confirmed using a hex editor).
If I run `logstash.bat` from a PowerShell console inside VS Code, the output file "out.txt" is written with UTF-8 encoding, but the POST to the Elasticsearch _bulk endpoint is encoded in Windows-1252 (checked with Telerik Fiddler's HexView; Windows-1252 is my system's default encoding, checked with `[System.Text.Encoding]::Default`), and Elasticsearch returns an "Invalid UTF-8 start byte" error.
If I run `logstash > log.txt` from the same console, the POST to Elasticsearch is encoded in UTF-8 and accepted by Elasticsearch.
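A possible workaround (an assumption on my part, not confirmed by the maintainers): if the JVM is inheriting the Windows console code page as its default charset, forcing UTF-8 in Logstash's config/jvm.options might avoid the mislabeling:

```
# config/jvm.options -- standard JVM system properties, not Logstash-specific.
# Force the JVM default charset to UTF-8 regardless of the console code page.
-Dfile.encoding=UTF-8
-Dsun.jnu.encoding=UTF-8
```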
Hi! We are experiencing the same bug with Logstash 8.5.2 running the logstash-jdbc-input plugin. Downgrading to 8.3.3 fixed the issue without any configuration changes.
Any idea when this issue might be resolved? We're stuck on Logstash 8.3.x until it is, I think.
This is still occurring on Logstash 8.5.1. Tested on Windows Server 2019 with Logstash 8.5.1, the mssql-jdbc-12.2.0.jre11.jar driver, and the Java 17 that ships with Logstash 8.5.1. Any updates on this issue?
This is the error in the Logstash logs when running the jdbc input:
```
An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError
```
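A possible stopgap while waiting for a fix (a sketch, assuming the field bytes are actually valid UTF-8 that has merely been labeled with the console code page) is to re-tag string fields with a ruby filter before the elasticsearch output:

```
filter {
  ruby {
    code => '
      # Re-tag every top-level string field as UTF-8.
      # force_encoding only changes the label, not the bytes, so this is
      # safe only if the bytes really are UTF-8 under a wrong label.
      event.to_hash.each do |k, v|
        event.set(k, v.dup.force_encoding("UTF-8")) if v.is_a?(String)
      end
    '
  }
}
```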
I recently upgraded to 8.6.1 and started facing the very same issue described in this thread.
```
An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: IBM437 and UTF-8", :exception=>Encoding::CompatibilityError, :backtrace=>[
  "org/jruby/ext/stringio/StringIO.java:1162:in `write'",
  "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'",
  "org/jruby/RubyArray.java:1865:in `each'",
  "org/jruby/RubyEnumerable.java:1143:in `each_with_index'",
  "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'",
  "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'",
  "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'",
  "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'",
  "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'",
  "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'",
  "C:/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"
]}
```
Hi, any update on when this bug is going to be patched? I see that Elastic 8.7 was released recently, but unless this bug is patched I'll need to keep our servers on the older Logstash 8.3.x version.
Thanks, Steve
Any updates on this? This bug is preventing me and other users from upgrading to newer versions of Logstash, and therefore Elasticsearch, until it is resolved. Pull request #13523 most likely caused this to stop working. @andsel, maybe you can take a look at this one? I would really appreciate it. Thanks in advance.
Any updates on this? I'd really appreciate it if you could take a quick look at this. @andsel Thanks in advance! I'm also happy to provide more information if necessary.
Still hoping for an update on this issue, which as far as I'm aware has never been resolved. I (and probably a bunch of others) am still stuck on Logstash 8.3.3 and unable to upgrade to anything more recent.
Thanks in advance...
The error comes from logstash-output-elasticsearch v11.6.0 when it creates the request body stream (`stream_writer.write(as_json)`) to send to Elasticsearch. Elasticsearch always expects the payload in UTF-8 format, and in this case the `as_json` payload is non-UTF-8.
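The mechanism can be reproduced in plain (J)Ruby; a minimal sketch (mirroring the backtrace above, not the plugin's actual code):

```ruby
require 'stringio'

# A buffer whose backing string carries a Windows console code page.
buffer = StringIO.new(String.new(encoding: 'IBM437'))

# Writing UTF-8 text containing non-ASCII bytes into it raises
# Encoding::CompatibilityError ("incompatible encodings").
buffer.write('Grüße')
```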
We have recently improved handling of invalid UTF-8. Any invalid UTF-8 bytes will be replaced by the replacement character \uFFFD.
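The effect is similar to Ruby's String#scrub (an analogy for illustration, not necessarily the plugin's exact implementation):

```ruby
# "\xE9" is a lone Windows-1252 e-acute byte, invalid in UTF-8.
s = "caf\xE9".force_encoding('UTF-8')
s.valid_encoding?  # => false
s.scrub            # => "caf\uFFFD" (invalid byte replaced with U+FFFD)
```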
The change is included in plugin version v11.22.3, and it ships by default with Logstash 8.13+.
The plugin can be updated without upgrading Logstash core using the `bin/logstash-plugin update logstash-output-elasticsearch` command.
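For example, on a Windows install (install path assumed) that would be:

```
PS C:\logstash> .\bin\logstash-plugin.bat update logstash-output-elasticsearch
```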
P.S.: I haven't tested it, but if it brings up any Logstash core API compatibility issue, you may need to upgrade Logstash core as well.
Please try it and let us know.
Hi @mashhurs,
I can confirm that this issue appears to be resolved in Logstash 8.13.1. Thanks very much for the update, and apologies for the delay getting back to you.
Logstash information:
- Plugins installed: no extra plugins were installed
- JVM (`java -version`): bundled JDK: openjdk 17.0.4 2022-07-19, OpenJDK Runtime Environment Temurin-17.0.4+8 (build 17.0.4+8), OpenJDK 64-Bit Server VM Temurin-17.0.4+8 (build 17.0.4+8, mixed mode, sharing); but also tested with: openjdk 11.0.15 2022-04-19, OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10), OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)
- OS version: Windows 10
Description of the problem including expected versus actual behavior:
If I query some NVARCHAR fields from a Microsoft SQL Server (where NVARCHAR is always encoded as UTF-16) via the logstash-jdbc-input plugin, without specifying any special encoding or charset settings in either the input or the output plugins, Logstash fails to transfer the events to Elasticsearch, throwing this error over and over again for every document:
This worked fine up to version 8.3.3 of Logstash; since version 8.4.0 it no longer works. I also tried specifying the encoding as UTF-16 with the jdbc input plugin's columns_charset option, but that doesn't affect the behaviour of Logstash at all.
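For reference, a sketch of what such a configuration might look like (connection details, driver path, and column name are hypothetical):

```
input {
  jdbc {
    # hypothetical connection settings
    jdbc_connection_string => "jdbc:sqlserver://localhost;databaseName=mydb"
    jdbc_user => "sa"
    jdbc_driver_library => "C:/drivers/mssql-jdbc-12.2.0.jre11.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    statement => "SELECT my_nvarchar_column FROM my_table"
    # per-column charset override, which had no effect here
    columns_charset => { "my_nvarchar_column" => "UTF-16" }
  }
}
```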
Steps to reproduce:
(Oddly enough, if you run logstash.bat from the command line and redirect stdout and/or stderr to a file, it works perfectly, without any errors, and indexes everything as it should. I have no idea how output redirection can affect the behaviour of Logstash here; to be honest, it just makes no sense.)