randude opened this issue 7 years ago
@randude thanks, looks like we got this reproduced at last then :) Will try to get to this today or tomorrow :)
@randude do you have the DLQ activated? This looks like exactly the kind of event that should just go to the DLQ imo; if it doesn't, it looks like the bug is there.
@original-brownbear No, I have this setting in logstash.yml: `dead_letter_queue.enable: false`
Were you able to reproduce it?
@original-brownbear It seems to happen ONLY after it gets 5-7 log lines the same as the one that appears in the file I uploaded. The ES exception was just a coincidence. I have no idea what is so special about this JSON input. I thought the sync_old field was at fault when it's bigger than 2^63, but this isn't the case.
Some facts:
When writing a filter:
`if [sync_old] { drop {} }`
everything works. When I remove this `if`, it immediately stops outputting data.
I don't have enough tools to debug this, but it's obviously something related to the PQ, because when the PQ is disabled everything works just fine, even with the "suspicious" log.
I tried outputting to a file and then to ES, where the file output was configured to receive only the suspicious logs. After 5-7 lines, both outputs stopped writing data. Example: `output { if [sync_old] { file {} } elasticsearch {} }`
When I defined it as an if/else, everything worked fine and both outputs kept writing:
`output { if [sync_old] { file {} } else { elasticsearch {} } }`
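Putting the first point into a full pipeline, the drop workaround would sit in the filter block like this (a sketch assembled from the snippet above, not a tested config):

```
filter {
  # Drop any event that carries the problematic sync_old field
  # before it can reach the outputs (workaround from this thread).
  if [sync_old] {
    drop { }
  }
}
```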
@original-brownbear elasticsearch_template.txt: this is the template that refers to the `sync_` fields.
@randude sorry, I was a little busy the last few days and could only get back to this now.

So if `sync_old` is to blame here, that should be easy to track down :) This is what I find interesting:

> we send a large number, larger than 2^63

This shouldn't be possible with the PQ in your ES version; there is a bug in the PQ in that release that breaks handling of large-number serialization on the LS end. Could it be that `sync_old` exceeds 2^63, breaking the PQ before we even get to the ES output? (In that case you should have some error about deserialization of `byte[]` or `B[` in your logs.)

Or is `sync_old` dynamically created by a filter in your pipeline?
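For context on the 2^63 boundary: it is exactly the edge of a signed 64-bit Java long, which both Elasticsearch's `long` field type and the JVM-side number handling rely on. A minimal illustration of the cutoff (Python used only for the arithmetic; not part of the thread):

```python
# Java's long is a signed 64-bit integer: values from -2**63 up to
# 2**63 - 1. Anything at or above 2**63 cannot be represented and
# is where the parsing/serialization trouble starts.
LONG_MIN = -2**63
LONG_MAX = 2**63 - 1  # 9223372036854775807, Java's Long.MAX_VALUE

def fits_in_java_long(n: int) -> bool:
    """Return True if n is representable as a signed 64-bit integer."""
    return LONG_MIN <= n <= LONG_MAX

print(fits_in_java_long(2**63 - 1))  # True: largest valid long
print(fits_in_java_long(2**63))      # False: one past the maximum
```

So a `sync_old` value of 2^63 or more is out of range for a `long`-mapped field, even though it is a perfectly valid JSON number.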
sync_old is not created by Logstash's filters; it's received from a different process via TCP. I didn't notice any error in logstash.log. Is it written only at debug level?
Will the bug you mentioned be solved in the next version?
@randude yes, we identified the issue in the JrJackson/Jackson dependency and it'll be upgraded to a working version in all upcoming releases :)
Awesome, I look forward to upgrading. I really hope that it will close this issue once and for all.
@original-brownbear
Version: Logstash 6 RC1
Operating System: Ubuntu 16.04
I was able to reproduce this and fix it quite easily, but it took me soooooooooo long to find. One pipeline receives a JSON-formatted string and sends it to ES.
We have a log that is sent on only one machine, which explains why this didn't happen on other machines.
In that log, we send a large number, larger than 2^63, which ES can't parse:

```
[2017-10-16T16:59:24,830][DEBUG][o.e.a.b.TransportShardBulkAction] [vaultive-2017.10.16][2] failed to execute bulk it
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [sync_old]
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:291) ~[elasticsearch-6.0.0-beta2.jar:6.0
```
This causes the PQ to hang immediately, and I think it also messed up other pipelines. When I added a filter that drops logs with such a large value before they reach the output, everything worked. If I send this field as a string, it also works.
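Since sending the field as a string works, another possible workaround (a sketch, not from this thread; assumes the field is always named sync_old) would be to coerce it on the Logstash side before it is serialized:

```
filter {
  # Coerce sync_old to a string so no 64-bit long overflow can
  # occur in serialization or in the ES field mapping.
  mutate {
    convert => { "sync_old" => "string" }
  }
}
```

Note that the ES template would then need sync_old mapped as a string/keyword type instead of long.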
It should be rather easy to reproduce. Please let me know if you need more information.