brianmario / yajl-ruby

A streaming JSON parsing and encoding library for Ruby (C bindings to yajl)
http://rdoc.info/projects/brianmario/yajl-ruby
MIT License
1.48k stars 169 forks source link

Segmentation Fault on ruby 2.7 #199

Open andy-ganchrow opened 3 years ago

andy-ganchrow commented 3 years ago

With yajl-ruby-1.4.1

Over the last week we have been experiencing a, for lack of a better word, massive issue with ruby applications just crashing intermittently

The backtrace ultimately points to the Hash#to_json, created from requiring yajl/json_gem

Removing all instances of the yajl/json_gem in favor of the json library fixed the issue, but not until today did I come up with a somewhat simple script to reproduce the same crash report

For Ubuntu 20.04, The following works on ruby2.6, but fails on 2.7.0, 2.7.2, and sometimes on 2.7.1

require 'yajl/json_gem'
a = 10000.times.map { |x| { a: 5, 1 => 'foo', b: [1, 2, 3] } }
a.each { |b| b.to_json }
require 'yajl/json_gem'
a = 10000.times.map { |x| {} }
a.each { |b| b.to_json }

No issues at all with any of the above on Ubuntu 20.10, which is an indicator its a libyaml or libjson library

Its worth noting the following does not fail ( or at least immediate enough for me to notice )

require 'yajl/json_gem'
a = 1000.times.map { |x| {} }
a.each { |b| b.to_json }

And a version that does not assign a variable but loops 10000 times does not fail either

require 'yajl/json_gem'
10000.times.each { |x| {}.to_json }

The crash report is below, and immediately what stands out to me is the backtrace appears to suggest an on_progress_callback is provided to the to_json method, which obviously its not

Let me know and I'll try to provide the differences in the installed libs on 20.10 and 20.04

https://pastebin.com/fZq0iKMQ

amymsahli commented 3 years ago

We are seeing this exact issue as well with Ruby 2.7.2 and Ubuntu 20.04. We will be trying to remove the yajl/json_gem also. :(

markdascher commented 1 year ago

This is caused by a missing RB_GC_GUARD combined with some more aggressive gcc optimizations. -fno-caller-saves seems to help sometimes, but I think that's just lucky register shuffling. The same flag could just as easily make things worse.

At the very least, this return statement needs to be split up, to insert RB_GC_GUARD(rb_encoder); after the rb_yajl_encoder_encode call:

https://github.com/brianmario/yajl-ruby/blob/e8de283a6d64f0902740fd09e858fc3d7d803161/ext/yajl/yajl_ext.c#L32-L39

But there could be other spots too. The whole situation feels like a mess. Maybe the real lesson is to just avoid native extensions.