gjtorikian / selma

Selma selects and matches HTML nodes using CSS rules. Backed by Rust's lol_html parser.
MIT License
56 stars 3 forks source link

Segfault due to garbage collection #81

Closed jordandcarter closed 1 month ago

jordandcarter commented 1 month ago

Noticed some segmentation faults in production, narrowed the issue to selma gem. Seems to happen very rarely until turning on GC.stress = true which causes it to always segfault.

I managed to create a failing test here https://github.com/gjtorikian/selma/pull/80

I'll try a workaround of using GC.disable around the call to the gem, that seems to avoid the segfault locally.

gjtorikian commented 1 month ago

Can you share more about how you’re specifically using Selma (how are you initializing it, what do your selectors look like), or any of the failing text ?

jordandcarter commented 1 month ago

Did the test not fail for you? I thought that would be enough sorry, we're using it basically like the test is written. I can't share the specific failing text we ran into in prod. We first thought it was the size of the text being processed, or something specific to the text, but once I turned on GC.stress = true before the Selma::Rewriter.new(...).rewrite(html) it would segfault on all text.

gjtorikian commented 1 month ago

Ah, I misread the original issue and I hadn't seen that GC.stress always reproduces it.

This looks to be around somewhere past me flagged: https://github.com/gjtorikian/selma/blob/d1be74302513a7916d6a7e70f6cec390b258ad24/lib/selma/sanitizer.rb#L12-L15

That guy is always causing me problems.