JuliaWeb / Gumbo.jl

Julia wrapper around Google's gumbo C library for parsing HTML

Superfluous insertion of whitespace in `text` #100

Open thchr opened 8 months ago

thchr commented 8 months ago

I expected the following to produce "foobarbaz", but it produces "foo bar baz":

text(parsehtml("<em>foo</em>bar<em></em>baz").root)

The motivating case was:

text(parsehtml("<math><mrow><msub><mrow><mi>MoSe</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math>").root)

which produces "MoSe 2" rather than "MoSe2".

The issue appears to be the seemingly redundant ' ' at:

https://github.com/JuliaWeb/Gumbo.jl/blob/afc2b2b83501d483e416d86063d98d567968fea7/src/manipulation.jl#L45

Would it be acceptable to change this? If so, I can make a PR.
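
In the meantime, a possible workaround is to concatenate the text nodes directly. This is only a sketch under stated assumptions: `text_nospace` is a hypothetical helper name, not part of Gumbo.jl; it assumes `HTMLText` nodes keep their content in a `text` field, and that the tree can be walked with `PreOrderDFS` from AbstractTrees.jl, as the Gumbo.jl README does.

```julia
using Gumbo
using AbstractTrees: PreOrderDFS

# Hypothetical helper (not part of Gumbo.jl): join the content of every
# text node under `el` in document order, with no separator in between.
# Assumes HTMLText stores its content in the `text` field.
function text_nospace(el::HTMLElement)
    io = IOBuffer()
    for node in PreOrderDFS(el)
        if node isa HTMLText
            print(io, node.text)
        end
    end
    return String(take!(io))
end

doc = parsehtml("<em>foo</em>bar<em></em>baz")
println(text_nospace(doc.root))  # expected: foobarbaz
```

Note the trade-off: because nothing is inserted between nodes, text from adjacent block-level elements runs together too (for example, `<p>a</p><p>b</p>` would give "ab"), which may be part of why the separator is there.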