YusukeIwaki / capybara-playwright-driver

Playwright driver for Capybara
MIT License

Node#visible_text use scrub to replace invalid UTF-8 sequences #76

Closed: reedrolemodel closed this 2 months ago

reedrolemodel commented 3 months ago

Some pages cause an "invalid byte sequence in UTF-8" exception to be raised when calling text.to_s.gsub(/\A[[:space:]&&[^\u00a0]]+/, ''). Adding scrub prevents this by replacing the invalid bytes.

Specific context: it seems an &nbsp; HTML entity gets interpreted as "\xA0" (byte 160), which on its own is not a valid byte sequence in UTF-8. Using charlock_holmes, the encoding of the entire page is reported as ISO-8859-1 with 54% confidence.
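
A minimal sketch of the failure and the fix, for anyone who wants to reproduce it outside a browser. The sample string and the normalize_visible_text helper are hypothetical; the helper only mirrors the gsub chain shown in the backtrace, with scrub added, and is not the driver's actual code.

```ruby
# Sample text containing a lone 0xA0 byte, tagged as UTF-8 (hypothetical input).
raw = "  Cookie\xA0policy \n\n".dup.force_encoding(Encoding::UTF_8)

# The string is not valid UTF-8, so regexp matching raises.
begin
  raw.gsub(/\A[[:space:]&&[^\u00a0]]+/, '')
rescue ArgumentError => e
  error_message = e.message
end

# Hypothetical helper: the same chain as in the backtrace, with scrub first.
# String#scrub replaces each invalid byte sequence with U+FFFD by default.
def normalize_visible_text(text)
  text.to_s.scrub
      .gsub(/\A[[:space:]&&[^\u00a0]]+/, '')
      .gsub(/[[:space:]&&[^\u00a0]]+\z/, '')
      .gsub(/\n+/, "\n")
      .tr("\u00a0", ' ')
end

cleaned = normalize_visible_text(raw)

# For comparison: read as ISO-8859-1, the same byte transcodes to U+00A0 (NBSP).
nbsp = "\xA0".dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
```

Note that scrub turns the stray byte into U+FFFD rather than a space, so the text no longer raises but the original non-breaking space is not recovered.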

mhenrixon commented 2 months ago

I came here to report the issue! I'm glad to discover it already has a PR.

  1) Static pages GET /de/cookies renders a cookie policy
     Failure/Error:
       text.to_s.gsub(/\A[[:space:]&&[^\u00a0]]+/, '')
           .gsub(/[[:space:]&&[^\u00a0]]+\z/, '')
           .gsub(/\n+/, "\n")
           .tr("\u00a0", ' ')

     ArgumentError:
       invalid byte sequence in UTF-8

     [Screenshot Image]: tmp/capybara/screenshots/failures_r_spec_example_groups_static_pages_get_de_cookies_renders_a_cookie_policy_91.png

     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-playwright-driver-0.5.2/lib/capybara/playwright/node.rb:134:in `gsub'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-playwright-driver-0.5.2/lib/capybara/playwright/node.rb:134:in `block in visible_text'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-playwright-driver-0.5.2/lib/capybara/playwright/node.rb:83:in `assert_element_not_stale'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-playwright-driver-0.5.2/lib/capybara/playwright/node.rb:120:in `visible_text'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/node/element.rb:60:in `block in text'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/node/base.rb:77:in `synchronize'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/node/element.rb:60:in `text'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:603:in `matches_text_regexp'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:607:in `matches_text_regexp?'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:554:in `matches_text_filter?'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:452:in `matches_system_filters?'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/queries/selector_query.rb:122:in `matches_filters?'
     # /Users/mhenrixon/.gem/ruby/3.3.5/gems/capybara-3.40.0/lib/capybara/result.rb:32:in `block in initialize'