apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.69k stars 351 forks source link

Fix #504: fix inconsistent iframe src behavior across browsers #505

Closed mattclough1 closed 2 years ago

mattclough1 commented 2 years ago

Fix #504

boutell commented 2 years ago

Well that should certainly work! Since this is mainly for the benefit of browsers, where size is an issue, how much does it enlarge the output?

abea commented 2 years ago

@boutell The unpacked size is listed as 94.5 kB

boutell commented 2 years ago

I think it's worth checking whether a simple wrapper function could be used to spot leading // on path and fix hostname and path. That would save a lot on frontend build size.

mattclough1 commented 2 years ago

@boutell digging in a bit more, I think this may be a bug that prevents the affected browsers from parsing URLs (base or otherwise) that don't use a special scheme. How do we feel about just prepending a special scheme to the value before parsing if it begins with // since the parsed value is only being used for the hostname?

boutell commented 2 years ago

That sounds like the right kind of workaround, yes.

On Tue, Nov 16, 2021 at 9:56 AM Matt Clough @.***> wrote:

@boutell https://github.com/boutell digging in a bit more, I think this may be a bug that prevents the affected browsers from parsing URLs (base or otherwise) that don't use a special scheme https://url.spec.whatwg.org/#special-scheme. How do we feel about just prepending a special scheme to the value before parsing if it begins with // since the parsed value is only being used for the hostname?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/pull/505#issuecomment-970353845, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27IZFDYL24ZTYYJZ433UMJWKFANCNFSM5GAPDJ4Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

boutell commented 2 years ago

Hmm, I merged this too quickly, it breaks a test:

      AssertionError [ERR_ASSERTION]: '<iframe src="https://www.youtube.com/embed/c2IlcS7AHxM"></iframe>' == '<iframe src="//www.youtube.com/embed/c2IlcS7AHxM"></iframe>'
      + expected - actual

      -<iframe src="https://www.youtube.com/embed/c2IlcS7AHxM"></iframe>
      +<iframe src="//www.youtube.com/embed/c2IlcS7AHxM"></iframe>

The trouble with this workaround is that it changes intentionally protocol-relative URLs to no longer be protocol-relative. So we would need to undo that to handle it right.

Can you account for that in a new PR?

mattclough1 commented 2 years ago

Yeah, apologies, was still working on it and should've tagged the PR as such 🙁