apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.69k stars 351 forks source link

Allow escaping of more than just tags #540

Closed Zsar closed 2 years ago

Zsar commented 2 years ago

The problem to solve

I would like the sanitised string to be displayed exactly as input. Discarding text is not helpful, as it will be missing. options.disallowedTagsMode = 'recursiveEscape' already exists to escape tags instead of removing them. Attributes, etc. are still removed instead of escaped though.

User Story:

  1. We start with (mostly) everything disallowed.
  2. User attempts to input HTML, sees the resulting text and copy&pastes directly into a feature request to allow the tag/attribute/style/whatever they want and do not get.
  3. (If the requested element can be safely allowed) we copy&paste the text directly into our mock project for TDD.
  4. loop 2., 3.
  5. Profit!

Proposed solution

I, personally, would like a simple switch options.disallowedEverythingMode: 'discard' | 'escape' | 'recursiveEscape' or similar.

Alternatives

Mirroring the lists and records of options akin to how the existing options.disallowedTagsMode mirrors options.allowedTags is probably the more sensible approach:

options.disallowedAttributesMode: 'discard' | 'escape' | 'recursiveEscape'
options.disallowedStylesMode: 'discard' | 'escape' | 'recursiveEscape'
<etc.>

Mayhap a synthetic "all the things" could be added as a function on top of those: function setDisallowedEverythingMode(mode: 'discard' | 'escape' | 'recursiveEscape') {/* set every flag to mode */}

Additional context

options: {
  allowedAttributes: {},
  allowedTags: [],
  disallowedTagsMode: 'recursiveEscape',
  nonTextTags: [],
  selfClosing: [],
}

Input: "Cursed Project <img src=x onerror=alert('AllYourBaseAreBelongToUs!');>" Expected: Cursed Project <img src=x onerror=alert('AllYourBaseAreBelongToUs!');> Displayed: Cursed Project <img></img>

boutell commented 2 years ago

I agree it doesn't make sense for recursiveEscape to keep tags (in escaped form) and yet discard attributes, given that it's all being converted to text anyway.

On Mon, Feb 21, 2022 at 8:50 AM Zsar @.***> wrote:

The problem to solve

I would like the sanitised string to be displayed exactly as input. Discarding text is not helpful, as it will be missing. options.disallowedTagsMode = 'recursiveEscape' already exists to escape tags instead of removing them. Attributes, etc. are still removed instead of escaped though.

User Story:

  1. We start with (mostly) everything disallowed.
  2. User attempts to input HTML, sees the resulting text and copy&pastes directly into a feature request to allow the tag/attribute/style/whatever they want and do not get.
  3. (If the requested element can be safely allowed) we copy&paste the text directly into our mock project for TDD.
  4. loop 2., 3.
  5. Profit!

Proposed solution

I, personally, would like a simple switch options.disallowedEverythingMode: 'discard' | 'escape' | 'recursiveEscape' or similar. Alternatives

Mirroring the lists and records of options akin to how the existing options.disallowedTagsMode mirrors options.allowedTags is probably the more sensible approach:

options.disallowedAttributesMode: 'discard' | 'escape' | 'recursiveEscape' options.disallowedStylesMode: 'discard' | 'escape' | 'recursiveEscape'

Mayhap a synthetic "all the things" could be added as a function on top of those: function setDisallowedEverythingMode(mode: 'discard' | 'escape' | 'recursiveEscape') {/* set every flag to mode */} Additional context options: { allowedAttributes: {}, allowedTags: [], disallowedTagsMode: 'recursiveEscape', nonTextTags: [], selfClosing: [], } Input: "Cursed Project " Expected: Cursed Project Displayed: Cursed Project — Reply to this email directly, view it on GitHub , or unsubscribe . Triage notifications on the go with GitHub Mobile for iOS or Android . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

boutell commented 2 years ago

This would make a good community contribution.

On Tue, Feb 22, 2022 at 8:40 AM Tom Boutell @.***> wrote:

I agree it doesn't make sense for recursiveEscape to keep tags (in escaped form) and yet discard attributes, given that it's all being converted to text anyway.

On Mon, Feb 21, 2022 at 8:50 AM Zsar @.***> wrote:

The problem to solve

I would like the sanitised string to be displayed exactly as input. Discarding text is not helpful, as it will be missing. options.disallowedTagsMode = 'recursiveEscape' already exists to escape tags instead of removing them. Attributes, etc. are still removed instead of escaped though.

User Story:

  1. We start with (mostly) everything disallowed.
  2. User attempts to input HTML, sees the resulting text and copy&pastes directly into a feature request to allow the tag/attribute/style/whatever they want and do not get.
  3. (If the requested element can be safely allowed) we copy&paste the text directly into our mock project for TDD.
  4. loop 2., 3.
  5. Profit!

Proposed solution

I, personally, would like a simple switch options.disallowedEverythingMode: 'discard' | 'escape' | 'recursiveEscape' or similar. Alternatives

Mirroring the lists and records of options akin to how the existing options.disallowedTagsMode mirrors options.allowedTags is probably the more sensible approach:

options.disallowedAttributesMode: 'discard' | 'escape' | 'recursiveEscape' options.disallowedStylesMode: 'discard' | 'escape' | 'recursiveEscape'

Mayhap a synthetic "all the things" could be added as a function on top of those: function setDisallowedEverythingMode(mode: 'discard' | 'escape' | 'recursiveEscape') {/* set every flag to mode */} Additional context options: { allowedAttributes: {}, allowedTags: [], disallowedTagsMode: 'recursiveEscape', nonTextTags: [], selfClosing: [], } Input: "Cursed Project " Expected: Cursed Project Displayed: Cursed Project — Reply to this email directly, view it on GitHub , or unsubscribe . Triage notifications on the go with GitHub Mobile for iOS or Android . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

--

THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.