Closed romaricpascal closed 1 year ago
And an extra option to maybe add in the mix, which I didn't think about this morning:
Before injecting strings in the markup, give the chance to the services to sanitise the HTML that was interpolated by I18n.
Advantages
Impacts
Concerns
My quick takes:
In general, XSS potential is entirely dependent on whether a service is injecting user-provided strings directly into HTML without first going through a server-side process.
This isn't an expected use case for any of our own components, nor do I imagine it being a common use case for most others that might use the I18n function, so my overwhelming view is to leave it to the service teams who actually need it to implement, and for us to not burden Frontend with something very few teams will actually need.
I believe there's a separate discussion happening about whether we should be putting visually-hidden text in controls at all (e.g. #2117) due to it preventing voice control users from using them effectively.
1. Leave it to the services Many server-side web languages come with HTML sanitisation functions out-of-the-box, and these would probably do a better job security-wise than anything we can implement in Frontend. Only potential issue I can imagine is HTML being encoded in a way that our JS can't convert back into useable HTML, but that still feels like an issue for the service team to fix rather than us.
2. Sanitise HTML before injection If we do anything, this feels like the 'best fit' approach to me, doing just enough without being overly restrictive. Leaving it to an npm package to handle is probably a good idea given we don't have any particular security expertise on the team, to my knowledge.
3. Have a separate API for text injection and HTML injection Seems like it's probably overcomplicating from both a programming and a documentation perspective (especially character count, which would have 20+ i18n parameters in this situation!)
4. Only allow translations to be passed through JavaScript Probably a non-starter based on our current trajectory and feedback from our proposal/pre-release.
5. Add a separate interpolation for visually hidden text @36degrees proposed doing this previously. I'm not extremely opposed to doing this, but it's a single solution that solves a single problem. It requires some design/documentation around custom syntax rules used in our i18n JS that isn't necessarily consistent elsewhere (i.e. Nunjucks), and I don't personally think it scales well if it turns out service developers need other, similar functionality (bold text, links, etc.)
6. Add a separate translation string for accessible name I think this could be viable but may require some a11y testing to validate first, especially on non-interactive and non-focusable elements.
7. Allow services to provide their own sanitisation function Same as option 1.
Another issue with 2 is that HTML that's output from in JavaScript will be sanitised, but HTML output from Nunjucks will not. So we'd be introducing an inconsistency that users might not appreciate, and could potentially cause issues – it feels slightly dangerous to sometimes but not always sanitise HTML.
My initial post was just a collection of the options we thought about in the dev catch-up on Monday. On my side:
1. Leave it to the services
Not opposed to it given the scope of our library and the strings being translated which limit the risks of user facing content coming in, but see 7.
2. Sanitise HTML before injection
Feels like the "ideal thing to do" and I'd be rooting for it. However, as @36degrees points out, we wouldn't treat all strings the same between Nunjucks and client-side, but provide the same ...Html
options for setting text in components which will be confusing. I believe given we want our templates to just be imported without having people to set filters in their Nunjucks environment, we can't really get Nunjucks to sanitise. So sounds like this option will have to be left out (for now?)
3. Have a separate API for text injection and HTML injection and 4. Only allow translations to be passed through JavaScript
Agreed with @querkmachine for both of them, doesn't sound like a workable way to go.
5. Add a separate interpolation for visually hidden text
Not a fan of this one for two reasons: it's potential for conflict existing formatting syntax and it's narrow scope that won't scale should we have needs for other tags.
6. Add a separate translation string for accessible name
My understanding was that this would only apply to the buttons and not non-interactive elements like the characted count, as they're the ones we need to pass HTML to at the moment. But that would make this a very ad-hoc fix and not very future proof should another component require some hidden text outside of an interactive element, so that may need a good deal of testing to make sure we're not breaking anyone's experience. On top of it, the naming conventions in our API would make it a bit weird as we'd have an <translationKeyAccessibleName>Html
option (though we could also use AccessibleName
as a marker for passing translation keys.
7. Allow services to provide their own sanitisation function
Technically they can already do this by overriding I18n.prototype.t
to post-process the resulting string. I felt that providing a clear API would cover all grounds (like I18n.setSanitiser((html) => {...})
or GovUK.setSanitiser((html) -> {...}
depending on where we think the responsibility falls) :
That's something we could document once in the config & i18n API and be done with, regardless of if new components start using the i18n API in places that may
This would also not bind us to a certain way of sanitising HTML (which we could always introduce in a future breaking change).
Another issue with 2 is that HTML that's output from in JavaScript will be sanitised, but HTML output from Nunjucks will not. So we'd be introducing an inconsistency that users might not appreciate, and could potentially cause issues – it feels slightly dangerous to sometimes but not always sanitise HTML.
For safety reasons, I'd also feel more comfortable with an "untrusted by default" approach
Following Nunjucks, where you can opt-in to render (safe) content with escaping turned off:
{{ params.html | safe if params.html else params.text }}
Admittedly it's difficult to exploit, but we currently allow components to render, escaping turned on, then still execute untrusted JavaScript. Here's one for Accordion "Show all" button .innerHTML
demonstrated:
<img>
error handlerExample staff-facing "case management" service
${personName}
placeholderThis is rendered using params.hideAllSectionsHtml
in {{ govukAccordion(params) }}
Whilst .innerHTML
won't execute code, we can inject other HTML that will
<img src="dodgy" onerror="alert('Exploit')">
personName: `Colin Rotherham <img src="dodgy" onerror="alert('Exploit')">`
Notice how escaping is turned on (best practice) but the exploit is still included
<div
class="govuk-accordion"
data-module="govuk-accordion"
data-i18n.hide-all-sections="Colin Rotherham <img src="dodgy" onerror="alert('Exploit')">"
>
The <img>
src is invalid which runs the onerror
handler
Did a bit of digging, as it occured to me while commenting on Slack that our Nunjucks template may also lead to HTML landing on the page. We got a good few ...Html
or <something.html>
options allowing users to pass custom HTML that our components output as is on the page.
We leave it to the services to sanitise whatever they pass:
You must sanitise any HTML you pass in to Nunjucks macros you’re using in your live application to protect your website against cross-site scripting (XSS) attacks. You can read more about XSS on the MDN website.
So rather than a heavy technical solution, which our options and discussions have been leaning towards so far, it may be a matter of documentation. Escaping was enough when the HTML passed to our component was rendered server-side (though kind-of breaks the point of passing HTML as all tags will be escaped). For the I18n messages, we'll need to emphasise on actual sanitisation (ie. tidy up of tags and attributes to only allow what's necessary). This could be by pointing to the OWASP documentation about it and mention that languages have libraries to take care of that complexity server side.
Besides that documentation, we may want to do a minor API update and pick a different suffix for the options related to I18n (Translation, or Message maybe). Reason for it would be to set these options apart from an option like titleHtml
in the Panel and help people understand where they need strong sanitisation vs. simply escaping (though if they only escaped the content as a form of sanitisiation options, like titleText
would do just the same).
@36degrees @colinrotherham We mentionned the aria-label
route at then end of the milestone review, and this would be the steps to make it happen I believe:
...Html
to ...Text
as they'll now be injected by text, both on the Accordion (4 keys) and CharacterCount (6 keys)textContent
(2 places, visible and visually hidden announcement)textContent
(2 places, setExpanded
and updateShowAllButton
show/hideSectionAriaLabelText
(or just ...AriaLabel
) options to the macro and in the default JavaScript configuration and update setExpanded
and updateShowAllButton
to use them for labelling the buttons4 is a bit more complex than at first glance. The button's full accessible name is not just "Show/Hide + visually hidden text", but constructed of:
a. the heading itself b. a coma for pausing the screenreader c. "Show/Hide" + visually hidden text
That's where having visually hidden content made thing easy. Swap just the last part and we were good.
We wouldn't be able to just aria-label
the <span>
(no presence in the accessibility tree, I believe, especially as it sits inside a <button>
), so we'll need to set it on the <button>
and:
That should help assessing the impact of going down that route.
It's worth bearing in mind that the issue of making visually hidden text localisable is not new – it's just the first time we've needed to combine it with text that's added or updated in the JS layer.
We already have a mix of different techniques across the codebase – we use aria-label
in:
aria-label="{{ item.visuallyHiddenText | default("Page " + item.number) }}"
)aria-label="{{ params.ariaLabel | default("Cookie banner") }}
)<nav aria-label="{{ params.navigationLabel | default(menuButtonText) }}"
)Meanwhile, we use a span with the govuk-visually-hidden
class in:
aria-label
!)I think we have this mix because we're (mostly) doing what makes sense in the context of each component.
So I don't think there's any reason why if we chose to use aria-label
here we'd be committing to using aria-label
everywhere going forward.
This also isn't the first time we're allowing users to pass us HTML that we then output on the page, with the expectation that it's up to the service team to have made that HTML 'safe'. We already have guidance about this, although it could arguably be stronger.
I think the main differences here are that:
Text
or Html
but in this case there is no option to pass a Text
version, so users can't 'opt out' and avoid this being a problemescape
filter, but it's being unescaped at the point it's read into the dataset in JS. I think this means that any user-generated content would need to be 'double escaped', and there might be some complexity relating to how Nunjucks escapes strings – I think it generally avoids double escaping unless you explicitly use forceescape
?Given what you describe around aria-label
, it sounds like it's something we're already relying on in some components, so would be a viable route for us (provided we make the update cautiously to ensure the accessible name remain the same)
I agree it doesn't commit us to using aria-label
everywhere, just solves our issue for the Accordion, the same way that it solved things for the other components that use it.
This also isn't the first time we're allowing users to pass us HTML that we then output on the page, with the expectation that it's up to the service team to have made that HTML 'safe'. We already have guidance about this, although it could arguably be stronger.
With the forceescape
filter that you describe further down, I think that would put the texts/html injected by JavaScript in the same situation as those injected by Nunjucks on the server. If we're happy with that route for the Nunjucks injected options, and provided we flesh out our documentation a bit more (eg. present options for escaping, point to the OWASP page...), I'd be enclined to think it'd be a decent solution.
- using HTML in unavoidable in the Accordion if you want to be consistent with the English translation and include a visually hidden part
I'm not sure I follow that statement as using aria-label
would be a way out of that one
- relatedly, we usually provide a way to pass either
Text
orHtml
but in this case there is no option to pass aText
version, so users can't 'opt out' and avoid this being a problem
This one would mean we have separate data attributes for the text and html version. This could be done by suffixing the keys, before adding the plural (eg. data-i18n.show-section.text
, data-i18n.character-over-limit.html.one
) and updating how the components render them (at that point, we'd likely want a I18n.prototype.setTranslatedText(element, key, options)
to encapsulate that logic. That doubles the available options as well
- we are escaping the HTML using the Nunjucks
escape
filter, but it's being unescaped at the point it's read into the dataset in JS. I think this means that any user-generated content would need to be 'double escaped', and there might be some complexity relating to how Nunjucks escapes strings – I think it generally avoids double escaping unless you explicitly useforceescape
?
Do you mean that our templates would use forceescape
and that we'd expect users to run escape
(or properly sanitise their elements, with something like DOMPurify for ex) if they pass user-provided content to the helper?
How about:
forceescape
to check that it would escape already escaped textaria-label
for the accordion.The govukPluralisedI18nAttributes
macro may be a neat place to quickly try the effect of forceescape
in isolation 😊
That forceescape
filter looks very promising for putting messages injected in JavaScript at the same level as those injected by Nunjucks templates.
There are 3 situations ahead of us:
| escape
escape
would only run the character escaping on 1 and 3. This is because under the hood, it first checks if the content to escape is already marked as a safe string.
Switching to forceescape
would mean that whichever way they're escaped, the strings provided as attributes would end up escaped. This means that services that would want to provide user-submitted strings as one of their messages have a cheap way to escape by using the | escape
filter. Using set
, they could even construct strings where only part of the content will be escaped twice, allowing them to still inject hidden elements if necessary:
{%- set contentWithHiddenText %}
{{ valueFromUser | e }}<span class="govuk-visually-hidden">{{ someSafeDevValue}}</span>
{% endset %}
forceescape
also provides a bit of safety in case we receive a string that went throught | safe
. With only escape
if the string had any quote, it'd close the attribute and wreck the markup, while with forceescape
that quote will get escaped and we'll get proper render. So, all in all, a move I think we should make.
However...
forceescape
only helps people using the Nunjucks templates. If you only use our JavaScript on top of HTML generated some other way, the burden of ensuring proper escaping will be down to whatever generates the HTML.
A bit of context
Components supporting internationalisation currently inject translated content using
innerHTML
. This is necessary to allow visually hidden text to be injected, like in the accordion "Show/Hide this section" messages. This content can reach them two ways, through:While
innerHTML
won't execute<script>
tags, there's a wide range of malicious string that could be used for Cross Site scripting should they end up as is in the HTML. With our current components, translations strings should not be user-generated, which limits the risks. However, this is not to say they'll always be, and that there won't be someone using our components with user-generated content for a translation string at some point in the future.This calls for some thoughts and there are a few routes we could take (pooled from the Dev catch-up)
1. Leave it to the services
Do nothing and trust the services to pass sanitised strings for us to inject. Maybe at some point in the future, introduce some sanitisation to limit HTML that can be injected
Advantages
Impact
Concerns
2. Sanitise HTML before injection
Before injecting the content on the page, sanitise the strings to only allow a specific set of elements and attributes.
Advantages
Impact
Cons
3. Have a separate API for text injection and HTML injection
Similarly to the Nunjucks templates, offer two paths for getting content in the DOM:
textContent
, for ex)innerHTML
, unsanitisedWe'd encourage the use of the first one, potentially flagging the second one as "dangerous". We'd also explain to users that it's down to them to ensure that only sanitised content ends up in the HTML part.
Advantages
Impacts
Concerns
4. Only allow translations to be passed through JavaScript
Restrict the passing of translation strings via JavaScript, to prevent them from appearing on element, possibly making them harder to modify?
Advantages
Impacts
Concerns
5. Add a separate interpolation for visually hidden text
Ultimately, the reason for using
innerHTML
is to inject<span class="govuk-visually-hidden">...</span>
s to help accessibility, so we may want to focus on doing just that rather than trying to allow any HTML.This could take the form of a special marker in translation strings to wrap content to be visually hidden (as it won't necessarily only be appended).
Advantages
Impacts
<span>
s. Note that this would need to be after the text is safely injected on the page withtextContent
. If we were to interpolate first, then we'd have to useinnerHTML
to inject the interpolated string, leaving us vulnerable if someone put some malicious code alongside the visually hidden content.Concerns
6. Add a separate translation string for accessible name
We're using visually hidden content to provide a different accessible name. Another route for it could be to allow a translation to be passed for the
aria-label
attribute and restrict the visible label to be injected only as text. `` AdvantagesImpacts
aria-label
attributes of the two states (show/hide) of its two buttons (all, per section) and use them accordinglyConcerns
Who needs to work on this
Developers
Who needs to review this
Developers
Done when