Closed panthershark closed 6 years ago
This change was to start ruling out XSS attack vectors in Elm.
If you need help updating some code, folks on slack or discourse can help you find a way to get things going nonetheless!
I know y'all made your decision and I know that probably will not change, but I'm still gonna take a moment and explain why I think there should be a way to use this... even if it is behind some weird flag or something emits a warning or whatever..
Elm is great. The flame wars on hacker news are dumb and this is not one of them. Please read this with an open mind and consider the trade-offs made here and how they hurt application developers.
The main reason is this: this does not prevent XSS, just obfuscates this and there is no reasonable workaround for application developers that need to inject formatted content. Here are the options that have been proposed:
The assertion that it prevent XSS is at the very least partially untrue. If the goal is prevent package authors from using it, maybe a static analysis tool that must pass before publishing would be preferable to removing a feature of the browser.
Thank you for your time and for making Elm.
What is your specific scenario? What is the nature of the formatted content? Where is it from? How is it created?
There are 2 scenarios currently in mind:
When a server api fails on the server, it formats the error message as html to go with the error code. This can be worked around, but it requires a bunch of elm to take error codes, convert them to error models, then render a template for each error type.
Content is authored in a system and presented in elm. The content comes from that system as html. The workaround here is a bloated custom element or iframe that has to round trip to load the content from a special endpoint.
It seems that custom elements are the preferred way to divert the responsibility away from elm, but with the sad state of IE11 and Safari WRT web components, this doesn't isolate the runtime from XSS. It pushes the work to the application developer.
I think I understand where y'all are coming from with the change. With the state of browsers and compatibility, it might just be a little early to make it impossible to inject html.
Is it true that case (1) can be done by sending JSON over? Maybe with { code : Int, message : String }
or something a bit more elaborate? How elaborate would it be? Would the resulting code be harder or easier to maintain than having the server produce raw HTML? I'd imagine the current risk is that changes on the frontend require changes on the backend as well, but it may be pretty non-obvious.
Can you say more about case (2) though? What is the nature of this content? Are you saying that it is produced in some other context that can only produce HTML? What parts of HTML does it produce? Arbitrary tags, or some subset for documents?
Scenario 1 triggers a backend refactor when upgrading to elm 0.19. While this is extra work for our small team, our architecture allows for reasonable mitigation.
Scenario 2 is a little more problematic. Here are 2 situations:
In our community, posts are created with a RTE. The content is html and the posts are rendered.
Document management system - Users upload documents in word, pdf, etc. An offline process converts them to html previews (open office, pdf-to-html utils, etc) so previews can be made available on devices (pdfs are not iOS friendly for inlining). The custom element deficiency is really annoying here and this has been mitigated with iframes.
We can go through all of my use cases, but it seems that we could also look at the proposed alternatives and decide whether those are acceptable solutions, specifically citing the deficiency in safari and ie11.
While I'm not sure I totally agree, I would accept the custom element as a viable solution if the browsers supported it natively.
This came from a chat in slack earlier about this:
My position is that this solution is akin to hunting a deer with a bazooka. Sure, you get the deer, but you also get a bunch of other stuff . application developers are the collateral damage. a rifle with a scope would just ensure XSS is not in packages, but keep compatibility with the underlying platform.
Thank you for sharing more information about your situation. In our 2 month period with the community pre-release, I do not recall anyone raising a situation like this. It is very hard to anticipate scenarios that neither your nor anyone you know has ever done, so again, I appreciate you explaining things in more detail.
For future reference, the slack message you shared is not the kind of evidence that typically goes into a decision. It seems like it is an emotionally satisfying way to vent the root frustration, but ultimately, we try to make decisions based on concrete technical considerations. If that person also has a situation like yours, it would be great to hear that example as well. Again, we did not know this sort of scenario existed for companies using Elm.
I will try to find you on Slack to discuss a bit further. In the meantime, if you find other people in the same scenario, please encourage them to share their technical details here!
There is nothing emotionally satisfying about the slack situation. We are all respectfully working through our points of view and this is healthy.
We are a 25 person company with 6 engineers (frontend, backend, devops). We build things very quickly and we have an aggressive product roadmap. Elm has been so good for us as a team as a result reduced our team size by 4 front end engineers (from 9 down to 5) and we actually got way better at building things. The unfortunate side effect of this is that we haven't been able to spin up Elm 0.19 ahead of release.
While my position gives me the authority redirect our resources to address the new limitation, this might cause a lot of friction for others as they would have to convince managers or others..
We love Elm and will continue to be a stanch advocate. Again, thank you for making a great language / platform.
Here is another thread about innerHTML if anyone else wants to follow along there as well. https://github.com/elm/virtual-dom/issues/131
I was thinking a bit about the ports idea. You ruled it out in your comment, but I think it is actually a viable path. Here is a rough brainstorm on that route:
type alias Model =
{ previews : Dict String String -- key is an ID and value is the HTML string
}
port setPreviews : List (String, String) -> Cmd msg
viewPreview : String -> String -> Html msg
viewPreview key html =
div [ id key, attribute "data-html" html ] []
And then in JavaScript having something like:
var app = Elm.Main.init();
app.ports.setPreviews.subscribe(function(previews) {
for (var i = 0; i < previews.length; i++) {
var node = document.getElementById(previews[i][0]);
node.innerHTML = previews[i][1];
}
});
Now I am not 100% sure what virtual DOM will do when people use innerHTML
at different times. (It was not written with that as a feature, and I am surprised that what you seem to be describing works in the first place.) But if diffing is causing problems, it seems possible to do a trick like using Html.Keyed.node "div" [] [ ( String.concat (Dict.keys previews), viewPreviews previews ) ]
to make sure things get deleted whenever there is a change to the content.
I understand that this is not the ideal path for you, but on the other side of things, we do not want packages out there that insert <script>
tags under the guise of a date picker or a credit card entry view. I also understand that in npm
there are no protections against that sort of thing and AFAIK there has been no high-profile use of this particular exploit. They pay the cost of having a "reporting" infrastructure where people are doing a bunch of things by hand, and for the money we have to spend on Elm, I don't think it makes sense to use it in that way. I need to balance a bunch of competing concerns like this, so given that we have one concrete example so far, I think it makes sense to pursue this seemingly tractable workaround for now.
Yeah, based on saying it all out loud, I think you could do something like this:
type alias Model = { previews : List Preview }
type alias Preview = { id : String, html : String }
port resetPreviews : () -> Cmd msg
view : Model -> Html msg
view model =
div []
[ ...
, Html.Keyed.node "div" []
[ ( String.concat (List.map Tuple.first model.previews)
, viewPreviews model.previews
)
]
]
viewPreviews : List Preview -> Html msg
viewPreviews previews =
div [ id "previews" ] (List.map viewPreview previews)
viewPreview : Preview -> Html msg
viewPreview preview =
div [ id preview.id, attribute "data-html" preview.html ] []
Now on the JavaScript side you need to handle resetPreviews
messages at some interval.
var app = Elm.Main.init();
app.ports.resetPreviews.subscribe(function() {
var kids = document.getElementById('previews').childNodes;
for (var i = 0; i < kids.length; i++) {
kids[i].innerHTML = kids[i].getAttribute('data-html');
}
}
});
I'm not 100% all the code here will work as stated, but I think this is closer than the last idea.
Sure, implementing a thunk with a port is an option. I assume this port needs to be called on every update.. yeah?
Also, does Elm guarantee the port run immediately after DOM patch?
Also, what about using requestAnimationFrame
? I'm sure elm is using it internally and what side effects come out of it?
Yeah, definitely one downside of this idea is that you'll need to call resetPreviews
from update
whenever previews
changes. I assumed the data would be loaded asynchronously, so there would be a natural spot to add that command. That definitely is a spot for mistakes, but I set things up with Html.Keyed
so that you'll get a pretty obvious failure rather than something subtle and weird.
And about timing, Elm hooks into requestAnimationFrame
by default for almost all view
calls. With 0.19 it is possible to get synchronous renders on user inputs as well, solving some other issue about race conditions when typing really fast. Anyway, this is a nice default for performance, but it requires some extra care when using ports to mess with the DOM. So the actual reliable JS code may look more like this:
var app = Elm.Main.init();
app.ports.resetPreviews.subscribe(function() {
var node = document.getElementById('previews');
node
? resetPreviews(node)
: requestAnimationFrame(resetPreviews);
});
function resetPreviews(node) {
var node = node || document.getElementById('previews');
if (!node) return;
var kids = node.childNodes;
for (var i = 0; i < kids.length; i++) {
kids[i].innerHTML = kids[i].getAttribute('data-html');
}
}
I think you may be able to cut some of this depending the particulars of your scenario, which I do not know well enough to say. I also wrote this such that you call document.getElementById
as little as possible. It could be simpler if resetPreviews
just called itself, gated by rAF
but that felt worse to me.
We discussed a bit in chat, and I wanted to record that here as well.
One of the things that came up was paths that would allow innerHTML
for application developers, but rule it out in packages. The goal there is that package.elm-lang.org
can still guarantee that a date picker or credit card package does not insert <script>
tags, but an application developer can do things if they really really need to.
There are two potential designs that came to mind for that idea:
property "innerHTML"
and detect it if it gets into packages. The trouble is that string may be constructed at runtime, possibly based on user input. Maybe folks encode things with UTF-8 code-points and convert to string later. Etc. So detecting this statically is not technically feasible to my knowledge.Html.Attributes.innerHTML
for this case. When a package is published, just check that it does not exist in the code. The trouble is that code actually does not go through the package website right now. We would like to do that someday (to guarantee packages compile and their tests pass) but it is a quite large infrastructure project that would definitely take a notable number of weeks or months. And even then, it is not just a grep
for innerHTML
because that is a totally valid name in Elm. We would need a semantic understanding of import Html.Attributes exposing (innerHTML)
or import Html.Attributes as A
to figure it out precisely.So the second design is technically possible, but it has two major downsides. (1) The time spent on that path will be at the expense of working on other things in the ecosystem. I'd like to focus on various web platform packages for example, and I am under a lot of pressure to do so. Other folks are volunteer workers, and an infrastructure project like this will take them longer and compete with the work they are already doing. The bigger picture infrastructure changes are desirable, but we'd like to integrate it with testing and it may block on some other things. (2) There are a significant number of people who do not want innerHTML
available in application code. In a team of 10 or 20, it is nice to be confident that no summer intern can accidentally introduce an XSS vector by accident. In a team of two, it is nice that by default this potential security issue is not a problem. You have to go out of your way, into JS, to actually open the possibility. From there, you know exactly where the risks are, making it easier to have a "sanitize everything" function to be extra safe.
So there are other paths here, but in trying to balance all the competing concerns on design and prioritization, I think the path with ports outlined in this thread is the best overall, though clearly it is not ideal for your situation.
If you want to talk anything through, please reach out again. I know Richard and Luke have more experience with customElements
when that path gets more viable on different platforms. And thank you for talking through all this!
When Safari and IE11 finally get their act together with custom elements, that is a much better solution. The polyfill is larger than most elm apps. If you can afford the polyfill, it seems much nicer than the ports option.
It is really nice that the 0.19 rendering is predictable now. My initial concern with ports as a viable option was the side effects of async rendering.
@evancz - thanks for the conversation and closing the loop here. :)
Hey @evancz, I just wanted to chime in and say that I wouldn't mind spending some time to implement the second potential design that you mentioned (have an explicit Html.Attributes.innerHTML
). This would be much more preferable (for me) than requiring developers to jump through unnecessary hoops like using ports or creating custom elements. After all, the "summer intern" that you mentioned will just search up how to set innerHTML
in Elm and end up coding the same vulnerability in a much more convoluted way that is more difficult to audit.
I haven't read much of the Elm builder source, but I'm assuming it would be relatively trivial to generate and traverse a full AST of any package during elm publish
and evaluate the import
statements?
Adding explicit support for innerHTML
is potentially technically feasible, but:
innerHTML
support means an XSS attack vector is now upgraded from mistake to feature. It also encourages people to work with unstructured HTML in other parts of their codebase. I think using structured formats will be better in the long run, even though it has a real cost to certain users at this time.I understand the argument that the intern could still add an XSS vulnerability. I think one of the valuable parts of Elm is that "In Elm, you get certain guarantees." If you go out to JS, all of that goes out the window. I think it would be a pretty nice guarantee that "Elm code does not have XSS vulnerabilities via HTML" and worth working towards. If your application is 85% or 90% or 95% Elm code, you have very very dramatically reduced the code that needs security audits. I think that is a big deal.
I get that people are not happy with the decision here, but even if we can find three or four examples, it is a really big thing to give up for those cases.
Thanks for (another!) really thorough reply, @evancz. I can't fault your reasoning at all, and I agree with all of your points.
The faq page at http://faq.elm-community.org/#how-can-i-output-literal-html-and-avoid-escaping-of-entities is out of date. It would have saved about 1 hour of me tinkering around if that page contained the link to here.
For those who have the similar issue, we had the same issue with innerHTML property and we used Html.Parser and Html.Parser.Util to parse strings into elm nodes. works fine for us. give it a try. https://package.elm-lang.org/packages/hecrj/html-parser/latest/
let
nodes =
case Html.Parser.run item.calculation.htmlContent of
Ok parsedNodes ->
Html.Parser.Util.toVirtualDom parsedNodes
_ ->
[]
in
div [ class "formula", dir "ltr" ] nodes
In the end, people have backends where content editors type in HTML. That needs to be displayed as HTML. Very common scenario. Elm cannot give guarantees here either way, either it goes to ports or it goes to the example just above. So with 0.19 it's simply more work. Up to a lot more work.
The problem with @psyCodelist's example is that you cannot force the content editor to type in perfect html. Which is what is required I'm afraid.
How this is normally handled is that the output from the backend goes through some escaping mechanism. Because sometimes the script tag is ok, i.e. every kind of HTML is OK, and sometimes it isn't. At Elm's level you usually cannot know. It's the backend that knows who entered a piece of content, and the privileges that editor has, i.e. how much output escaping needs to be done.
So in my opinion Elm simply needs to accept random HTML, it's up the backend to give the appropriate HTML.
I've got a solution that is nice and balances both worlds. I'll post the solution this weekend.
The faq page at http://faq.elm-community.org/#how-can-i-output-literal-html-and-avoid-escaping-of-entities is out of date. It would have saved about 1 hour of me tinkering around if that page contained the link to here.
Any yet no one opened an issue on that FAQ item until a few days ago.
@berenddeboer - here is a gist that we use. In our case, this is really nice b/c we need to apply functions on the FE to a.href
for the following use case:
There are a lot of cases where you might want process the HTML and this gives you a place to do this. https://gist.github.com/doanythingfordethklok/d6e4fee5b5d07ee500683cd989ae69a8
here is a gist that we use.
It seems this is basically the @psyCodelist solution right? I.e. parse html (so has to be valid), then insert.
@evancz I totally understand the idea of anything being written in elm having certain guarantees, and I certainly agree with it (to a degree).
With that being said, I also thoroughly agree with the side of things where you already have suitably sanitised HTML that you wish to render to the page. In that case, I'd like to propose to possible solutions that get the best of both worlds:
Overall, I really like the direction that the language of going, and I can tell that it's design is in very responsible hands. Dangerously setting HTML is definitely a use case that I believe Elm should protect against, however there is clearly demand for it, and I think it's a good idea to make it possible in some sort of predictable and widely supportable manner, but not necessarily from directly inside of Elm.
Imho, dangerouslySetWhatever was not a good idea in react nor will it be good here. If there is safe sanitized valid html, then the libs above will convert it into vdom in a single function call so it doesn't seem to save anything.
Imho, dangerouslySetWhatever was not a good idea in react nor will it be good here. If there is safe sanitized valid html, then the libs above will convert it into vdom in a single function call so it doesn't seem to save anything.
I take issue with this "you can parse an HTML string into a VDOM object so there's no need for innerHTML" rhetoric. We shouldn't be shipping an entire HTML parser down to end users when the browser already has its own parser. And referring to said parser as "a single function call" is grossly ignorant of the CPU time required to parse many HTML strings in quick succession. Penalizing users (and developers) with this unnecessary performance overhead just to replicate something the browser already does an order of magnitude more efficiently isn't great.
Right. It can be parsed once on load, converted to vdom nodes, then re-used. For cases where this is truly unteneble, web components or ports are great options that don't open up attack vectors to everyone who consumes the language.
The browser already provided a dangerousSetHtml functionality. It is called innerHTML and it is still available via ports.
If you read through these threads, you'll see that originally I agreed with you. After spending some time truly thinking through the trade offs, it became clear that I was wrong.
If there is safe sanitized valid html,
What is sanitised? A the Elm level you can't know that. If the backend has authenticated a user to emit certain html, that's what is considered sanitised.
At the Elm level you often don't know what permissions the user has, i.e. may they insert scripts? Or just use em
and strong
? And you don't want to repeat those permissions in Elm either.
suitably sanitised HTML
Apologies for misquoting earlier.
I just wanted to say thanks to @evancz for providing a workaround and explaining why innerHTML
is no longer supported in 0.19. His solution provided the basis of my solution.
It seems that pantershark changed his github's name and the link to the gist has changed, I think the correct one is this one:
https://gist.github.com/panthershark/d6e4fee5b5d07ee500683cd989ae69a8
This used to work in elm 0.18 and it doesn't show in the DOM. It looks like a bug