LeaVerou / awesomplete

Ultra lightweight, usable, beautiful autocomplete with zero dependencies.
http://leaverou.github.io/awesomplete/
MIT License
6.97k stars 611 forks source link

Escape HTML characters in list of suggestions #17189

Open dontcallmedom opened 5 years ago

dontcallmedom commented 5 years ago

fix #16932

Generate manual DOM subtree for each marked instance of the queried string instead of generating an unsafe HTML string

dontcallmedom commented 5 years ago

I have hopefully adjusted the code to the coding style.

I may be missing something for the replace function - while it would work fine to generate the mark elements, I don't see how to use it succinctly to escape the HTML characters that would be outside of the matched parts of the string.

LeaVerou commented 5 years ago

I have hopefully adjusted the code to the coding style.

Thanks!

I may be missing something for the replace function - while it would work fine to generate the mark elements, I don't see how to use it succinctly to escape the HTML characters that would be outside of the matched parts of the string.

That made me realize that as currently written, this is backwards incompatible: HTML in options works right now, whereas if this is merged, it would stop working and possibly break numerous use cases. Ideally we want something that keeps HTML that was there from the beginning, but doesn't convert special characters to HTML when highlighting them. I.e. something that's XSS-safe, but doesn't restrict what you can do in the initial options. Does that make sense?

dontcallmedom commented 5 years ago

re backwards-compatibility, I don't think it can be preserved:

I guess we could make the new behavior opt-in - but given the XSS risks and the existing ambiguity, it seems to me making it the default would be better. That being said, I could imagine making it possible to opt-out of the new behavior (e.g. through a new option) - this would imply adapting the way the labels gets displayed in the <input> to get only the textContent equivalent.

LeaVerou commented 5 years ago

right now, the API is ambiguous about the type of content in label - when rendered in the list options, it is rendered as HTML, but when rendered in the <input> field, it is rendered as plain text (since it goes through the value attribute).

I believe that's fine, e.g. many use cases are about displaying graphics next to the suggestions.

it's not really a matter of highlighting the content of the label with <mark> - the XSS risks exist even when no highlighting happens - if the list of label options contains a malicious <script> (e.g. coming from a third party ajax service, or from an http query parameters), that script will be loaded as soon as the list of labels is displayed

Good point, that should be prevented (though I'm still not sure if it should be the default at this point). However, if there's HTML in the initial options, included in the page's markup, that should be preserved, no?

I noticed that the current behavior is definitely buggy: If you include <b> in data-list, it will be rendered (correct). If you include &lt;b&gt;, it will still be rendered! That's definitely a bug which needs to be fixed in both cases (regardless of someone opts in to allowing HTML or not).

I guess we could make the new behavior opt-in - but given the XSS risks and the existing ambiguity, it seems to me making it the default would be better. That being said, I could imagine making it possible to opt-out of the new behavior (e.g. through a new option) - this would imply adapting the way the labels gets displayed in the <input> to get only the textContent equivalent.

I wonder if it would be an easier transition to make it opt-in in this version, and announce that it's going to be default behavior in the next one, and people should use the option now if they need it.

dontcallmedom commented 5 years ago

I have looked a bit more into this, and I think there are the following bugs in the way HTML values or labels are handled:

Here is how I think HTML in the list of suggestions could be properly supported with reduced XSS risks:

This would imply adopting the code to handle differently text and DOM fragments, and converting DOM fragments to text via textContent when setting input.value and when matching user input. I think (but am not sure) the highlight of matching characters in DOM fragments could still be achieved with the Range API.

What do you think? (I'm leaving aside the backwards-compatibility question for now - I feel it will be easier to consider it once the end goal is clearer)