OOB does not escape query selector

bigskysoftware / htmx

</> htmx - high power tools for HTML

https://htmx.org

Other

37.81k stars 1.28k forks source link

OOB does not escape query selector #1537

Open eduardvercaemer opened 1 year ago

eduardvercaemer commented 1 year ago

I have a very simple HTML with some elements such as

<div id="foo-/bar/">

I want to execute an OOB swap, but sending the following HTML

<div id="foo-/bar/" hx-swap-oob="true">

results in the following error in the console

Failed to execute 'querySelectorAll' on 'Document': '#foo-/bar/' is not a valid selector.

The query selector can be escaped like '#foo-\/bar\/' and should be done automatically by htmx.

alexpetros commented 1 year ago

Is that a valid id?

matiboy commented 1 year ago

@alexpetros According to MDN, it is:

Technically, the value for an id attribute may contain any character, except whitespace characters.

The documentation for querySelector actually mentions escaping special characters though it surprisingly does not mention using CSS.escape which would be the easiest way to handle the above.

I'll try and create a PR for this.

alexpetros commented 1 year ago

I still think this is a reasonable request if you want to write your own escape function or just wait for htmx 2.

danielo515 commented 1 year ago

what about using getElementByID when it is detected to be an ID? that method is able to work with any id string

Encephala commented 10 months ago

This problem doesn't just apply to hx-swap-oob, it applies to any selector HTMX uses. The following snippet yields an error similar to the one in this issue:

<button hx-get="/clicked" hx-swap="beforeend" hx-target="#target!">Click me</button>
<div id="target!"></div>

That makes the solution more complicated, as we can't simply apply some escape function inside f.i. normalizeSelector, as characters like , have meaning in attributes outside of CSS selectors.

I implemented a simple escape function to test this (here) and found many failing tests when also applying this escape function inside normalizeSelector. Specifically, tests for hx-include, hx-indicator and hx-disabled-elt that separate multiple selectors with a , fail (and a test which uses hx-indicator=".a1").

I don't quite get where the code that processes those attributes calls normalizeSelector, hopefully someone that better understands the project can help out.

Either way, in my commit above I implemented a simple escape function that fixes the issue for hx-swap-oob (I also yoinked the test from #1864 by @imankulov). I guess that can get merged to close this issue, although I don't know if the list of escaped characters is exhaustive enough/if we should yoink this polyfill (also the escapeSelector function is at a weird place in the file because I was trying to get it working on normalizeSelector as well).

svenberkvens commented 10 months ago

what about using getElementByID when it is detected to be an ID? that method is able to work with any id string

Yes, this! The OOB swapping function uses querySelectorAll() to get a list of targets that it forces to be an ID by placing a # in front of whatever is present in the hx-oob-swap attribute. You might as well skip all the hassle of selectors and escaping by using getElementById to get the one and only element that matches the given target. By definition, IDs are unique and cannot occur more than once in the document.

Other attributes target elements by specifying an actual selector, not an ID like hx-oob-swap uses. I think it's fine to require the programmer to escape weird characters in the selector in those non-hx-oob-swap HTML attributes themselves. Otherwise, it wouldn't be an actual selector in the first place!

FraserChapman commented 7 months ago

I was tempted to have a bash at this, but just wanted to clarify/confuse(!?) the issue.

...By definition, IDs are unique and cannot occur more than once in the document.

This is not entirely accurate. While it's true that, in practical terms, each id value within a single HTML document should be unique across the entire document, the uniqueness requirement is enforced within specific subtrees of the document.

That is, each id must be unique within its own subtree.

This concept is outlined in the HTML5 specification [1][2], where the "home subtree" of an element is defined as the portion of the DOM tree rooted at the element's root element. Therefore, within different subtrees of the document, elements can have the same id values.

To illustrate this point, consider the markup of websites like YouTube, where multiple elements with the same id values are present. These elements exist within different subtrees of the document, hence fulfilling the uniqueness requirement within their respective subtrees.

For a practical demonstration, you can visit a YouTube video page (e.g., https://www.youtube.com/watch?v=LriHRa9t1fQ) and open the browser console.

Running document.getElementById("video-title") will retrieve the first element with the specified id, while document.querySelectorAll('[id="video-title"]') will return a NodeList containing multiple elements with the same id value, demonstrating the concept of unique id values within different subtrees.

This works because subtrees form their own scope, the best example of this being the "Shadow DOM" which is an adjunct tree of DOM nodes. Thus any such subtree can contain ID that overlap with IDs in the rest of the document, without the the IDs in the subtree clashing with those in the document.

Or in a simpler contrived example

The practical upshot of this in regards to this issue is that you can do - document.querySelectorAll('[id="foo-/bar/"]') to get the nodelist of all elements with the id "foo-/bar/" - without getting Failed to execute 'querySelectorAll' on 'Document': 'foo-/bar/' is not a valid selector. and whilst not limiting the selection to the first matching id in the document, and whilst not having to escape the id with CSS.escape or some custom escaping function.

References:

[1] HTML5 Specification: The ID Attribute (https://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-id-attribute)

[2] HTML5 Specification: Home Subtree (https://www.w3.org/TR/2011/WD-html5-20110525/infrastructure.html#home-subtree)

FraserChapman commented 7 months ago

Have popped a PR in to address this based on my previous comment - https://github.com/bigskysoftware/htmx/pull/2319

Atmos4 commented 7 months ago

@FraserChapman your explanation of home subtree is incorrect. With any node in the same Document, the home subtree with be that Document.

In your example, it means all of those IDs' home subtrees are the same: window.document. Hence your example doesn't respect w3.

Note: as you said, using an ID several times is however valid with each ID in a different subtree like different Shadow DOMs. However, you seem to misunderstand the concept of subtree: with a single ID per document, any <documentRoot>.querySelectorAll("[id=<someId>]") should always return a single element. Your example fails to demonstrate that, so your whole point about querySelectorAll is also incorrect.

To illustrate this point, consider the markup of websites like YouTube

Taking Youtube (or any production website) as an example to prove a point regarding browser standards is generally a bad idea.

Unfortunately browsers won't mark this as an issue unless your duplicate IDs are on inputs, which usually breaks autofill functionalities.

The initial problem

@eduardvercaemer It is a very bad idea to use any random character in IDs: it will break in certain browsers, document fragment links will not be well-supported, and so on. For the sake of sanity checking, my honest opinion would be that the maintainers look at the issue again and reflect about why the solution for your problem would not simply be to refrain from using weird characters in the ID 😅

@matiboy if you had linked the entire MDN paragraph, it would have simplified the discussion:

Technically, the value for an id attribute may contain any character, except whitespace characters. However, to avoid inadvertent errors, only ASCII letters, digits, '_', and '-' should be used, and the value for an id attribute should start with a letter.

FraserChapman commented 7 months ago

@Atmos4i to clarify, that's not my explanation, that is the wording of the specification.

the "home subtree" of an element is defined as the portion of the DOM tree rooted at the element's root element.

The example I provided was, as I stated, contrived to illustrate the discussion, it was not meant to be a complaint or respect w3.

Further the YouTube example was to demonstrate the use of subtrees "in the wild" - where multiple ids with the same value exist, it wasn't to "prove a point about browser standards".

My understanding of subtrees was

This works because subtrees form their own scope, the best example of this being the "Shadow DOM" which is an adjunct tree of DOM nodes. Thus any such subtree can contain ID that overlap with IDs in the rest of the document, without the the IDs in the subtree clashing with those in the document.

Could you point out where my failure of understanding is? It's entirely posible I'm misunderstanding something, but I'm not clear what you are referring to here.

Leaving aside the debate around multiple ids with the same value, my main point about querySelectorAll with regard to this issue was that it can be used to select the correct element without the need for escaping (the entire basis of this discussion)...so saying my whole point about querySelectorAll being incorrect, is ironically, itself incorrect :)

The further point I was making about multiple ids with the same value was that it would work on sites "in the wild" where document.querySelectorAll("some-id") does in fact return a NodeList of multiple elements.

Apologies if I confused any of this, as I said it's entirely possible I'm misunderstanding certain aspects of this :)

I do see you point about not supporting certain characters as an easy "solution" - but to a certain extent doesn't this amount to "we don't support certain valid IDs because it's too difficult"?

Atmos4 commented 7 months ago

Let me try to explain:

The term subtree is related to the ShadowRoot API and Element.attachShadow. Those are linked to the wider topic of Web Components, but they can be used on their own.

document.getElementById is bound to the subtree window.document. If you however create a shadowRoot, it will be the root for another subtree. Any ID within that subtree won't be reachable by queries on document, would it be document.getElementById or document.querySelectorAll.

I made a quick Codepen example to demonstrate this.

The example I provided was, as I stated, contrived to illustrate the discussion

The example you provided is only involving document, so it fails to illustrate the concept of subtree. The codepen above is a valid example where you have an ID only once per subtree.

Further the YouTube example was to demonstrate the use of subtrees "in the wild"

The Youtube website is the exact same as your example. document.querySelectorAll would never return IDs from other subtrees, because that is the very purpose of a subtree: isolation from DOM queries.

doesn't this amount to "we don't support certain valid IDs because it's too difficult"?

Adding support for this is very simple, like you show in your PR: [id="<some_id>"] will work with any ID (even with whitespace characters). (see this other Codepen). A better justification would be "we don't support special characters because it is a bad idea and it is an unmaintainable pattern".

I hope I did not sound too off-putting. I am merely trying to:

address the spread of incorrect information regarding subtrees, IDs and w3 standards.
sanity-check the discussion around supporting unrecommended ID patterns.

FraserChapman commented 7 months ago

@Atmos4

The example you provided is only involving document, so it fails to illustrate the concept of subtree.

Ok I think I see the misunderstanding, my simple contrived example really wasn't being used to illustrate the concept of a subtree - it was being used to illustrate how querySelectorAll does, in fact, return a node list of multiple elements if the elements have the same id. That is all. I apologise if I have confused the discussion here.

Granted I misunderstood youtube was ultimately the same thing, I thought this was doing something different. So again apologies if this confused things.

If all valid IDs are to be supported, I suppose the wider issue would be when multiple ids are supplied such as "#foo #bar/baz #bat!" - with regards to hx-target, etc - here being able to do...

document.querySelectorAll('[id="foo"],[id="bar/baz"],[id="bat"]')

Has to be better than doing...

document.getElementById('foo') document.getElementById('bar/baz') document.getElementById('bat')

And then aggregating the results of the separate calls.

hx-target has the same issue as the oob (not escaped) but accepts multiple ids.

I do actually have a working branch that implements this for hx-target so that any set of valid IDs works, and robustly tests it - I just was going to pop it into a separate PR to keep this one simple.

Also not off-putting at all, I think we are all trying to best understand, clarify and find reasonable solutions :)

With that in mind, I hope I can say, without sounding condescending, I do still find the proposed...

"we don't support special characters because it is a bad idea and it is an unmaintainable pattern"

...a bit of a fudge.

My point being that ultimately the IDs are valid according to the specification, and it is maintainable with a bit of thought and care. However this is obviously a debatable point!

I'll post a PR that illustrates the wider point about hx-target and the other selectors HTMX uses later today, I'm actually travelling ATM and only have my phone to hand.

FraserChapman commented 7 months ago

To follow up to this, re the comment from @Encephala - https://github.com/bigskysoftware/htmx/issues/1537#issuecomment-1834679051

This problem doesn't just apply to hx-swap-oob, it applies to any selector HTMX uses. The following snippet yields an error similar to the one in this issue: <button hx-get="/clicked" hx-swap="beforeend" hx-target="#target!">Click me</button> <div id="target!"></div>

My idea was that one could extend the querySelectorAllExt function so that there is a penultimate conditional check that does something like;

If the raw selector string contains at least one id (i.e. # appears in any part of the string).
Try and split the selector string on comma (i.e check for possible multiple selectors)
normalize each selector present in the raw selector string
any normalized selector that starts with # uses the "id=" format without the # prefix
other non-id normalized selectors are returned as is

In code this would look something like...

...
} else if (selector.indexOf("#") !== -1) {     
    var formattedSelectors = selector.split(",").map(function (s) {
        var normalized = normalizeSelector(s);
        return normalized.indexOf("#") === 0 ?
            '[id="' + normalized.substr(1) + '"]' :
            normalized;
    }).join(",");
    return getDocument().querySelectorAll(formattedSelectors);                
} else {
    return getDocument().querySelectorAll(normalizeSelector(selector));
}

Then given a range of possible "selector" strings, the "formattedSelectors" results would be...

selector	formattedSelectors
"#s1"	'[id="s1"]'
"#i1, #i2"	'[id="i1"],[id="i2"]'
"#foo-/bar/, #baz-/bat/"	'[id="foo-/bar/"],[id="baz-/bat/"]'
"<#d1/>"	'[id="d1"]'
"#i1, #i2, #f1"	'[id="i1"],[id="i2"],[id="f1"]'

etc...

Thus allowing for a single call to querySelectorAll (rather than multiple calls to getElementById, custom escaping, or just ignoring the issue) and accepting valid ids with special characters.

However based on based @Atmos4 various comments it looks like this might just have all been a waste of time... If the project has no desire to support all valid IDs because they are a "bad idea" - which is a valid position to take - then that is fair enough.

If that is the case possibly remove the "help wanted" tag and close this issue to save any other well meaning people wasting their time trying to implement a solution for valid IDs that aren't going to be supported in any circumstance.

Thanks for the responses and clarifications, and again my apologies for any confusion I introduced in the discussion. I was genuinely just trying to help find a solution to the root issue raised and the wider issue with hx-target and the other selectors that HTMX uses :)

Atmos4 commented 7 months ago

it looks like this might just have all been a waste of time

Nothing is ever a waste of time 🙂 discussing a subject and fool-proofing ideas is not only super valuable, I would argue it is a building block of OSS. So keep doing what you're doing, because you are doing great.

However, sometimes there is a need to take a step back. For example here:

I believe hx-target and hx-swap-oob are only supposed to target one element, so having a list of elements here might not make a lot of sense.
it feels like the issue is supposed to fix hx-swap-oob. Maybe we can create a separate issue for hx-target?

Regarding the argument with IDs: I was just surprised that MDN recommendations regarding ID format were not fully mentioned, and I was also trying to fool-proof that big message about home subtree isolation which was a bit wrong.

But about the topic of the issue, I'd say the PR you mentioned makes sense 👍

FraserChapman commented 7 months ago

@Atmos4 Yes, the root issue was hx-swap-oob - the hx-target, etc, was from a follow on comment in this thread, if indeed it is only ever a single ID then granted that doesn't make a lot of sense. However the same issue with hx-swap-oob exists with pretty much every selector HTMX uses that can be, or contain, ID values.

FWIW I'd also been playing with hx-include - that does, at least as I recall, allow for multiple IDs to be supplied. e.g.

<div hx-include="#foo-/bar/, #baz-/bat/, #c"></div>

Which would be resolved by the querySelectorAllExt code I mentioned - i.e. it would generate something like...

'[id="foo-/bar/"],[id="baz-/bat/"],[id="c"]'

However I should probably just wait until I'm at my computer - as I'm writing this all from memory, on my phone, whilst on a train :/