WICG / sanitizer-api

https://wicg.github.io/sanitizer-api/
Other
228 stars 31 forks source link

New WebIDL syntax with explicit name/namespace #181

Closed evilpie closed 7 months ago

evilpie commented 1 year ago

We have been working on a new syntax that doesn't involve any micro syntax for namespaces, but instead either supports just a string (with an implicit default namespace) or an object/dictionary with an explicit name and namespace.

dictionary ElementWithNamespace {
  required DOMString name;
  required DOMString _namespace;
}
typedef (DOMString or ElementWithNamespace) SanitizerElement;

enum Star {
  "*"
};
typedef (Star or sequence<SanitizerElement>) ElementsOrStar;

dictionary SanitizerAttribute {
  required DOMString name; // We aren't sure how to specify |any| attribute yet.
// maybe: optional bool defaultAttribute;
  DOMString? _namespace = null; // specifying null/undefined means "use the null namespace" 
  required ElementsOrStar elements;
}

dictionary SanitizerConfig {
  sequence<SanitizerElement> allowElements;
  sequence<SanitizerElement> blockElements;
  sequence<SanitizerElement> dropElements;
  sequence<SanitizerAttribute> allowAttributes;
  sequence<SanitizerAttribute> dropAttributes;
  // ....
}

1. Use case: Elements with no namespace (HTML)

{
  allowElements: [ "div", "em", "span", "i"],
}

2. Use case: Mixing elements from different namespaces

{
  allowElements: [ "div", "span", { name: "text", namespace:  "http://www.w3.org/2000/svg" }],
}

3. Use case: Attribute + elements

{
  allowAttributes: [{name: "title", elements: ["i", "b", "s", "u"]}]
}

4. Use case: Allow some attribute on any element (e.g. global attributes)

{
  allowAttributes: [{name: "title", elements: "*"}]
}

5. Use case: Allowing all attributes on a specific element

For this we aren't sure about the syntax.

{
  allowAttributes: [{name: "*", elements: ["span"]}],
  // or
  allowAttributes: [{defaultAttributes: true, elements: ["blah"]}) 
}

6. Use case: Non-null attribute namespace

{
  allowAttributes: [{name: "href", namespace: "http://www.w3.org/1999/xlink", 
     elements: [{ name: "text", namespace: "http://www.w3.org/2000/svg" }]
  }]
}
mozfreddyb commented 1 year ago

CC @annevk

domenic commented 1 year ago

Seems good, but consider using the canonical property names localName and namespaceURI instead of inventing new ones.

annevk commented 1 year ago

I think this is okay given that for MutationRecord we use attributeName and attributeNamespace. I'd even go so far as saying I'd favor sticking to "name" and "namespace" for new APIs, making it clearer that's what element and attribute identity is.


I don't see ElementName defined.

What's the reason for keeping attributes and elements separate?

evilpie commented 1 year ago

I don't see ElementName defined.

ElementName was supposed to be SanitizerElement. Frankly all the names are just something I quickly came up with, we should definitely spend some more time thinking about them before putting them in the spec.

mozfreddyb commented 1 year ago

What's the reason for keeping attributes and elements separate?

A couple of reasons. We think it's making a bit more ergonomic for a couple of use cases:

If elements+attributes were combined in a joint structure, attributes would always have to be copied or specified twice.

otherdaniel commented 1 year ago

I like the new proposal. I'll try to spec it out next week, and maybe start with a trial implementation so I can try it out.

petervanderbeken commented 1 year ago

dictionary ElementWithNamespace {
  required DOMString name;
  required DOMString namespace;

I think this would need to be _namespace (namespace is a terminal symbol), but I also think following Node/Element and using localName and namespaceURI would make more sense.

I would also make the namespace URI nullable. It'd be nice to make it optional, with a default that's the same as what we would use if you're using the DOMString member of SanitizerElement, although I guess that's hard because it would depend on the type of the document?

required DOMString name; // We aren't sure how to specify |any| attribute yet. // maybe: optional bool defaultAttribute;

I think using Star here would make sense. I find having both name and defaultAttribute (allAttributes?) a bit weird, you'd have to add prose about the interaction between them.

mozfreddyb commented 1 year ago

required DOMString name; // We aren't sure how to specify |any| attribute yet. // maybe: optional bool defaultAttribute;

I think using Star here would make sense. I find having both name and defaultAttribute (allAttributes?) a bit weird, you'd have to add prose about the interaction between them.

This is indeed a thing where we were a bit unsure. If I remember correctly, @otherdaniel also preferred a variant of name: '*' and we're not totally against that. I had a personal preference for a named switch, in order to make it more explicit and not raise questions like Does * fall back to elsewhere-allowed stuff? Does it directly fall back to the specified default attributes?

That being said, we're open to feedback from folks with more experience in designing & shipping new WebAPIs. We do not intend to go against the grain if there are existing patterns and are keen to learn more.

@petervanderbeken what's your take on explicit/implicit here? @annevk any opinions about wildcards / explicit flags?

otherdaniel commented 1 year ago

I toyed around a bit with this and think this is a workable proposal. One item I stumbled on is that error handling becomes a bit weird. E.g. if the name in the name/namespace dictionary is required you get weird error handling: Putting some rubbish into the list of elements usually passes (by ignoring said rubbish); but if you put in a dictionary without "name" key in it, then WebIDL can't make sense of it and the Sanitizer construction fails with an exception. That struck me as rather unexpected.

If possible, I'd like some feedback on "web-by" error handling, too. So far, we just ignore un-parseable config items, meaning that in the worst case our safe-by-default defaults will apply. The idea is to not block off future extensions to the config. Should we instead throw on any unknown config items? Or does it depend on the type of item?

I would also make the namespace URI nullable. It'd be nice to make it optional, with a default that's the same as what we would use if you're using the DOMString member of SanitizerElement, although I guess that's hard because it would depend on the type of the document?

I'm in favour. It wouldn't depend on the document type IMHO, but on whether it's an element or attribute. HTML namespace should be the default for elements; the empty namespace for attributes.

This is indeed a thing where we were a bit unsure. If I remember correctly, @otherdaniel also preferred a variant of name: '*' and we're not totally against that.

Yes, I think name: '*' (or to just consider an absent name key to match any name) would be a bit more author friendly. But arguably it's not a big difference.

evilpie commented 1 year ago

I like Peter's idea of using a DOMString or dictionary with name+namespace for the attribute name. However the duplication of name does look a bit strange to me: {allowAttributes: [{name: {name: "id", namespace: null}, elements: "*"}]}

otherdaniel commented 1 year ago

One addition to consider: To support the use case of "grab bag of elements and attributes", it would be nice to specify attributes as a simple list, just like the elements. E.g.:

  allowElements: ["div", "span"],
  allowAttributes: ["class", "style"]

This largely aligns with the existing proposal, but requires defining SanitizerAttribute as (DOMString or ...) , just like SanitizerElement already is. (I wouldn't mind making elements entirely optional so that this would also hold for namespaced attributes. But given how few of those exist that doesn't make much of a difference.)

annevk commented 1 year ago

{allowAttributes: [{name: {name: "id", namespace: null}, elements: "*"}]}

Why not flatten this out as {allowAttributes: [{name: "id", namespace: null, elements: "*"}]}?

evilpie commented 1 year ago

Why not flatten this out as {allowAttributes: [{name: "id", namespace: null, elements: "*"}]}?

Good question. That was actually in my original proposal. I thought that Peter was suggesting we remove the flattening and also use either a DOMString or an object with name + namespace like for elements, but maybe I am just imaging that now. I am open to both solutions.

Aside: My personal opinion is that as long as we can use just a string for what I suspect is the most common use case, the explicit syntax with namespaces doesn't matter much to me. At that point I would even accept name: { localName: "id", namespaceURI: null }.

evilpie commented 1 year ago

We just had a sanitizer meeting and I think we all comfortable with using syntax proposed in the initial https://github.com/WICG/sanitizer-api/issues/181#issue-1451286702. That means:

annevk commented 1 year ago

(If elements defaults to "*" you could allow allowAttributes: ["id"] to globally allow the id attribute, but maybe that's best after some analysis of what people actually do in the wild.)

mozfreddyb commented 1 year ago

We discussed this again in the call and agree on this syntax. As a side note, @annevk mentioned we can future-proof the syntax for future improvements by using element-name validation that already exists in createElement

benbucksch commented 1 year ago

@annevk wrote:

If elements defaults to "*", you could allow allowAttributes: ["id"] to globally allow the id attribute

+1

evilpie commented 1 year ago

Updated WebIDL based on what was discussed in #199. This doesn't quite match #196, because that seems to contain the star "*" syntax again, which AFAIK didn't really go into.

dictionary SanitizerElementNamespace {
  required DOMString name;
  DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};

// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;
};

typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;

dictionary SanitizerAttributeNamespace {
  required DOMString name;
  DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;

dictionary SanitizerConfig {
  sequence<SanitizerElementWithAttributes> elements;
  sequence<SanitizerElement> removeElements;
  sequence<SanitizerElement> replaceWithChildrenElements;

  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;

  boolean customElements;
  boolean unknownMarkup; // Name TBD!
  boolean comments;
};

Edit: 16 October. Removed Sanitizer SanitizerAttributeNamespaceWithElements. Removed allow prefixes. DOMString? element namespace. 15 November: #200 resolved, s/flattenElements/replaceWithChildrenElements/.

annevk commented 1 year ago

Thank you @evilpie for putting that together!

  1. namespace should probably always be DOMString? even though that is silly, for compatibility with namespaceURI.
  2. I thought we had decided in the meeting that SanitizerAttributeNamespaceWithElements wasn't needed for the MVP as it was just syntactic sugar.
  3. I think the remaining allow* members should drop the allow prefix.
  4. Not sure we have precedent for "markup" in APIs. unknownNodes or unknownElementsAndAttributes might be cleaner?
evilpie commented 1 year ago
  1. namespace should probably always be DOMString? even though that is silly, for compatibility with namespaceURI.

So what would happen for null? Is it just treated as the HTML namespace, so the same as undefined? AFAIK elements always have a namespaceURI.

  1. I thought we had decided in the meeting that SanitizerAttributeNamespaceWithElements wasn't needed for the MVP as it was just syntactic sugar.

I forget about that... I just looked at our notes and we did decide that! I've updated the IDL.

  1. I think the remaining allow* members should drop the allow prefix.

Makes sense to me. I just kept that.

  1. Not sure we have precedent for "markup" in APIs. unknownNodes or unknownElementsAndAttributes might be cleaner?

unknownElementsAndAttributes is quite the mouthful. Maybe we could quickly discuss that during our next Sanitizer meeting.

evilpie commented 1 year ago

Anne pointed out that it's apparently possible to create elements with a null namespace in XML or explicitly via the API document.createElementNS. We should probably allow it for future compatibility.

annevk commented 1 year ago

unknown* also depends on #188.