Open bep opened 5 years ago
I note that html/template does not import the encoding/xml package https://godoc.org/html/template?imports because html5 is not xml.
because html5 is not xml.
Note that XML is not mentioned in my bug report. According to W3C
:
HTML is an application conforming to International Standard ISO 8879 -- Standard Generalized Markup Language (SGML).
https://en.m.wikipedia.org/wiki/Standard_Generalized_Markup_Language
HTML was theoretically an example of an SGML-based language until HTML 5, which browsers cannot parse as SGML for compatibility reasons.
HTML5 is not SGML either.
HTML5 is not SGML either.
HTML 5 isn't mentioned in the html/template
documentation. Are you suggesting that this issue should not be fixed?
I think it is safe to say that most browser implementations support SGML processing instructions, even if they don't use them. And since most browsers/parsers (except Go's) ignore them, they can be really useful.
/cc @mikesamuel for owners input
I think it is safe to say that most browser implementations support SGML processing instructions
Per HTML5 rules, <?PITarget PIContent?>
is treated as a bogus comment outside a foreign-content context.
After seeing <
, the lexer is in the tag open state which says
U+003F QUESTION MARK (?)
This is an unexpected-question-mark-instead-of-tag-name parse error. Create a comment token whose data is the empty string. Reconsume in the bogus comment state.
which then appends content to the first >
to the created comment node.
So HTML 5 may support PIs but using node type 8 (comment) not 7 (processing instruction). This can be seen by running:
let div = document.createElement('div');
div.innerHTML = '<?PI PI?>';
div.firstChild.nodeType === Node.COMMENT_NODE
// -> true
In contrast to the XML grammar, in HTML, <?PITarget >
is a complete token.
Other constructs also enter the bogus comment state: <!foo>
It seems there are a few options:
Mentioned merely for completeness.
<?...>
as a commenthtml/template elides comments so to be consistent we would drop them from the output.
The best way to move that forward might be to open an issue to collect use cases for preserving PIs in templates, and to flesh out a definition of the desired hybrid grammar.
As you point out, outputting <?PI...
is already buggy, so we probably needn't worry overly about backwards compatibility, but I can't speak for the Go maintainers.
@kardianos
I note that html/template does not import the encoding/xml package
Fyi, html/template uses its own hand-written parser that does not assume that a template is a complete document fragment.
Open and close tags need not appear in the same template:
{{define "foo"}}
<div>{{template "bar"}}
{{end}}
{{define "bar"}}
</div>
{{end}}
And a template may not be in the same language as its caller:
{{define "html"}}
<script>{{template "js"}}</script>
{{end}
{{define "js"}}
alert('There are no <tags> in this <template>');
{{end}
@mikesamuel Yes, sorry, that was indeed my point.
html/template elides comments so to be consistent we would drop them from the output.
Please don't do that. Filtering out comments was probably also a bad idea originally, but I'm not going down that road. In contrast to comments, which I guess is mostly used to inform/remind people about a certain implementation detail, processing instructions have meanings to computers -- removing them would be a breaking thing.
Per HTML5 rules,
<?PITarget PIContent?>
is treated as a bogus comment outside a foreign-content context.
If html/templates
is strictly HTML 5 that should be documented.
@bep
If you want html/template to preserve things that look kind of like processing instructions, please make the case that that is worthwhile. This talk about processing instructions in HTML seems to be appealing to a standard that does not exist (see below).
You seem to be suggesting that using comments that look like PIs is a widespread practice; I have not encountered that in my work, which is why I asked for use cases. Please convince me that this is something developers need. I will need to see some numbers though.
If you do want <?...?>
in output, you can do that in html/template today, the same way you can output a regular comment: https://play.golang.org/p/eOAB0UyYbYy . If that is bad ergonomically, please explain how and what template interpolations like <?PI {{.}}?>
would mean.
Filtering out comments was probably also a bad idea originally
If you want to demonstrate this, please compare the cost of leaks due to developer comments unintentionally persisting in code to the cost of having to do something like the playground above for the few times you actually need a comment. To the best of my knowledge, the only repeatedly occurring use case for shipping comments in code is MS conditional compilation comments which were deprecated in 2015.
""" Important As of Internet Explorer 10, conditional comments are no longer supported by standards mode. """
I know of no use case for shipping comments to modern browsers that is not met by the playground above. Again, please convince me otherwise.
If
html/templates
is strictly HTML 5 that should be documented.
The documentation could be more specific, but I'm not just saying processing instructions are not part of HTML 5. I refer to HTML 5 because that's what modern browsers approximate. To my knowledge no version of HTML has ever supported processing instructions.
HTML 4 B3.3 SGML features with limited support says
"""
SGML systems conforming to [ISO8879] are expected to recognize a number of features that aren't widely supported by HTML user agents. We recommend that authors avoid using all of these features.
...
...
Authors should be aware that many user agents render processing instructions as part of the document's text. """
Browsers do not and have not recognized processing instructions in HTML. What changed between HTML 4 and HTML 5 was how they chose not to parse processing instructions as Node.PROCESSING_INSTRUCTION
.
If you want youhtml/template to preserve things that look kind of like processing instructions, please make the case that that is worthwhile.
The html/template
package is there mainly for security concerns, not to make opinionated adjustments to people's applications.
So:
I want the HTML
parser to preserve anything that isn't a security concern. There are obvious use cases for using SGML processing instructions in HTML. I have a few of them myself, which is why I created this issue. I don't see why I need to detail them in this topic. The current behavior obviously needs to be fixed. And I cannot see how you can fix it by removing these processing instructions, as that would make a broken solution even more broken. So I suggest you do as very other HTML parsers on the planet: Leave them alone.
I will leave this thread now. Do as you please, but please consider my concerns above.
Browsers do not and have not recognized processing instructions in HTML. What changed between HTML 4 and HTML 5 was how they chose not to parse processing instructions as
Node.PROCESSING_INSTRUCTION
.
I think you need to look at Go's HTML
handling as very, very standard/basic and version agnostic. HTML 5 was released in 2014, Go 1.0 long, long before that.
@bep
There are obvious use cases for using SGML processing instructions in HTML.
Great. I look forward to seeing them.
Browsers do not and have not recognized processing instructions in HTML. ...
I think you need to look at Go's
HTML
handling as very, very standard/basic and version agnostic. HTML 5 was released in 2014, Go 1.0 long, long before that.
How is Go's HTML handling relevant to what the HTML standard is?
Is it valid to use html/template
with XML content? If so, stripping or escaping <?xml version="1.0" encoding="utf-8"?>
is undesirable.
My understanding is that this type of processing instruction is also sometimes used in HTML. e.g. <?php echo "Hello, World!"; ?>
.
Go version
go1.12beta2
.The above prints
<?PITarget PIContent?>
.I would expect it to be left untouched, e.g.
<?PITarget PIContent?>
.https://en.wikipedia.org/wiki/Processing_Instruction