Closed zufuliu closed 2 months ago
There is also `<%` in XML CDATA causing issues.
The above change only fixes the bug for XML (`SCLEX_XML`), not client HTML (`SCLEX_HTML` and `SCLEX_PHPSCRIPT`). I think we can add two properties to fix the bug for HTML:
- `lexer.html.php_tag` (default on, only used for `SCLEX_HTML`): when enabled, will interpret `<?`.
- `lexer.html.asp_tag` (default on, not used for `SCLEX_XML`): when enabled, will interpret `<%`. The ASP tag was removed in PHP 7, see https://wiki.php.net/rfc/remove_alternative_php_tags and https://wiki.php.net/rfc/deprecate_php_short_tags
@zufuliu Only interpret `<?` inside CDATA section for `SCLEX_PHPSCRIPT`
A quick search for PHP examples shows PHP embedded in web pages, not stand-alone PHP (`SCLEX_PHPSCRIPT`), which is the unusual case. For web pages, it is a text preprocessor over HTML so takes precedence for `SCLEX_HTML`. It is less commonly used for XML but may be used in the same way to produce data so is active for `SCLEX_XML`.
@zufuliu Don't interpret `<?` inside XML CDATA section (original bug), leave `SCLEX_HTML` as is.
PHP generated XML is also useful.
There could be options to turn off pre-processors.
There could be options to turn off pre-processors.
I still think these options should default off for XML, as using PHP (or another server-side technology) to preprocess XML is really rare (MIME configurations); otherwise most, if not all, uses of `SCLEX_XML` will need to turn them off.
I still think these options should default off for XML
It is not reasonable to break current applications.
`lexer.html.php_tag` (default on, only used for `SCLEX_HTML`), when enabled will interpret `<?`.
This is more complex than I thought: `<?` in HTML is treated as a comment by browsers, see https://html.spec.whatwg.org/multipage/parsing.html#parse-error-unexpected-question-mark-instead-of-tag-name
There could be options to turn off pre-processors.
The above patch only fixed CDATA (the original bug). I think we can add options to turn off pre-processors globally:
- In XML, `<?` can be a PHP start tag, an XML instruction, or an error. `<%` is an ASP (or template) start tag or an error.
- In HTML, `<?` can be a PHP start tag or a comment start. `<%` is an ASP (or template) start tag or element text.
`lexer.html.cdata.tag` / `lexer.xml.cdata.tag` is confusing to me: it's treating the file as ASP/PHP apart from a small exception for CDATA. ASP and PHP may have some understanding of the HTML/XML elements and so may special-case CDATA, but that should be checked in the ASP and PHP documentation or with examples.
However, the motivating example in notepad-plus-plus/notepad-plus-plus#14576 does not appear to be intended for ASP or PHP processing and would be better treated as basic unprocessed XML.
Perhaps the desire here is to avoid splitting file types into with/without preprocessing so the user doesn't have to deal with this choice.
PHP does not understand HTML/XML elements: https://www.php.net/manual/en/language.basic-syntax.phptags.php
When PHP parses a file, it looks for opening and closing tags, which are `<?php` and `?>` which tell PHP to start and stop interpreting the code between them. Parsing in this manner allows PHP to be embedded in all sorts of different documents, as everything outside of a pair of opening and closing tags is ignored by the PHP parser.
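To make the quoted behaviour concrete, here is a minimal sketch of a scanner that only looks for `<?php` and `?>` and ignores everything else, including CDATA markers. `Segment` and `SplitPHP` are hypothetical names used for illustration; this is not the PHP parser or Lexilla code, and it ignores short tags and `?>` inside PHP string literals.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper type: one run of the document, either PHP or passthrough text.
struct Segment {
    bool isPHP;
    std::string text;
};

// Split a document into PHP and non-PHP runs using only the <?php and ?> tags.
// Anything outside the tags is passed through without interpretation.
inline std::vector<Segment> SplitPHP(std::string_view doc) {
    std::vector<Segment> out;
    std::size_t pos = 0;
    while (pos < doc.size()) {
        const std::size_t open = doc.find("<?php", pos);
        if (open == std::string_view::npos) {
            // No further opening tag: the rest is plain text.
            out.push_back({false, std::string(doc.substr(pos))});
            break;
        }
        if (open > pos) {
            out.push_back({false, std::string(doc.substr(pos, open - pos))});
        }
        // An unterminated block runs to the end of the document.
        const std::size_t close = doc.find("?>", open + 5);
        const std::size_t end = (close == std::string_view::npos) ? doc.size() : close + 2;
        out.push_back({true, std::string(doc.substr(open, end - open))});
        pos = end;
    }
    return out;
}
```

Calling `SplitPHP("<![CDATA[<?php echo 1; ?>]]>")` yields the CDATA opener and closer as plain text around one PHP run, which is why a preprocessor view and an XML view of the same CDATA section disagree.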
ASP may (not tested) understand HTML elements, as it needs to parse `asp:`-prefixed tags and the `runat` attribute on HTML elements.
PHP control could have 3 states, since `<?php` is much less likely to mean something different than `<?`, although it does in this case.
- PHP disabled
- `<?php` enabled
- `<?php` and `<?` enabled
For ASP, the character just after `<%` could be checked, as that decides between different ASP modes. From https://learn.microsoft.com/en-us/troubleshoot/developer/webapps/aspnet/development/inline-expressions it is unclear whether whitespace is required for embedded code blocks. There are other languages like JSP (and derivatives) that also use `<%` and currently work because of that similarity, so they could affect whether this is reasonable.
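As a rough illustration of checking the character after `<%`, the sketch below classifies the block kind from its prefix. `AspBlock` and `ClassifyAsp` are hypothetical names, and the prefix set (`=`, `@`, `#`, `$`, `--`) is my assumption based on the ASP.NET inline expression forms; anything else, with or without whitespace, is treated as an embedded code block here.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical classification of what follows "<%".
enum class AspBlock { NotAsp, Code, Expression, Directive, DataBinding, ExpressionBuilder, Comment };

// Classify the ASP-style block starting at `pos`, if any.
inline AspBlock ClassifyAsp(std::string_view text, std::size_t pos) {
    if (pos > text.size() || text.substr(pos, 2) != "<%") {
        return AspBlock::NotAsp;
    }
    const std::string_view rest = text.substr(pos + 2);
    if (rest.substr(0, 2) == "--") {
        return AspBlock::Comment;            // <%-- ... --%>
    }
    if (rest.empty()) {
        return AspBlock::Code;               // bare "<%" at end of input
    }
    switch (rest[0]) {
    case '=': return AspBlock::Expression;        // <%= ... %>
    case '@': return AspBlock::Directive;         // <%@ ... %>
    case '#': return AspBlock::DataBinding;       // <%# ... %>
    case '$': return AspBlock::ExpressionBuilder; // <%$ ... %>
    default:  return AspBlock::Code;              // <% ... %>, whitespace optional (assumption)
    }
}
```

A JSP-style lexer would need a different prefix table, so a real implementation would probably keep this per language rather than hard-coded.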
Here is a patch that implements `lexer.xml.allow.php` and `lexer.html.allow.php` with three states:
Set to 0 to disable PHP in HTML, 1 to accept `<?php`, 2 to also accept `<?`. The default is 2.
`if (allowPHP == AllowPHP::PHP)` should also allow the short echo tag: `<?= ?>`.
https://www.php.net/manual/en/language.basic-syntax.phptags.php
As short tags can be disabled it is recommended to only use the normal tags (`<?php ?>` and `<?= ?>`) to maximise compatibility.
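A minimal sketch of how the three states could combine with the short echo tag. The enumerator names and `StartsPHP` are hypothetical and not taken from the patch; only `AllowPHP::PHP` appears in this thread. The sketch matches `<?php` case-sensitively and does not special-case XML declarations such as `<?xml`, which a real lexer would have to handle.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical enum mirroring the property values 0, 1 and 2.
enum class AllowPHP { None = 0, PHP = 1, Question = 2 };

// Decide whether the text at `pos` should switch the lexer into PHP.
inline bool StartsPHP(std::string_view text, std::size_t pos, AllowPHP allow) {
    if (allow == AllowPHP::None || pos > text.size()) {
        return false;
    }
    const std::string_view rest = text.substr(pos);
    if (rest.substr(0, 2) != "<?") {
        return false;
    }
    if (allow == AllowPHP::Question) {
        return true;                         // state 2: any <? starts PHP
    }
    // State 1: only the normal tags, <?php and the short echo tag <?=.
    return rest.substr(0, 5) == "<?php" || rest.substr(0, 3) == "<?=";
}
```

With `AllowPHP::PHP`, `<?= $x ?>` is still accepted while a bare `<? ... ?>` is left to the host language.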
@nyamatongwe For ASP, the character just after <% could be checked ... it is unclear whether whitespace is required for embedded code blocks.
Whitespace is not required, and ASP examples with `<%` followed immediately by a call are common.
`segIsScriptingIndicator()` needs to be updated for the following code:
https://github.com/ScintillaOrg/lexilla/blob/a6f1998ac753fcb6faf7d66773fde1f3b513934e/lexers/LexHTML.cxx#L104-L105
PHP script tag was removed in PHP 7, https://wiki.php.net/rfc/remove_alternative_php_tags
This RFC proposes the removal of ASP tags (`<%`) and script tags (`<script language=php>`) as a means of entering or leaving PHP mode.
Created for https://sourceforge.net/p/scintilla/bugs/1078/.
Per https://html.spec.whatwg.org/multipage/syntax.html#cdata-sections and https://www.w3.org/TR/xml11/#sec-cdata-sect, a CDATA section is a literal block and only the closing `]]>` is recognized. I think there are two fixes:
- Only interpret `<?` inside CDATA section for `SCLEX_PHPSCRIPT`.
- Don't interpret `<?` inside XML CDATA section (original bug), leave `SCLEX_HTML` as is.
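For illustration, the literal-block rule amounts to scanning for `]]>` and nothing else. The function below is a hypothetical sketch (`SkipCDATA` is not a Lexilla function) that assumes `start` points just past `<![CDATA[`.

```cpp
#include <cstddef>
#include <string_view>

// Return the offset just past the "]]>" closing a CDATA section whose content
// begins at `start`, or text.size() when the section is unterminated.
// Nothing inside the section (<?, <%, tags) is given any meaning.
inline std::size_t SkipCDATA(std::string_view text, std::size_t start) {
    const std::size_t close = text.find("]]>", start);
    return close == std::string_view::npos ? text.size() : close + 3;
}
```

A lexer following the second fix would style the whole range up to that offset as CDATA for `SCLEX_XML`, without checking for `<?` or `<%` along the way.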