Closed zufuliu closed 1 month ago
It's probably worth improving out-of-range behaviour to be more reasonable. That makes it easier to treat end of document the same as other positions. For the string-returning GetRange and GetRangeLowered, restricting the end to the end of the document should be OK.
std::string LexAccessor::GetRangeLowered(Sci_PositionU startPos_, Sci_PositionU endPos_) {
	// Clamp the requested end to the end of the document.
	const Sci_PositionU endRange = std::min(endPos_, static_cast<Sci_PositionU>(lenDoc));
	assert(startPos_ < endRange);
	const Sci_PositionU len = endRange - startPos_;
	std::string s(len, '\0');
	// len + 1 leaves room for the terminating NUL written by the char-buffer version.
	GetRangeLowered(startPos_, endRange, s.data(), len + 1);
	return s;
}
For the char-buffer writing versions, filling the array with NUL then retrieving as much as possible may be OK.
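A minimal standalone sketch of that idea, assuming the document is modelled as a std::string rather than the real LexAccessor buffer. CopyRangeClamped is a hypothetical name, not a Lexilla function. It fills the destination with NUL first, then copies only the part of the range that lies inside both the document and the buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper, not part of LexAccessor: copies [startPos, endPos) from
// doc into s, which has room for len bytes including the terminating NUL.
// Returns the number of characters copied.
std::size_t CopyRangeClamped(const std::string &doc, std::size_t startPos,
                             std::size_t endPos, char *s, std::size_t len) {
	if (!s || len == 0) {
		return 0;
	}
	// Fill the whole buffer with NUL so the result is always terminated.
	std::memset(s, '\0', len);
	// Clamp the end of the range to the end of the document.
	endPos = std::min(endPos, doc.size());
	if (startPos >= endPos) {
		return 0; // Empty range, or the start is past the end of the document.
	}
	// Clamp to the buffer capacity, leaving room for the terminating NUL.
	const std::size_t count = std::min(endPos - startPos, len - 1);
	std::memcpy(s, doc.data() + startPos, count);
	return count;
}
```

With this shape a caller can ask for a range that runs past the end of the document and still get a terminated, possibly shorter, string instead of an out-of-range read.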
or any XML processing instruction with xml prefix, so removed IsASpace().
From https://www.w3.org/TR/xml11/#sec-pi, xml-prefixed processing instructions are reserved:
The target names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification.
Currently there is <?xml-stylesheet ?>, see https://www.w3.org/TR/xml-stylesheet/
<?xml-stylesheet href="common.css"?>
<?xml-stylesheet href="default.css" title="Default style"?>
<?xml-stylesheet alternate="yes" href="alt.css" title="Alternative style"?>
<?xml-stylesheet href="single-col.css" media="all and (max-width: 30em)"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example with xml-stylesheet processing instructions</title>
</head>
<body>
...
</body>
</html>
I think it (and other xml-prefixed instructions) should be handled the same as <?xml version="1.0" encoding="utf-8"?>, so IsASpace() can be removed.
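As a rough illustration of treating every xml-prefixed target alike, here is a small sketch. HasXmlPrefix is a hypothetical helper, not a function in LexHTML.cxx, and it assumes the caller passes the text that follows the opening <?. It accepts any letter case and does not look at what follows the prefix, so no IsASpace() check is involved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when the text after "<?" begins with "xml" in any
// ASCII letter case, so "xml", "XML" and "xml-stylesheet" are all accepted.
bool HasXmlPrefix(const std::string &afterOpen) {
	const std::string prefix = "xml";
	if (afterOpen.size() < prefix.size()) {
		return false;
	}
	for (std::size_t i = 0; i < prefix.size(); i++) {
		char ch = afterOpen[i];
		// Lower-case ASCII letters only; other bytes are compared as-is.
		if (ch >= 'A' && ch <= 'Z') {
			ch = static_cast<char>(ch - 'A' + 'a');
		}
		if (ch != prefix[i]) {
			return false;
		}
	}
	return true;
}
```

Under this sketch, <?xml version="1.0"?> and <?xml-stylesheet href="common.css"?> take the same path, while <?php does not match.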
For the char-buffer writing versions, filling the array with NUL then retrieving as much as possible may be OK.
Changes like the following? It will redo some cheap work that is already done in the string-returning versions.
@@ -32,7 +32,9 @@ bool LexAccessor::MatchIgnoreCase(Sci_Position pos, const char *s) {
void LexAccessor::GetRange(Sci_PositionU startPos_, Sci_PositionU endPos_, char *s, Sci_PositionU len) {
assert(s);
assert(startPos_ <= endPos_ && len != 0);
+ memset(s, '\0', len);
endPos_ = std::min(endPos_, startPos_ + len - 1);
+ endPos_ = std::min(endPos_, static_cast<Sci_PositionU>(lenDoc));
len = endPos_ - startPos_;
if (startPos_ >= static_cast<Sci_PositionU>(startPos) && endPos_ <= static_cast<Sci_PositionU>(endPos)) {
const char * const p = buf + (startPos_ - startPos);
@@ -40,7 +42,6 @@ void LexAccessor::GetRange(Sci_PositionU startPos_, Sci_PositionU endPos_, char
} else {
pAccess->GetCharRange(s, startPos_, len);
}
- s[len] = '\0';
}
void LexAccessor::GetRangeLowered(Sci_PositionU startPos_, Sci_PositionU endPos_, char *s, Sci_PositionU len) {
Changed all four functions to const and truncated endPos_ to lenDoc.
Not going to change segIsScriptingIndicator(), as there is no test for space before xml (the if (!IsASpace(s[t])) block is not reachable in any existing test), though the if (Contains(s, "xml")) block can be optimized to avoid the second find().
PrintScriptingIndicatorOffset-0828.patch: the changes for PrintScriptingIndicatorOffset() are safe and simpler than the original code.
StyleContext::GetCurrent() can also be marked as const.
The if (Contains(s, "xml")) block can be optimized to avoid the second find():
if (Contains(s, "php"))
	return eScriptPHP;
{
	// Search once and reuse the position for the leading-space check.
	const size_t xml = s.find("xml");
	if (xml != std::string::npos) {
		for (size_t t = 0; t < xml; t++) {
			if (!IsASpace(s[t])) {
				return prevValue;
			}
		}
		return eScriptXML;
	}
}
@@ -103,7 +103,7 @@ script_type segIsScriptingIndicator(const Accessor &styler, Sci_PositionU start,
return eScriptJS;
if (Contains(s, "php"))
return eScriptPHP;
- if (Contains(s, "xml")) {
+ {
const size_t xml = s.find("xml");
if (xml != std::string::npos) {
for (size_t t = 0; t < xml; t++) {
@@ -111,8 +111,8 @@ script_type segIsScriptingIndicator(const Accessor &styler, Sci_PositionU start,
return prevValue;
}
}
+ return eScriptXML;
}
- return eScriptXML;
}
return prevValue;
it seems better to move the Contains(s, "php") and Contains(s, "xml") cases to a new function, e.g.:
Something like the following; not sure whether it is worth the duplication (it keeps segIsScriptingIndicator() unchanged).
@@ -117,6 +117,16 @@ script_type segIsScriptingIndicator(const Accessor &styler, Sci_PositionU start,
return prevValue;
}
+script_type segIsScriptInstruction(Accessor &styler, Sci_PositionU start, bool isXml) {
+ if (styler.MatchIgnoreCase(start, "php")) {
+ return eScriptPHP;
+ }
+ if (isXml || styler.MatchIgnoreCase(start, "xml")) {
+ return eScriptXML;
+ }
+ return eScriptPHP;
+}
+
int PrintScriptingIndicatorOffset(Accessor &styler, Sci_PositionU start) {
return styler.MatchIgnoreCase(start, "php") ? 3 : 0;
}
@@ -1492,7 +1502,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
// handle the start of PHP pre-processor = Non-HTML
else if ((ch == '<') && (chNext == '?') && IsPHPEntryState(state) && IsPHPStart(allowPHP, styler, i)) {
beforeLanguage = scriptLanguage;
- scriptLanguage = segIsScriptingIndicator(styler, i + 2, i + 6, isXml ? eScriptXML : eScriptPHP);
+ scriptLanguage = segIsScriptInstruction(styler, i + 2, isXml);
if ((scriptLanguage != eScriptPHP) && (isStringState(state) || (state==SCE_H_COMMENT))) continue;
styler.ColourTo(i - 1, StateToPrint);
beforePreProc = state;
if ((scriptLanguage != eScriptPHP) && (isStringState(state) || (state==SCE_H_COMMENT))) continue; needs an extra fix for <?xml or a non-preprocessor instruction inside a string or comment.
Patch updated "handle the start of PHP pre-processor = Non-HTML":
Merged IsPHPStart(), segIsScriptInstruction() and PrintScriptingIndicatorOffset() into a single function to avoid repeated GetRangeLowered() or MatchIgnoreCase() calls.
Treated <?= as a whole instead of <? plus an operator (similar to ASP <%=).
Restricted eScriptXML to AnyOf(state, SCE_H_DEFAULT, SCE_H_SGML_BLOCK_DEFAULT), to avoid changing scriptLanguage and beforeLanguage when <?xml does not start XML.
Renamed enumeration ScriptInstruction to InstructionTag.
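The single-function idea can be sketched roughly as below. This is not the code from the patch: the members of InstructionTag, the Instruction struct and ClassifyInstruction() are assumptions made for illustration, and the sketch works on a plain std::string instead of an Accessor. It looks at the characters after <? once and returns both the kind of tag and how many characters the opening tag occupies.

```cpp
#include <cstddef>
#include <string>

// Enumeration name taken from the thread; the members are hypothetical.
enum class InstructionTag { None, PHP, Echo, XML };

// Hypothetical result type: the tag kind and the length of the opening tag.
struct Instruction {
	InstructionTag tag;
	std::size_t length;
};

// Hypothetical classifier: pos is expected to point at the '<' of "<?".
Instruction ClassifyInstruction(const std::string &text, std::size_t pos) {
	if (pos + 2 > text.size() || text[pos] != '<' || text[pos + 1] != '?') {
		return {InstructionTag::None, 0};
	}
	// Read at most three characters after "<?" and lower-case them once.
	std::string word = text.substr(pos + 2, 3);
	for (char &ch : word) {
		if (ch >= 'A' && ch <= 'Z') {
			ch = static_cast<char>(ch - 'A' + 'a');
		}
	}
	if (word == "php") {
		return {InstructionTag::PHP, 5}; // "<?php"
	}
	if (word == "xml") {
		return {InstructionTag::XML, 5}; // "<?xml", including "<?xml-stylesheet"
	}
	if (!word.empty() && word[0] == '=') {
		return {InstructionTag::Echo, 3}; // "<?=" taken as a whole
	}
	// Assumption: a bare "<?" is treated as a PHP short open tag.
	return {InstructionTag::PHP, 2};
}
```

Returning the length together with the tag is what removes the need for a separate offset function in this sketch; the state restriction for XML described above is not modelled here.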
https://www.php.net/manual/en/language.basic-syntax.phptags.php
When PHP parses a file, it looks for opening and closing tags, which are <?php and ?> which tell PHP to start and stop interpreting the code between them. Parsing in this manner allows PHP to be embedded in all sorts of different documents, as everything outside of a pair of opening and closing tags is ignored by the PHP parser.
PHP includes a short echo tag <?= which is a short-hand to the more verbose <?php echo.
As PHP code is only valid inside tags, the code that sets initStyle = SCE_HPHP_DEFAULT; can be removed. IsPHPEntryState() can be simplified to exclude all PHP styles; the following nonsense code will be interpreted by PHP:
<!--<?=1?>-->
<script>
/*<?=2?>*/
</script>
The phpscript mode was introduced in March-April 2005 with contributions from Jan Martin Pettersen and Iago Rubio. Jan described it as
It depends on how you look at it, it's quite new in the pure scripting manner, as it came in one of the newest 4.xx versions of PHP (on the windows platform, not sure when it came on linux), but the language is exactly the same as the normal PHP, except it uses expects pure PHP, instead of html/php combined, and as such, doesn't use the <?php ?> tag either..
Trying to search for "phpscript" or quickly read PHP 4.x release notes doesn't find much of interest so I suspect it disappeared or has another name.
HTML-PHP-0930.patch
Reverted changes for initStyle = SCE_HPHP_DEFAULT;.
Trying to search for "phpscript" or quickly read PHP 4.x release notes doesn't find much of interest so I suspect it disappeared or has another name.
I found one file that contains PHP code without tags: the interactive shell history file ~/.php_history, see https://www.php.net/manual/en/features.commandline.interactive.php
PrintScriptingIndicatorOffset-0828.patch
The first place is PrintScriptingIndicatorOffset(styler, styler.GetStartSegment() + 2, i + 6);, it can be fixed by changing PrintScriptingIndicatorOffset() to the following:
segIsScriptingIndicator-0828.patch
The second place is scriptLanguage = segIsScriptingIndicator(styler, i + 2, i + 6, isXml ? eScriptXML : eScriptPHP);, as it only makes sense to handle <?php and <?xml at this position (the rest of segIsScriptingIndicator() is used to detect the script language from the language or type attribute value), it seems better to move the Contains(s, "php") and Contains(s, "xml") cases to a new function, e.g.:
I don't know the purpose of checking space after xml, or any XML processing instruction with xml prefix, so removed IsASpace().
<script language="php"></script> was removed in PHP 7 (see https://wiki.php.net/rfc/remove_alternative_php_tags), so Contains(s, "php") can be removed from the old segIsScriptingIndicator().