Open pbiron opened 2 years ago
For example, when trying to validate Security/SafeRedirectStandard.xml with Xerces2-j (in schema 1.0 mode), the following error message is output:
[Error] phpcsdocs.xsd:3:17: cos-nonambig: standard and WC[##any] (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
(even when there is no element in the XML instance that could potentially match the wildcard)
Hi @pbiron!
Thanks for the comments, I wasn't aware we violated anything (I never got any notice about the schema version when working on it from my IDE). I guess I should have made better research, as I'm not really that knowledgeable about XML schemas 😬
The original discussion was started in this issue. The original PR was also made in the WPCS repo.
What I used as a basis of the XSD file is the default phpcs.xsd schema.
Is there a tool we can use to verify the validity of the schema against a certain XML version spec?
I think @jrfnl won't be against a PR that will make the schema even better 🙂
@pbiron Thanks for opening this issue to discuss your concerns.
The use of wildcards in select places is intentional as we didn't want to make the XSD more restrictive than is strictly necessary.
Since there are not that many v1.1 schema processors out there (especially in PHP, and most IDEs), it would probably be a good idea to modify the schema to conform to v1.0 of the schema spec.
This XSD is not intended for processing via PHP. It's intended use is with XMLlint. See the usage instructions in the README.
And we do actually validate the XSD against the W3C specs during CI and the XSD validates against those: https://github.com/PHPCSStandards/PHPCSDevTools/commit/5d491b62ed8c7bc4eb03a4e6622c1e032aa09f30
I have some other suggestions for improvements to the schema, but it would be a good idea to have some general discussions around the "goals" of the schema (and validation in general) before jumping into opening a PR.
The target public for this XSD file is very limited: it is a tool which is only interesting for maintainer of sniff libraries.
Suggestions to improve the XSD are definitely welcome, but please don't take offense if any are rejected as, as I state above, the goal is for it to be a helpful tool for sniff developers and not to restrict them more than necessary.
As a side-note: the 1.1 version of the specs, which we used during our research when creating the XSD, was published in 2012, so any tooling not compatible with version 1.1 has had 10 years to update already. Not sure we should accommodate tools which are not actively maintained.
Could:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
...be changed to:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema/v1.1" elementFormDefault="qualified">
...as per https://www.w3.org/TR/xmlschema11-1/#langids to explictely indicate that it's written with the XSD 1.1 spec?
@GaryJones Sounds like a good improvement to make.
Thanx for all the replies everyone.
My only goal in opening this issue is to make sure that the XML Schema is usable in the widest array of XML schema processors...just as WP tries to be available on the widest array of PHP/webserver environments. Since the XML instances are very simple, I believe there is no need to write a schema that doesn't conform to v1.0 of the schema spec.
As one of the editors of v1.0 of the XML Schema spec (https://www.w3.org/TR/xmlschema-2/) and a member of the XML Schema Working Group that produced v1.1 of the schema spec, XML Schema is something that I am well versed in and am just trying to contribute that knowledge to this project.
Julliette, I agree that using wildcards is a good thing...and I'm not suggesting removing the wildcard.
Generally, when allowing "extension" elements in XML instances governed by an XML schema, it's a good idea to require those extension elements to be in an XML namespace different from the targetNamespace
of the schema govering the instances. Since this XML schema does not have a targetNamespace (i.e., the elements are not in any namespace), this means that it would be a good practice to require extension elements to explictly be in a namespace (any namespace). This can easily be acheived by changing the element wildcard from:
<xs:any minOccurs='0'/>
to
<xs:any namespace='##other' minOccurs='0'/>
Not only would this change make it perfectly clear which elements in the XML instances are part of the "standard" and which are "extension" elements, it also means that the element wildcard no longer violates v1.0 UPA...and hence, an XML Schema v1.0 processor is able to validate the instances.
Similarly, I would suggest that the namespace='##other'
attribute be added to all the attribute wildcards (that is, requiring that all "extension" attributes also have to explictly be in an XML namespace).
The next change I would make to that element wildcard is to make be "lax" as the attribute wildcards already are, as in:
<xs:any namespace='##other' minOccurs='0' processContents='lax'/>
What does "lax" mean? It means that if the XML schema processor can find a schema for those extension elements, then it will try to validate them against the extension schema, if not then it will silently ignore them (essentially, treat them "as if" they were valid). As is, the element wildcard is strict
, meaning that the schema processor must be able to find a schema declaration for it and will try to validate it against that schema...and if no schema can be found or no element declaration can be found for the extension element, then the entire XML instance is marked as invalid
.
For example, if someone were to add a <this_is_an_extension_element/>
element after a <code_comparison>
element, as in:
<?xml version="1.0"?>
<documentation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="https://phpcsstandards.github.io/PHPCSDevTools/phpcsdocs.xsd"
title="Safe Redirect"
>
<standard>
<![CDATA[
wp_safe_redirect() should be used whenever possible to prevent open redirect vulnerabilities. One of the main uses of an open redirect vulnerability is to make phishing attacks more credible. In this case the user sees your (trusted) domain and might get redirected to an attacker controlled website aimed at stealing private information.
]]>
</standard>
<code_comparison>
<code title="Valid: Redirect can only go to allowed domains.">
<![CDATA[
<em>wp_safe_redirect</em>( $location );
]]>
</code>
<code title="Invalid: Unsafe redirect, can be abused.">
<![CDATA[
<em>wp_redirect</em>( $location );
]]>
</code>
</code_comparison>
<this_is_an_extension_element/>
</documentation>
even a v1.1 schema processor (such as xmllint), would produce a validation error such as:
SafeRedirectStandard.xml:24: element this_is_an_extension_element: Schemas validity error : Element 'this_is_an_extension_element': No matching global element declaration available, but demanded by the strict wildcard. SafeRedirectStandard.xml fails to validate
With the processContents='lax'
attribute added to the element wildcard, then the instance would still validate.
I'll address several of the other comments in separate comments of my own.
Xerces2-j (in schema 1.0 mode)
I've never used Xerces2-j, but according to the docs, it claims to support XML 1.1 processing (other than normalization), so why would you run it in "schema 1.0 mode" for an XML 1.1-specced XSD?
Xerces2-j is the most widely used XML Schema processor out there. In fact, the vast majority of XML tools that are written in Java (which is most of them) use Xerces2-j "under the hood". For instance, PhpStorm does (see https://www.jetbrains.com/help/phpstorm/working-with-xml.html), altho I suspect that that page is out-of-date, since it mentions Xerces 2.11 which did not include support for XML Schema v1.1. Xerces didn't support schema 1.1 until version 2.12.0 (release in late 2018).
Many codebases that use Xerces "under the hood" still don't support schema v1.1 (such as Eclipse, the IDE I use...in some senses, PhpStorm is basically a fork of Eclipse). Why? Because the schema v1.1 support in Xerces 2.12 is only available through it's JAXP interface and it's not always easy for tools to switch from using Xerces' DOM or SAX interfaces.
Could:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
...be changed to:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema/v1.1" elementFormDefault="qualified">
...as per https://www.w3.org/TR/xmlschema11-1/#langids to explictely indicate that it's written with the XSD 1.1 spec?
Unfortunately, the Schema Language Identifiers
section of the schema v1.1 spec is unclear when it says:
Sometimes other specifications or Application Programming Interfaces (APIs) need to refer to the XML Schema Definition Language in general, sometimes they need to refer to a specific version of the language, possibly even to a version defined in a superseded draft. To make such references easy and enable consistent identifiers to be used, we provide the following URIs to identify these concepts.
Essentially, the use of the term refer
in that section means "when writing documentation you can use this URLs to refer to a specific version of the schema spec" (such as RDF, which uses URLs [actually, IRIs] to uniquely identify a "resource").
As the v1.1 schema spec says in The Schema Namespace
The XML representation of schema components uses a vocabulary identified by the namespace name http://www.w3.org/2001/XMLSchema. For brevity, the text and examples in this specification use the prefix xs: to stand for this namespace; in practice, any prefix can be used.
Note: The namespace for schema documents is unchanged from version 1.0 of this specification [XSD 1.0 2E], because any schema document valid under the rules of version 1.0 has essentially the same ·assessment· semantics under this specification as it did under version 1.0 (Second Edition). There are a few exceptions to this rule, involving errors in version 1.0 of this specification which were not reparable by errata and which have therefore been fixed only in this version of this specification, not in version 1.0.
If you were to change the namespace URI in the schema to http://www.w3.org/XML/XMLSchema/v1.1
and try to validate an instance (e.g., using xmllint) you'd get an error such as:
Schemas parser error : The XML document 'phpcsdocs.xsd' is not a schema document.
WXS schema phpcsdocs.xsd failed to compile
@pbiron Sounds like you have some useful improvements in mind. I look forward to a PR.
When working on this, please be aware that the most important aspect of this XSD is that PHP_CodeSniffer should be able to interpret and display the information correctly - this XSD is specifically for the XML files which PHPCS displays when using the --generator=[Text|HTML|Markdown]
option.
The current XSD is safeguarded by tests to ensure this and those tests would still need to pass for the PR (and if any additions are made, new tests may need to be added).
My only goal in opening this issue is to make sure that the XML Schema is usable in the widest array of XML schema processors...just as WP tries to be available on the widest array of PHP/webserver environments.
Considering that this toolset is only interesting for a very small group of people, that is not really a big concern for this project. Having said that, I'm perfectly happy to accept a PR which will make it more widely usable.
I'll have to go through the existing tests to see if any of them will need to be changed. I suspect that at least 2 new tests will need to add to check that any elements/attributes that match the wildcards are explicitly in an XML namespace (one for extension elements, another for extension attributes).
Since I've never used phpcs --generator=[Text|HTML|Markdown]
, how does that use the these XML instances in producing it's output (so that I can verify that requiring extension elements/attributes be explicitly namespace qualified doesn't cause any problems)?
I just did an informal test of adding some extension elements/attributes to a few of the documentation
files included with the installation of WPCS 2.3.0 for one of my plugins and ran phpcs --generator=HTML
and as far as I can tell, the HTML output produced is the same as without the extension elements/attributes. Is there a "more formal" way I should verify there will be no problems?
I'll have to go through the existing tests to see if any of them will need to be changed. I suspect that at least 2 new tests will need to add to check that any elements/attributes that match the wildcards are explicitly in an XML namespace (one for extension elements, another for extension attributes).
👍🏻
Since I've never used
phpcs --generator=[Text|HTML|Markdown]
, how does that use the these XML instances in producing it's output (so that I can verify that requiring extension elements/attributes be explicitly namespace qualified doesn't cause any problems)?
The code for the generators can be found here: https://github.com/squizlabs/PHP_CodeSniffer/tree/master/src/Generators
To see it in action, try the below commands on the command line:
# For sniff documentation on the command line:
> phpcs --standard=PSR12 --generator=Text
# To generate the same documentation as an HTML page to add to a website:
> phpcs --standard=PSR12 --generator=HTML > psr12.html
# Similar, to generate the same documentation to be published as markdown (like in a GH wiki or a GHPages website):
> phpcs --standard=PSR12 --generator=Markdown > psr12.md
I just did an informal test of adding some extension elements/attributes to a few of the
documentation
files included with the installation of WPCS 2.3.0 for one of my plugins and ranphpcs --generator=HTML
and as far as I can tell, the HTML output produced is the same as without the extension elements/attributes.
For WPCS, there is currently no intention to add extra elements/attributes.
For the PHPCompatibility standard, there is, though IIRC, the limited set of currently available documentation doesn't do so yet.
A typical attribute PHPCompatibility may start using would be minPHP="7.1"
(for instance), probably added to the standard
element.
A typical extra element PHPCompatibility may start using would be something along the lines of <references>
with within that a set of <url>...link here...</url>
elements which would allow for linking back to the PHP RFC and PHP manual documentation related to the change the sniff detects.
Does that help ?
Is there a "more formal" way I should verify there will be no problems?
I suppose we could set something up in the test suite to verify that the docs are generated by PHPCS as expected.
This package contains one sniff with (very minimal) documentation which could possibly be used for that. Then again, it would probably be better if PHPCS itself would test the generators they provide (currently not the case, well, only done manually).
The code for the generators can be found here: https://github.com/squizlabs/PHP_CodeSniffer/tree/master/src/Generators
Thanx. Looking at that code, I don't foresee any problems...since it is looking for specific named elements/attributes via DOMDocument::getElementsByTagName()
and DOMDocument::getAttribute()
, so any additional elements/attributes won't interfere.
And I know you previously said that doing schema validation in PHP was a non-goal, but it might be helpful for PHP_CodeSniffer\Generators\Generator::generate()
(and that method in the sub-classes, e.g., HTML::generate()
, etc) to do schema validation, since none of that code is written defensively, e.g.,
public function generate()
{
foreach ($this->docFiles as $file) {
$doc = new \DOMDocument();
$doc->load($file);
$documentation = $doc->getElementsByTagName('documentation')->item(0);
$this->processSniff($documentation);
}
}//end generate()
will result in a PHP fatal error in any of $this->docFiles
do not contain a <documentation>
element, e.g.
PHP Fatal error: Uncaught TypeError: Argument 1 passed to PHP_CodeSniffer\Generators\HTML::processSniff() must be an instance of DOMNode, null given, called in vendor\squizlabs\php_codesniffer\src\Generators\HTML.php on line 38 and defined in vendor\squizlabs\php_codesniffer\src\Generators\HTML.php:191
And DOMDocument::schemaValidate() only supports XML Schema 1.0, so one more reason it will be good if the XML Schema conforms to v1.0 :-) Of course, if the CI is done correctly, then there should be no invalid XML instances...but still, always a good idea to either code defensively OR validate before using the XML.
will result in a PHP fatal error in any of
$this->docFiles
do not contain a<documentation>
element
And that is as it should be.
In contrast to WP, in most PHP projects, it is not customary to hide PHP errors.
I'm a huge fan of defensive coding, but a fatal error when the doc file is invalid is the correct result and the doc file should be fixed in that case. That can either be the PHP native fatal or a custom RuntimeException
, but it should still throw an error.
@pbiron Sorry, I realize you are using WP as a frame of reference, but WP is, in all honesty, one of the worst pieces of code out there and really not a good example of good coding practices, nor of a good coding culture.
A typical attribute PHPCompatibility may start using would be minPHP="7.1" (for instance), probably added to the standard element. A typical extra element PHPCompatibility may start using would be something along the lines of
with within that a set of ...link here... elements which would allow for linking back to the PHP RFC and PHP manual documentation related to the change the sniff detects.Does that help ?
We'll have to discuss this a little more. Because if I understand what you're saying, I would NOT consider the minPHP
attribute nor the reference
and url
elements to be extension elements. They seem more like optional elements/attributes that will (might be?) added to standard XML schema at some point in the future. (and the generators modified to process them if they exist?).
By extension elements/attributes, I think of elements/attributes added to the XML instances that are intended to be processed by a tool other than phpcs
, itself. For example, suppose a company writes a Standard
/ruleset
that is specific to their company (i.e., not intended for "public" consumption) and they write XML Docs
for their sniffs. I can imagine them adding extension elements/attributes that are used by other tools in their build stack.
And it is those "company-specific" elements/attributes that I'm suggesting should be in a company-specific namespace.
will result in a PHP fatal error in any of
$this->docFiles
do not contain a<documentation>
elementAnd that is as it should be.
In contrast to WP, in most PHP projects, it is not customary to hide PHP errors. I'm a huge fan of defensive coding, but a fatal error when the doc file is invalid is the correct result and the doc file should be fixed in that case. That can either be the PHP native fatal or a custom
RuntimeException
, but it should still throw an error.
I'm not suggesting hiding PHP errors!! Trust me, I won't be offended if you don't take any of my advice, but I would much rather the generate()
method do either:
foreach ($this->docFiles as $file) {
$doc = new \DOMDocument();
$doc->load($file);
if ( ! $doc->schemaValidate( 'phpcsdocs.xsd' ) {
// output error message that `$file` is not valid and is being skipped.
continue;
}
$documentation = $doc->getElementsByTagName('documentation')->item(0);
$this->processSniff($documentation);
}
or, at least:
foreach ($this->docFiles as $file) {
$doc = new \DOMDocument();
$doc->load($file);
$documentation = $doc->getElementsByTagName('documentation');
if ( $documentation->count() < 1 ) {
// output error message that `$file` does not contain a `documentation` element and is being skipped.
continue;
}
$this->processSniff($documentation->item(0);
}
That is, process what you can and tell the user what you can't process...graceful degradation.
We'll have to discuss this a little more.
PHPCS will not add support for additional attributes/elements to the doc generators. This has been discussed in that repo before and Greg is leaning more towards changing the whole docs setup completely, but it is completely unknown to "what" it will be changed.
If and when that happens, I intend to add a tool to convert existing XML based PHPCS docs to whatever the new format is to this tooling repo.
As for external tooling using the docs files... there is a WIP/Proof of concept repo here: https://github.com/PHPCSStandards/phpcs-docs
On the one hand, this PoC may be used to start auto-generating a docs website for PHPCompatibility, combining information in the sniff docblocks with the information/code samples in the XML docs. There is also an idea to have a generic PHPCS info website, including a searchable index of available sniffs from common and well-maintained external standards and it may be used as a basis for that.
For now, all of that is still up in the air as there's only so many hours in a day,
For reference, here are some links to previous discussions about the docs in the PHPCS repo:
@pbiron Sorry, I realize you are using WP as a frame of reference, but WP is, in all honesty, one of the worst pieces of code out there and really not a good example of good coding practices, nor of a good coding culture.
And, no, I'm not necessarily using WP as a frame of reference. While it's true that for the last 12+ years, 90% of what is do is WP-related, I've been a software engineer for ~30 years. And in fact, long before I'd ever heard of WP, I spent 10+ years writing XML related standards (not only through the W3C, but also for Health Level Seven, an ANSI-accredited SDO that defines clinical messaging standards used throughout the world) and code to process XML for the healthcare industry (including teaching other developers how to write XML processing code).
That is, process what you can and tell the user what you can't process...graceful degradation.
Ah, I think we were talking sideways. Yes, I agree your suggestion would be a nice improvement for PHPCS. however there are two caveats:
In case you are not aware: the proposal to add an XSD was first proposed to be added to PHPCS itself, but that was rejected (twice), which is why we decided to add it here as we still felt it would be useful to have the XSD available.
And, no, I'm not necessarily using WP as a frame of reference.
Apologies, I just got that impression based on some of the references and suggestions you made. No offense intended Paul!
:wave: Hiya, just checking in whether this is still being worked on ?
Some useful improvement suggestions were made in this ticket, but I've yet to see a PR. As it's been over a year, I'm tempted to close this ticket.
thanx for the reminder Juliette...I'd forgotten that I opened this.
I think I might have some time in the next week to put together a preliminary PR.
Tomorrow I'll refresh my memory about everything that's been said on this issue and that'll give me a better idea of when I'm likely to have a PR.
The phpcsdoc.xsd schema does not conform to v1.0 of the XML schema spec because of the wildcard in the
rulegroup
Model Group.Since the schema has no
targetNamespace
, the element wildcard in that model group (<xs:any minOccurs="0"/>
), as written, violates the Unique Particle Attribution constraint (since the wildcard could potentially match any unqualified element, including those specifically named in the model group).XML Schema 1.1 loosened UPA so that wildcard is OK as specified provided that a v1.1 schema processor is used.
Since there are not that many v1.1 schema processors out there (especially in PHP, and most IDEs), it would probably be a good idea to modify the schema to conform to v1.0 of the schema spec.
I have some other suggestions for improvements to the schema, but it would be a good idea to have some general discussions around the "goals" of the schema (and validation in general) before jumping into opening a PR.
Note: I came across the schema when I saw https://github.com/WordPress/WordPress-Coding-Standards/pull/2084.