CycloneDX / cyclonedx-php-library

PHP Implementation of OWASP CycloneDX Bill of Materials (BOM)
https://cyclonedx.org/
Apache License 2.0

[XML] properly handle `normalizedString` & `token` #451

Open jkowalleck opened 4 months ago

jkowalleck commented 4 months ago

CycloneDX uses http://www.w3.org/2001/XMLSchema - which defines normalizedString as follows:

<xs:simpleType name="normalizedString" id="normalizedString">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#normalizedString"/>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="replace" id="normalizedString.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>

normalizedString represents white space normalized strings. The ·value space· of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The ·lexical space· of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The ·base type· of normalizedString is string.


CycloneDX uses http://www.w3.org/2001/XMLSchema - which defines token as follows:

<xs:simpleType name="token" id="token">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#token"/>
  </xs:annotation>
  <xs:restriction base="xs:normalizedString">
    <xs:whiteSpace value="collapse" id="token.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>

token represents tokenized strings. The ·value space· of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The ·lexical space· of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The ·base type· of token is normalizedString.


Therefore, on XML-normalization for normalizedString, the following characters must each be replaced by a space (#x20): carriage return (#xD), line feed (#xA), and tab (#x9).

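Therefore, on XML-normalization for token, the following must apply: carriage return (#xD), line feed (#xA), and tab (#x9) are each replaced by a space (#x20), as for normalizedString; then internal sequences of two or more spaces are collapsed to a single space, and leading and trailing spaces are removed.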

Only fields that are defined as normalizedString or token in the XML schema are affected! Other fields MUST NOT be affected!

jkowalleck commented 4 months ago

Possible solution: modify the XML normalizers where needed, and have them call a to-be-written helper function that does the normalization.
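
A minimal sketch of what such a helper might do, written in Python purely for illustration (the library itself is PHP, and the names `normalize_string` and `normalize_token` are hypothetical, not part of any CycloneDX API). It follows the XML Schema `whiteSpace` facets quoted above: `replace` for normalizedString, `collapse` for token.

```python
import re

# Characters that the XML Schema whiteSpace facet "replace" maps to a space:
# tab (#x9), line feed (#xA), carriage return (#xD).
_REPLACE_TABLE = str.maketrans({"\t": " ", "\n": " ", "\r": " "})


def normalize_string(value: str) -> str:
    """Hypothetical helper for xs:normalizedString (whiteSpace="replace").

    Each tab, line feed and carriage return becomes exactly one space;
    nothing is trimmed or collapsed.
    """
    return value.translate(_REPLACE_TABLE)


def normalize_token(value: str) -> str:
    """Hypothetical helper for xs:token (whiteSpace="collapse").

    First applies the "replace" step, then collapses runs of spaces to a
    single space and strips leading and trailing spaces.
    """
    replaced = normalize_string(value)
    collapsed = re.sub(r" {2,}", " ", replaced)
    return collapsed.strip(" ")
```

An XML normalizer would call these only for fields whose schema type is normalizedString or token, and leave every other field untouched.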

jkowalleck commented 4 months ago

Solution as done in TS/JS: https://github.com/CycloneDX/cyclonedx-javascript-library/pull/1116