Based on the findings in #733 it was concluded that the XML serialization code does not parse the XML type normalizedstring correctly. This does not only affect the license name field, but all occurrences of this string type.
There is a NormalizedString type in /external_models available that replaces a set of forbidden characters, e.g. line feed or tab, in NormalizedString::new with white space characters. All XML schemas in the specification repository use this standard XML type normalizedstring, while the JSON schemas do not use such a string type or put any associated restrictions on them. Therefore this type should only apply to the XML serialization code.
The cyclonedx-bom code handles the normalizedstring XML type as a String, only when converting the spec types to their model representations the NormalizedString type is used, but for both JSON and XML parsed objects. That is wrong, the normalizedstring XML type & therefore the NormalizedString model type are only relevant as part of the XML specification & should be used in this context only.
To fix the issue, the code should be adapted in the following way.
[ ] introduce a NewType struct that encapsulates a String, which is handled as a normal string with serde_json, most likely under folder /spec, but applies replacing chars in XML serialization code
[ ] check the XML specification to see which types in the /spec folder need to change their type from String to the NewType struct, therefore documenting types closer to the XML specification. This needs to be done for all available spec versions
[ ] remove the NormalizedString type from the models folder, the model types that currently declare it, e.g. Option<NormalizedString>, should declare them with String only
[ ] implement FromXml & ToXml traits for the NewType struct that replaces the forbidden white space characters
[ ] update all conversion functions between model & spec types accordingly
[ ] update serialization & deserialization code of the string fields in the XML logic, for example instead of using read_simple_tag to read a String the NewType::read_xml_element trait method is called
[ ] update & fix tests, there are lot of specific tests to validate NormalizedString fields, which should become unnecessary, likely to eliminate a lot of unnecessary validations?
[ ] remove the validate_normalized_string function, it's very likely not needed anymore, which should simplify validation logic for a bunch of fields
These are a lot of changes, but the outcome is
all string types that define normalizedstring in the XML schemas are correctly parsed afterwards, similar to the behavior of other XML libraries in other languages
JSON & XML string types are handled the same way afterwards, the forbidden white space characters are allowed in JSON, but replaced in XML correctly
validation logic becomes simpler, same for a lot of tests
733 will be fixed, so are all other fields with the normalizedstring type
finally the XML serialization code follows the specifcation more closely, & therefore cyclonedx-bom gets closer to other BOM libraries of the CycloneDX family
cc @Shnatsel, @lfrancke (thanks & also thanks to all others on Discord for their input)
Based on the findings in #733 it was concluded that the XML serialization code does not parse the XML type
normalizedstring
correctly. This does not only affect the license name field, but all occurrences of this string type.There is a
NormalizedString
type in/external_models
available that replaces a set of forbidden characters, e.g. line feed or tab, inNormalizedString::new
with white space characters. All XML schemas in the specification repository use this standard XML typenormalizedstring
, while the JSON schemas do not use such a string type or put any associated restrictions on them. Therefore this type should only apply to the XML serialization code.The
cyclonedx-bom
code handles thenormalizedstring
XML type as aString
, only when converting the spec types to their model representations theNormalizedString
type is used, but for both JSON and XML parsed objects. That is wrong, thenormalizedstring
XML type & therefore theNormalizedString
model type are only relevant as part of the XML specification & should be used in this context only.To fix the issue, the code should be adapted in the following way.
String
, which is handled as a normal string withserde_json
, most likely under folder/spec
, but applies replacing chars in XML serialization code/spec
folder need to change their type fromString
to the NewType struct, therefore documenting types closer to the XML specification. This needs to be done for all available spec versionsNormalizedString
type from themodels
folder, the model types that currently declare it, e.g.Option<NormalizedString>
, should declare them withString
onlyFromXml
&ToXml
traits for the NewType struct that replaces the forbidden white space charactersread_simple_tag
to read aString
theNewType::read_xml_element
trait method is calledNormalizedString
fields, which should become unnecessary, likely to eliminate a lot of unnecessary validations?validate_normalized_string
function, it's very likely not needed anymore, which should simplify validation logic for a bunch of fieldsThese are a lot of changes, but the outcome is
normalizedstring
in the XML schemas are correctly parsed afterwards, similar to the behavior of other XML libraries in other languages733 will be fixed, so are all other fields with the
normalizedstring
typecyclonedx-bom
gets closer to other BOM libraries of the CycloneDX familycc @Shnatsel, @lfrancke (thanks & also thanks to all others on Discord for their input)