Azure-Samples / active-directory-b2c-custom-policy-starterpack

Azure AD B2C now allows uploading of a Custom Policy which allows full control and customization of the Identity Experience Framework
http://aka.ms/aadb2ccustom
MIT License

Fixed "regexp error" when using libxml2 to load the xsd file #118

Open pushrbx opened 2 years ago

pushrbx commented 2 years ago

I'm currently working on a template system that generates the XML policy files, and I wanted to validate the output. I use libxml2 indirectly from Python, via lxml, to validate the generated files against the TrustFrameworkPolicy_0.3.0.0.xsd schema. Schema compilation fails with an error saying that line 3689 of the xsd file contains an invalid regular expression pattern.

From xmllint:

regexp error : failed to compile: Wrong escape sequence, misuse of character '\'
regexp error : failed to compile: xmlFAParseCharClass: ']' expected
regexp error : failed to compile: xmlFAParseRegExp: extra characters
../policies/TrustFrameworkPolicy_0.3.0.0.xsd:3689: element pattern: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\/\-.:=@;$_!*'%\/?#]+$' of the facet 'pattern' is not a valid regular expression.
WXS schema ../policies/TrustFrameworkPolicy_0.3.0.0.xsd failed to compile

From python (in WSL/Ubuntu):

Validating files...
Traceback (most recent call last):
  File "/mnt/c/Users/pushrbx/PycharmProjects/aad-b2c-extensions/pman.py", line 169, in <module>
    main()
  File "/mnt/c/Users/pushrbx/PycharmProjects/aad-b2c-extensions/pman.py", line 161, in main
    build(config)
  File "/mnt/c/Users/pushrbx/PycharmProjects/aad-b2c-extensions/pman.py", line 99, in build
    validate_built_xml_files()
  File "/mnt/c/Users/pushrbx/PycharmProjects/aad-b2c-extensions/pman.py", line 45, in validate_built_xml_files
    xmlschema = etree.XMLSchema(xmlschema_doc)
  File "src/lxml/xmlschema.pxi", line 89, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\/\-.:=@;$_!*'%\/?#]+$' of the facet 'pattern' is not a valid regular expression., line 3689
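
The first xmllint message ("Wrong escape sequence, misuse of character '\'") suggests that the pattern contains a backslash escape that XML Schema regular expressions do not define. The sketch below is a rough, stdlib-only check of that reading. The two escape sets are my transcription of the SingleCharEsc and MultiCharEsc productions in the XML Schema 1.0 datatypes spec and should be treated as an assumption. The function name `find_suspect_escapes` is made up for illustration, and the scan is not a full regex parser.

```python
# Escape letters assumed from the XML Schema 1.0 regex grammar
# (SingleCharEsc and MultiCharEsc); \p{...} and \P{...} are category escapes.
SINGLE_CHAR_ESCAPES = set("nrt\\|.?*+(){}-[]^")
MULTI_CHAR_ESCAPES = set("sSiIcCdDwW")
CATEGORY_ESCAPES = {"p", "P"}
ALLOWED = SINGLE_CHAR_ESCAPES | MULTI_CHAR_ESCAPES | CATEGORY_ESCAPES


def find_suspect_escapes(pattern):
    """Return (offset, escape) pairs for backslash escapes outside ALLOWED."""
    suspects = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] not in ALLOWED:
                suspects.append((i, pattern[i:i + 2]))
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
    return suspects


# The facet value quoted in the error output above.
pattern = r"^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\/\-.:=@;$_!*'%\/?#]+$"
for offset, escape in find_suspect_escapes(pattern):
    print(f"offset {offset}: {escape}")
```

Under these assumptions the scan flags both occurrences of `\/` and accepts `\-`. That fits the error text, but it does not prove this is the only thing libxml2 objects to in the pattern.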

You can also reproduce the issue with the command line tools of libxml2:

  1. On Ubuntu: sudo apt install libxml2-utils
  2. xmllint --schema TrustFrameworkPolicy_0.3.0.0.xsd TrustFrameworkBase.xml --noout

With python you can reproduce it the following way:

  1. Python 3.8+ is required.
  2. pip install lxml==4.8.0 cython==0.29.28
  3. Create a python file repro.py
  4. Write the following in the repro.py file:

    from lxml import etree

    # Compiling the schema is the step that raises XMLSchemaParseError.
    with open("TrustFrameworkPolicy_0.3.0.0.xsd") as f:
        xmlschema_doc = etree.parse(f)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    # Only reached once the schema compiles.
    with open("TrustFrameworkBase.xml") as xml_file:
        doc = etree.parse(xml_file)
    xmlschema.assertValid(doc)
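
If you want to see which pattern facets in a schema contain the `\/` sequence without compiling the schema at all, something along these lines may help. It is a minimal sketch that uses only the standard library. The helper name `patterns_with_escaped_slash` and the sample schema are invented for illustration and are not taken from the starter pack.

```python
import xml.etree.ElementTree as ET

# Qualified tag name of the xs:pattern facet element.
XS_PATTERN = "{http://www.w3.org/2001/XMLSchema}pattern"


def patterns_with_escaped_slash(xsd_source):
    """Return the value of each xs:pattern facet containing backslash + slash."""
    root = ET.fromstring(xsd_source)
    return [
        element.get("value")
        for element in root.iter(XS_PATTERN)
        if "\\/" in element.get("value", "")
    ]


# Tiny made-up schema: one harmless pattern, one with an escaped slash.
SAMPLE_XSD = r"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="Plain">
    <xs:restriction base="xs:string"><xs:pattern value="[a-z]+"/></xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Suspect">
    <xs:restriction base="xs:string"><xs:pattern value="[a-z\/]+"/></xs:restriction>
  </xs:simpleType>
</xs:schema>"""

print(patterns_with_escaped_slash(SAMPLE_XSD))
```

For a real file, passing the raw bytes (for example `pathlib.Path("TrustFrameworkPolicy_0.3.0.0.xsd").read_bytes()`) to the helper should work the same way, since `ET.fromstring` accepts bytes as well as strings.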



This PR addresses the issue. I still need to test it with VSCode, but I don't use VSCode on a day-to-day basis, so it would be great if somebody could test this or point me in the right direction so I can set it up myself.

P.S.: Sorry about the whitespace changes.
ghost commented 2 years ago

CLA assistant check
All CLA requirements met.