SAP / python-pyodata

Enterprise-ready Python OData client
Apache License 2.0

Unable to parse metadata when '&' character is present in property attributes. #126

Closed phanak-sap closed 2 years ago

phanak-sap commented 4 years ago

Bug reproduced in version: 1.6.0

Environment info: Win 10 Python 3.7.1 pyodata-1.6.0 lxml-4.5.2

Steps to reproduce:

The simplest way is to use the script that opens a local metadata file, e.g. https://github.com/SAP/python-pyodata/blob/master/docs/usage/initialization.rst#get-the-service-with-local-metadata

$metadata content to reproduce the bug:

<edmx:Edmx xmlns:edmx="http://schemas.microsoft.com/ado/2007/06/edmx" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:sap="http://www.sap.com/Protocols/SAPData" Version="1.0">
<edmx:DataServices m:DataServiceVersion="2.0">
<Schema xmlns="http://schemas.microsoft.com/ado/2008/09/edm" Namespace="FAA_MD_MANAGE_SRV" xml:lang="en" sap:schema-version="1">
<EntityType Name="C_AssetTPType" sap:label="Asset" sap:content-version="1">
<Property Name="IN_AssetIsResearchAndDev" Type="Edm.String" sap:label="R & D Asset" sap:quickinfo="India: R & D Asset"/>
</EntityType>
</Schema>
</edmx:DataServices>
</edmx:Edmx>

Stacktrace:

  File "C:\Python37\lib\site-packages\pyodata\v2\model.py", line 2428
  File "src\lxml\etree.pyx", line 3519, in lxml.etree.parse
  File "src\lxml\parser.pxi", line 1856, in lxml.etree._parseDocument
  File "src\lxml\parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
  File "src\lxml\parser.pxi", line 1764, in lxml.etree._parseDoc
  File "src\lxml\parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src\lxml\parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 5
lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 5, column 75

Additional Notes:

The metadata is parsed correctly when the '&' characters are deleted.
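
The contrast between the failing and the working case can be sketched with a trimmed-down version of the Property element. This is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree` instead of lxml (so the error text differs from the stacktrace above), the `sap:` prefix is dropped to keep the fragment self-contained, and `try_parse` is a hypothetical helper, not part of pyodata.

```python
import xml.etree.ElementTree as ET

# Trimmed-down Property element from the report (namespace prefix removed).
BROKEN = '<Property Name="IN_AssetIsResearchAndDev" Type="Edm.String" label="R & D Asset"/>'
# Same element with the ampersand written as an entity reference.
ESCAPED = BROKEN.replace(" & ", " &amp; ")


def try_parse(xml_text):
    """Return the parsed label, or None when the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).get("label")
    except ET.ParseError:
        return None


broken_result = try_parse(BROKEN)    # the bare '&' is rejected by the parser
escaped_result = try_parse(ESCAPED)  # '&amp;' is decoded back to '&'
```

With the escaped variant the parser hands back the label with a literal '&', so nothing changes for the consumer of the metadata.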

AlapanGhosh commented 3 years ago

The bug stems from LXML trying to parse a non-sanitized XML string. In XML, '&' starts an entity reference, so a literal ampersand has to be escaped as &amp;. A bare '&' followed by a space has no entity name, which is exactly what the parser reports ("xmlParseEntityRef: no name"). https://www.novixys.com/blog/what-characters-need-to-be-escaped-in-xml-documents

This SO thread might help: https://stackoverflow.com/questions/4972210/escape-unescaped-characters-in-xml-with-python
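
One possible client-side workaround along those lines is to escape stray ampersands before handing the text to the parser. The sketch below is an assumption-laden illustration, not something pyodata does: `escape_bare_ampersands` and `BARE_AMPERSAND` are hypothetical names, and the standard library parser stands in for lxml.

```python
import re
import xml.etree.ElementTree as ET

# Matches '&' that does not start a named, decimal, or hexadecimal reference.
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z_:][\w.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Replace stray '&' with '&amp;' while leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


raw = '<Property Name="P" label="R & D &amp; more &#38; &lt;x&gt;"/>'
fixed = escape_bare_ampersands(raw)
label = ET.fromstring(fixed).get("label")
```

Two caveats: the regular expression also rewrites ampersands inside CDATA sections and comments, where a bare '&' is legal, and it leaves references to undefined entities untouched, so those would still fail. Fixing the service that produces the metadata remains the cleaner option.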

prashdsouza commented 3 years ago

Incident created on the framework

https://support.wdf.sap.corp/sap/support/message/2170026821

filak-sap commented 3 years ago

@prashdsouza Thank you! One of the goals of PyOData was to discover issues in my odata services.

phanak-sap commented 2 years ago

This bug is a false positive. The '&' was correctly escaped in the service's metadata, but when the bug was reported to me, I received a non-escaped version of the XML file, which obviously reproduced the error.

PR #181 just adds a better exception text for this edge case, so the problem is easier to understand for someone who does not know LXML.
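
For readers wondering what such a wrapper might look like, here is a rough sketch of the general idea. It is not the code from PR #181: `MetadataParseError` and `parse_metadata` are hypothetical names, and the standard library parser is used in place of lxml.

```python
import xml.etree.ElementTree as ET


class MetadataParseError(Exception):
    """Hypothetical error type for this sketch; not part of pyodata."""


def parse_metadata(xml_text):
    """Parse metadata and re-raise syntax errors with a readable hint."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as ex:
        # ParseError.position holds the (line, column) of the failure.
        line, column = ex.position
        raise MetadataParseError(
            f"Metadata is not valid XML (line {line}, column {column}): {ex}. "
            "Check for unescaped special characters such as '&'."
        ) from ex
```

Chaining with `from ex` keeps the original parser error available in the traceback for anyone who does want the low-level details.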