SAP / python-pyodata

Enterprise-ready Python OData client
Apache License 2.0
223 stars 93 forks source link

issue with parsing metadata (DeviceServices) tag with a differnent namespace #249

Open DrYSG opened 1 year ago

DrYSG commented 1 year ago

I am using the package python pyodata and getting parse issues when the meta-data is returned from the root ODATA location. I don't own the site, so the ODATA interface is controlled by riverbed software.

The error is (It look like it is looking for the DataService tag.

  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 2752, in build
    dataservices = next((child for child in edmx if etree.QName(child.tag).localname == 'DataServices'))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 2752, in <genexpr>
    dataservices = next((child for child in edmx if etree.QName(child.tag).localname == 'DataServices'))
                                                    ^^^^^^^^^^^^^^^^^^^^^^
  File "src\lxml\etree.pyx", line 1849, in lxml.etree.QName.__init__
  File "src\lxml\apihelpers.pxi", line 1754, in lxml.etree._tagValidOrRaise
ValueError: Invalid tag name '<cyfunction Comment at 0x000001B5E651C5F0>'

and my code is: (I tired to tell it that it is a non-standard name space with a custom name space tag.

import requests
import pyodata
from pyodata.v2.model import PolicyFatal, PolicyWarning, PolicyIgnore, ParserError, Config

serviceURL = 'https://lms-odata.aternity.com/aternity.odata/v2.0/'
session = requests.Session()
session.auth = ('myself@mycom.com, 'pass')
session.verify = False

namespaces = {
    'edmx': 'http://docs.oasis-open.org/odata/ns/edmx',
    'edm': 'http://docs.oasis-open.org/odata/ns/edm',
    'riverbed': 'www.riverbed.com/api/contract'
}

custom_config = Config(
    xml_namespaces=namespaces,
    default_error_policy=PolicyIgnore()
)

with open('metadata.xml', 'rb') as metaFile:
    metadata = metaFile.read()

# Create instance of OData client
aternity = pyodata.Client(serviceURL, session, metadata=metadata, config=custom_config)

the start of the metadata file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<edmx:Edmx xmlns:edmx="http://docs.oasis-open.org/odata/ns/edmx" xmlns:riverbed="www.riverbed.com/api/contract" Version="4.0" riverbed:aternity_api_version="2.0">
   <!--\n\n \tAternity Api version : 2.0 \n\n-->
   <edmx:DataServices>
      <Schema xmlns="http://docs.oasis-open.org/odata/ns/edm" Namespace="aternity.odata">
         <EntityType Name="RAW_DATA_TOP_PROCESSES_VIRTUAL_APP_SERVER_SESSIONS_Type">
            <Property Name="ACCOUNT_ID" Type="Edm.Int64" />
            <Property Name="APPLICATION_NAME" Type="Edm.String" MaxLength="1024" />
            <Property Name="APPLICATION_VERSION" Type="Edm.String" MaxLength="1024" />
            <Property Name="MEASUREMENT_START_TIMESTAMP" Type="Edm.DateTimeOffset" />
            <Property Name="PRC_CPU_UTIL_AVG" Type="Edm.Decimal" Precision="37" Scale="10" />
            <Property Name="PRC_IO_READ_RATE" Type="Edm.Decimal" Precision="37" Scale="10" />

...
phanak-sap commented 1 year ago

Hi @DrYSG I just glanced oved the issue, did not yet tried to reproduce it locally. Anyway, for me the root cause seems to be the LXML parsing and presence of comments in the XML. This is what I didn't see so far in any $metadata file and it seems that we do not have it as a unit test case.

If this hypothesis is true, the fix should be pretty straightforward in the pyodata package - e.g. https://stackoverflow.com/questions/18313818/how-to-not-load-the-comments-while-parsing-xml-in-lxml/

Idea for a possible quick workaround - try to download the metadata file first using just the requests library - url = 'https://lms-odata.aternity.com/aternity.odata/v2.0/$metadata', then remove the HTML/XML comment from the file and then initialize the pyodata package from local file, according to documentation: https://github.com/SAP/python-pyodata/blob/master/docs/usage/initialization.rst#get-the-service-with-local-metadata

DrYSG commented 1 year ago

Well, that is a step forward. Now, how would I track this new error down:

(BTW, I searched for a Precision="0" in the xml file, I don't see that anywhere.

python : Traceback (most recent call last):
At line:1 char:1
+ python .\pub-local.py
+ ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

  File "C:\Users\1455765990E\OneDrive - United States Air Force\Documents\Software Projects\Aternity\pub-local.py", line 25, in <module>
    aternity = pyodata.Client(serviceURL, session, metadata=metadata, config=custom_config)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\client.py", line 91, in __new__
    return Client._build_service(logger, url, connection, odata_version, namespaces, config, metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\client.py", line 110, in _build_service
    schema = pyodata.v2.model.MetadataBuilder(metadata, config=config).build()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 2782, in build
    schema = Schema.from_etree(edm_schemas, self._config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1276, in from_etree
    etype = EntityType.from_etree(entity_type, config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1687, in from_etree
    etype = super(EntityType, cls).from_etree(type_node, config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1513, in from_etree
    stp = StructTypeProperty.from_etree(proprty)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1930, in from_etree
    return StructTypeProperty(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1814, in __init__
    super(StructTypeProperty, self).__init__(name, type_info, nullable, max_length, precision, scale, fixed_length)
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 813, in __init__
    self._check_scale_value()
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 892, in _check_scale_value
    raise PyODataModelError('Scale value ({}) must be less than or equal to precision value ({})'
pyodata.exceptions.PyODataModelError: Scale value (10) must be less than or equal to precision value (0)
DrYSG commented 1 year ago

metadata.xml.txt

phanak-sap commented 1 year ago

Hi @DrYSG.

First pointing out the obvious - this is a different, new problem.

You understood the error generally correctly, but ended searching for something that is simply not there at all - and that's the root cause of this new problem. We now moved from LXML parsing error (parsing of generic file as valid xml) to a pyodata parsing error (parsing generic xml as valid odata v2 metadata document).

The metadata file is invalid, in such way that even the loosest error handling policy PolicyIgnore cannot parse it without throwing out this exception.

This property is one example out of many with same problem: <Property Name="USERS_COUNT_LAST_7_DAYS" Type="Edm.Decimal" Scale="10" />

Other Edm.Decimals are defined correctly and they have no problem, e.g. <Property Name="PRC_CPU_UTIL_AVG" Type="Edm.Decimal" Precision="37" Scale="10" />

For more information, refer to chapter 6 - Primitive types here: https://www.odata.org/documentation/odata-version-2-0/overview/

Correct way is to fix the $metadata by the service API owner.

Workaround for you now is to manually search and replace all Type="Edm.Decimal" Scale="10" strings to just Type="Edm.Decimal" (Decimal type with uknown scale), since you are working with modified local file anyway :)

I tried that locally, using latest pyodata==1.10.1 on Python 3.10.2, and with that modification I can initialize pyodata client with

Metadata validation status: True

DrYSG commented 1 year ago

You get 5-stars for helpfulness, and even more for writing a complete answer. I completely agree with you about who needs to fix their code.

One small lesson you might want to take from this, that if the parser could throw the line that caused it to choke out in the stacktrace, Then one would not have to hunt to find the problem.

Your workaround is perfectly acceptable to us.

phanak-sap commented 1 year ago

OK, I will leave this issue open for a fix for parsing xml files with comments.

One small lesson you might want to take from this, that if the parser could throw the line that caused it to choke out in the stacktrace, Then one would not have to hunt to find the problem.

Yeah, we know that this would be nice basically for years. I would love it myself. But it is not that simple. It it basically a stream of chars in the end (or at least can be from networking libraries), so hard to backtrack to original line. But say even the character position would be enough... there seems to be nothing for such use case in the LXML parser itself, to go back from etree nodes back to original thing, if something during business logic of the etree parsing throws an error.

DrYSG commented 1 year ago

I was sorta expecting that it might be hard, especially if you are layering on 3rd party libs for the parse. I will close out the ticket as soon as I get the hotfix working.

It looks like there might be more edge cases, since I did (and double checked) all the cases you suggested, and it still gives the same error.

  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 892, in _check_scale_value
    raise PyODataModelError('Scale value ({}) must be less than or equal to precision value ({})'
pyodata.exceptions.PyODataModelError: Scale value (10) must be less than or equal to precision value (0)
phanak-sap commented 1 year ago

I have just did the search and replace the Scale="10" for Decimals that have just that and this is my metadata file - https://github.com/phanak-sap/pyodata-issue-files/tree/master/%23249

DrYSG commented 1 year ago

@phanak-sap Thank you Petr, all is well now. Now to get to the real stuff about using it. Thank you for sticking around.

DrYSG commented 1 year ago

@phanak-sap My apologies if I should be doing this different. Probably this is a different ticket, Maybe I should be using Stack Overflow. But since you are already up to speed on this, and have the metadata.xml file, I figured this is something you have a very quick answer for me.

I am trying to do the textbook example you have for "get all entities in a set": https://pyodata.readthedocs.io/en/latest/usage/querying.html

The code is:

custom_config = Config(
    xml_namespaces=namespaces,
    default_error_policy=PolicyIgnore()
)

with open('metadata.xml', 'rb') as metaFile:
    metadata = metaFile.read()

# Create instance of OData client
aClient = pyodata.Client(serviceURL, session, metadata=metadata, config=custom_config)

raw = aClient.entity_sets.BUSINESS_ACTIVITIES_RAW.get_entities().select('LOCATION_CITY,DEVICE_NAME').execute()
for row in raw:
    print(row.DEVICE_NAME, row.LOCATION_CITY)

But it giving this error which I cannot decipher:

C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'lms-odata.aternity.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\1455765990E\OneDrive - United States Air Force\Documents\Software Projects\Aternity\pub-local.py", line 27, in <module>
    raw = aClient.entity_sets.BUSINESS_ACTIVITIES_RAW.get_entities().select('LOCATION_CITY,DEVICE_NAME').execute()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\service.py", line 349, in execute
    return self._call_handler(response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\service.py", line 362, in _call_handler
    return self._handler(response)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\service.py", line 1494, in get_entities_handler
    entities = content['d']
               ~~~~~~~^^^^^
KeyError: 'd'

It would also be nice to have a example that just gets all properties for a row (or tells me how to query the metadata to get a list of all properties for an entity, so that I can iterate through the properties).

phanak-sap commented 1 year ago

Hi @DrYSG, keep it open, so I will have an issue for the XML comment fix.

Your current problems are one step further. Now we are not having problem with the metadata but with the actual HTTP traffic. And this is what I, without access to the service, can just guess. Doing pair programming blindly is hard. But I will try to point to to the right direction anyway.

1) " gets all properties for a row" is a strange question to me. I am guessing wildly this is your first encounter with the odata protocol and you are maybe applying direct SQL knowledge. I recommend you to go trough the really short overview on odata.org. Pyodata are expected to be used with some knowledge of odata protocol itself, so the documentation is "how to do this stuff" but not "what does it even means". Maybe play a bit first against the sample public Northwind V2 read only service first.

https://www.odata.org/documentation/odata-version-2-0/

2) the 'd' is the key for the expected default JSON format. See https://www.odata.org/documentation/odata-version-2-0/json-format/ for details. But I cannot possibly know what was the actual HTTP response that created this stacktrace. Why this key is not present in the response, therefore KeyError: 'd'. Is the response for example, instead of odata payload, actually something like HTML error page of "not authorized"?

3) since pyodata is by design networking library agnostic, it (similarly to lxml parsing package) depends on documentation of the package you are actually using for details of usage. In your case Requests library - https://requests.readthedocs.io/en/latest/

But I will save you some time and paste a snippet for your script, for enabling logging of HTTP traffic (request generated by Requests library and Responses, that are passed to return self._handler(response). From logging of actual HTTP requests and responses in console you should be able to investigate where the problem is.

import requests
from http.client import HTTPConnection

HTTPConnection.debuglevel = 1
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
phanak-sap commented 1 year ago

Last but not least.. if you are not hard tied to python, there is also a sister library as a nodejs package - https://github.com/SAP/odata-library. They are ahead a bit, for example by having support of Odata protocol V4. Which is not something you need right now, just saying to promote colleagues :)

DrYSG commented 1 year ago

@phanak-sap

Let me do the footwork, By all rights I should be doing the work and lifting to figure this out. Thanks greatly for the tip on how to get the logging going. Yes, I am more comfortable in Node than Python right now, but this needs to be a python project.

Ahh, the old "d" key (I have not done any ODATA in a few years, so there are lots of things cluttering my brain not related to ODATA).

It is late here, so give me about 12 hours dive into the logs. I suspect that will get to the heart of the prolblem.

The ODATA service I am trying contact is done by a company called RIVERBED, and the Applicaiton is called Aternity (it aggregates network performance data from thousands of computers). The API documentation is at: https://help.aternity.com/bundle/console_user_guide_12_3_server_local/page/console/topics/console_api_odata_overview.html

There is definitely something fishy here. Since I did try using the EXCEL ODATA feed (PowerQuery) with the same password and username, and I am able to query (and filter) the BUSINESS_ACTIVITIES_RAW entities. No problems. But then, I was going direct to the URL: (not using metadata)

= OData.Feed("https://lms-odata.aternity.com/aternity.odata/BUSINESS_ACTIVITIES_RAW?$filter=relative_time(last_5_days)", null, [Implementation="2.0"])

If I am still stuck, let's consider taking this conversation off-line.

DrYSG commented 1 year ago

here is the results of the logging. As I said in the private message, it is clear they are not ODATA compliant. It is giving JSON, but no D tag.

log.txt response.json.txt

DrYSG commented 1 year ago

I broke out the JSON response. They are no following the "d" convention, but rather rooting the JSON tree as: "value"

phanak-sap commented 1 year ago

Hi @DrYSG This reminds me something - #131.

We both started with expectation that you work with odata v2, but even I ignored the edmx element properties. You are not working with Odata v2. You are working Aternity rest api v2, which is returning Odata protocol Version 4, which we do not yet support in pyodata (hopefully coming up, but slowly with no real release schedule).

It is even stated in the help you provieded https://help.aternity.com/bundle/console_user_guide_12_3_server_local/page/console/topics/console_api_odata_overview.html cite: "by entering the URL in the format of OData version 4"

See your metadata: <edmx:Edmx xmlns:edmx="http://docs.oasis-open.org/odata/ns/edmx" xmlns:riverbed="www.riverbed.com/api/contract" Version="4.0" riverbed:aternity_api_version="2.0">

options I see at the moment:

If I was in your position, based on what I know so far, I would consider also:

DrYSG commented 1 year ago

@phanak-sap

However, I do not see a version pyodata 1.11 (only 1.10.1). If you don't have a patched version that does a try/catch for "d" versus "value" then we are probably stuck here.

That is, I cannot see burdening our customer with a forked copy with a monkey patch. Especially, since you are headed (sometime) to V4 support.

NodeJS is not approved for our networks, so that is not going to help.

Obviously, the fall back is to use the Raw REQUESTS library directly to the OData V4 queires.

phanak-sap commented 1 year ago

However, I do not see a version pyodata 1.11 (only 1.10.1)

sorry, my typo. Anyway, in 1.11 do not expect big changes.

documentation at the odata site, which listed your library as V4

Which odata site please? we are listed as V2 only on https://www.odata.org/libraries/

V4 - you see from #39 it is long wanted feature. Sadly, expect no quick release of such feature, even if the V4 support would be first just "v2 equivalent feature set". Not that you could plan your project around such promise.

To soften the burn a bit - if I understand correctly, the server pagination you mentioned was quite isolated PR from the community, see https://github.com/SAP/python-pyodata/pull/188/files . It may be possible to reproduce it from just parsing the JSON grabbed by the Request library; if the _next is present in the JSON payload, then the url is also provided in the payload.

As a really ugly and last ditch solution - for basically a subset of V4 APIs, which are defined in a way that is 100 percent compatible with odata V2 - I guess we could have a new class under vendors, that would basically cover this problem (jsons are returned with different key than 'd' or without a nested json at all). No other V4 feature would be there, it would have pydoc explicitly describing the use case. No support for V4 APIs that does not work correctly with such vendor would be expected, it is up to user to check the metadata against v2 specification. Such ugly vendor would be easy to deprecate in the future with v4 support added. Some edge cases could be covered by this, until proper V4 support will be part of 1.x / 2.x pyodata version.

Changes to v2 service for "configurable" things that are out of odata V2 specification will definitely not be accepted.

But vendor class, however ugly, it will be part of official release, not a patched fork. If you want to work on pull request for such "hacky" vendor PR, I guess it could be a compromise on functionality and time; proper v4 support is the answer but I cannot provide at the moment a date when it will really be released.

DrYSG commented 1 year ago

@phanak-sap

I whipped this up and it is doing paged reads of all data at one entity. It also keeps a high water mark in a persistent store, so that next time it only fetches the new data. Data is saved in a JSON file, which is what I need to send to our Data Lake (which will be the next phase).

import requests
import shelve
import json
import datetime
import logging
from http.client import HTTPConnection

ServiceURL = 'https://lms-odata.aternity.com/aternity.odata/v2.0/'
BizEntity = "BUSINESS_ACTIVITIES_RAW"
HighWaterMark = '' # Latest Time Stamp for data
DEBUG = False
PageSize = 20
Session = requests.Session()
# Store - Shelve permanent storage of key/value pairs

def setup():
    global HighWaterMark
    global Store
    Store = shelve.open("settings")
    HighWaterMark = Store['highWaterMark']
    Session.auth = (Store['username'], Store['password'])
    Session.verify = False
    if (DEBUG):
        HTTPConnection.debuglevel = 1
        logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
        logging.getLogger().setLevel(logging.DEBUG)
        requests_log = logging.getLogger("requests.packages.urllib3")
        requests_log.setLevel(logging.DEBUG)
        requests_log.propagate = True

def writer(data):
    jData = json.dumps({"data": data}, indent=2)
    with open('chunk.json', 'w') as chunkFile:
       chunkFile.write(jData)

def flatten(l):
    return [item for sublist in l for item in sublist]

def fetch():
    allChunks = []
    pages = 0
    params = f"?$page_size={PageSize}&$orderby=TIMEFRAME asc&$filter=TIMEFRAME gt {HighWaterMark}"
    url = ServiceURL + BizEntity + params
    while True:
        if not url:
            break
        resp = Session.get(url)
        if resp.status_code == 200:
            pages += 1
            chunk = json.loads(resp.text)
            allChunks.append(chunk["value"])
            url = chunk.get("@odata.nextLink", False) # Fetch next link
        else:
            print(f"url: {url}")
            exit("OData Call Failed....")
    writer(flatten(allChunks))
    print(f"PageCount: {pages}")
    flat = flatten(allChunks)
    if (len(flat) > 0 ):
        last = flat[-1]
        ts = last.get("TIMEFRAME")
        print (f"time: {ts}")
        Store['highWaterMark'] = ts

setup()
fetch()