DrYSG opened this issue 1 year ago
Hi @DrYSG, I just glanced over the issue and have not yet tried to reproduce it locally. Anyway, for me the root cause seems to be the lxml parsing and the presence of comments in the XML. I have not seen that in any $metadata file so far, and it seems we do not have it as a unit test case.
If this hypothesis is true, the fix should be pretty straightforward in the pyodata package - e.g. https://stackoverflow.com/questions/18313818/how-to-not-load-the-comments-while-parsing-xml-in-lxml/
Idea for a possible quick workaround: download the metadata file first using just the requests library from url = 'https://lms-odata.aternity.com/aternity.odata/v2.0/$metadata', then remove the XML comments from the file, and then initialize the pyodata client from the local file, according to the documentation:
https://github.com/SAP/python-pyodata/blob/master/docs/usage/initialization.rst#get-the-service-with-local-metadata
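Roughly something like this (an untested sketch; the credentials are placeholders, and the comment stripping follows the lxml remove_comments idea from the Stack Overflow link above):

import requests
import pyodata
from lxml import etree

SERVICE_URL = 'https://lms-odata.aternity.com/aternity.odata/v2.0/'

session = requests.Session()
session.auth = ('user', 'password')   # placeholder credentials

# 1) download the raw $metadata document
resp = session.get(SERVICE_URL + '$metadata')
resp.raise_for_status()

# 2) re-serialize it without the XML comments that trip up the parser
parser = etree.XMLParser(remove_comments=True)
root = etree.fromstring(resp.content, parser)
metadata = etree.tostring(root, xml_declaration=True, encoding='utf-8')

# 3) initialize the client from the cleaned metadata instead of the live URL
client = pyodata.Client(SERVICE_URL, session, metadata=metadata)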
Well, that is a step forward. Now, how would I track this new error down?
(BTW, I searched for Precision="0" in the XML file and I don't see it anywhere.)
python : Traceback (most recent call last):
At line:1 char:1
+ python .\pub-local.py
+ ~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
File "C:\Users\1455765990E\OneDrive - United States Air Force\Documents\Software Projects\Aternity\pub-local.py", line 25, in <module>
aternity = pyodata.Client(serviceURL, session, metadata=metadata, config=custom_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\client.py", line 91, in __new__
return Client._build_service(logger, url, connection, odata_version, namespaces, config, metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\client.py", line 110, in _build_service
schema = pyodata.v2.model.MetadataBuilder(metadata, config=config).build()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 2782, in build
schema = Schema.from_etree(edm_schemas, self._config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1276, in from_etree
etype = EntityType.from_etree(entity_type, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1687, in from_etree
etype = super(EntityType, cls).from_etree(type_node, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1513, in from_etree
stp = StructTypeProperty.from_etree(proprty)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1930, in from_etree
return StructTypeProperty(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 1814, in __init__
super(StructTypeProperty, self).__init__(name, type_info, nullable, max_length, precision, scale, fixed_length)
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 813, in __init__
self._check_scale_value()
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 892, in _check_scale_value
raise PyODataModelError('Scale value ({}) must be less than or equal to precision value ({})'
pyodata.exceptions.PyODataModelError: Scale value (10) must be less than or equal to precision value (0)
Hi @DrYSG.
First pointing out the obvious - this is a different, new problem.
You understood the error generally correctly, but ended up searching for something that is simply not there at all - and that is the root cause of this new problem. We have now moved from an lxml parsing error (parsing a generic file as valid XML) to a pyodata parsing error (parsing generic XML as a valid OData v2 metadata document).
The metadata file is invalid in such a way that even the loosest error handling policy, PolicyIgnore, cannot parse it without raising this exception.
This property is one example out of many with the same problem:
<Property Name="USERS_COUNT_LAST_7_DAYS" Type="Edm.Decimal" Scale="10" />
Other Edm.Decimal properties are defined correctly and cause no problem, e.g.
<Property Name="PRC_CPU_UTIL_AVG" Type="Edm.Decimal" Precision="37" Scale="10" />
For more information, refer to chapter 6 - Primitive types here: https://www.odata.org/documentation/odata-version-2-0/overview/
The correct way is for the service API owner to fix the $metadata.
A workaround for you now is to manually search and replace all Type="Edm.Decimal" Scale="10"
strings with just Type="Edm.Decimal"
(a Decimal type with unknown scale), since you are working with a modified local file anyway :)
I tried that locally, using the latest pyodata==1.10.1
on Python 3.10.2, and with that modification I can initialize the pyodata client with
Metadata validation status: True
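By the way, if you prefer to script the replace instead of editing by hand, a minimal sketch (assuming the downloaded metadata is saved locally as metadata.xml):

# drop the Scale attribute from Edm.Decimal properties that declare no Precision
with open('metadata.xml', 'rb') as f:
    data = f.read()

data = data.replace(b'Type="Edm.Decimal" Scale="10"', b'Type="Edm.Decimal"')

with open('metadata.xml', 'wb') as f:
    f.write(data)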
You get 5-stars for helpfulness, and even more for writing a complete answer. I completely agree with you about who needs to fix their code.
One small lesson you might want to take from this: if the parser could include the line that caused it to choke in the stack trace, then one would not have to hunt to find the problem.
Your workaround is perfectly acceptable to us.
OK, I will leave this issue open for a fix for parsing xml files with comments.
One small lesson you might want to take from this: if the parser could include the line that caused it to choke in the stack trace, then one would not have to hunt to find the problem.
Yeah, we have known for years that this would be nice. I would love it myself. But it is not that simple. In the end it is basically a stream of chars (or at least it can be, coming from networking libraries), so it is hard to backtrack to the original line. But say even the character position would be enough... there seems to be nothing for such a use case in the lxml parser itself, nothing to go from etree nodes back to the original input when something in the business logic on top of the etree parsing throws an error.
I was sorta expecting that it might be hard, especially if you are layering on 3rd party libs for the parse. I will close out the ticket as soon as I get the hotfix working.
It looks like there might be more edge cases, since I made (and double-checked) all the replacements you suggested, and it still gives the same error.
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\model.py", line 892, in _check_scale_value
raise PyODataModelError('Scale value ({}) must be less than or equal to precision value ({})'
pyodata.exceptions.PyODataModelError: Scale value (10) must be less than or equal to precision value (0)
I have just done the search and replace of Scale="10"
for Decimals that have only that attribute, and this is my metadata file - https://github.com/phanak-sap/pyodata-issue-files/tree/master/%23249
@phanak-sap Thank you Petr, all is well now. Now to get to the real stuff about using it. Thank you for sticking around.
@phanak-sap My apologies if I should be doing this differently. Probably this is a different ticket, and maybe I should be using Stack Overflow. But since you are already up to speed on this and have the metadata.xml file, I figured this is something you would have a very quick answer for.
I am trying to do the textbook example you have for "get all entities in a set": https://pyodata.readthedocs.io/en/latest/usage/querying.html
The code is:
import pyodata
import requests
from pyodata.v2.model import Config, PolicyIgnore

# serviceURL, session and namespaces are defined earlier in pub-local.py
custom_config = Config(
    xml_namespaces=namespaces,
    default_error_policy=PolicyIgnore()
)
with open('metadata.xml', 'rb') as metaFile:
    metadata = metaFile.read()

# Create instance of OData client
aClient = pyodata.Client(serviceURL, session, metadata=metadata, config=custom_config)
raw = aClient.entity_sets.BUSINESS_ACTIVITIES_RAW.get_entities().select('LOCATION_CITY,DEVICE_NAME').execute()
for row in raw:
    print(row.DEVICE_NAME, row.LOCATION_CITY)
But it is giving this error, which I cannot decipher:
C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'lms-odata.aternity.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
Traceback (most recent call last):
File "C:\Users\1455765990E\OneDrive - United States Air Force\Documents\Software Projects\Aternity\pub-local.py", line 27, in <module>
raw = aClient.entity_sets.BUSINESS_ACTIVITIES_RAW.get_entities().select('LOCATION_CITY,DEVICE_NAME').execute()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\service.py", line 349, in execute
return self._call_handler(response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\service.py", line 362, in _call_handler
return self._handler(response)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\1455765990E\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyodata\v2\service.py", line 1494, in get_entities_handler
entities = content['d']
~~~~~~~^^^^^
KeyError: 'd'
It would also be nice to have an example that just gets all properties for a row (or that tells me how to query the metadata to get a list of all properties for an entity, so that I can iterate through the properties).
Hi @DrYSG, keep it open, so I have an issue tracking the XML comment fix.
Your current problems are one step further. Now the problem is not with the metadata but with the actual HTTP traffic. And about that I, without access to the service, can only guess - doing pair programming blindly is hard. But I will try to point you in the right direction anyway.
1) " gets all properties for a row" is a strange question to me. I am guessing wildly this is your first encounter with the odata protocol and you are maybe applying direct SQL knowledge. I recommend you to go trough the really short overview on odata.org. Pyodata are expected to be used with some knowledge of odata protocol itself, so the documentation is "how to do this stuff" but not "what does it even means". Maybe play a bit first against the sample public Northwind V2 read only service first.
https://www.odata.org/documentation/odata-version-2-0/
2) 'd' is the root key of the expected default JSON format. See https://www.odata.org/documentation/odata-version-2-0/json-format/ for details. But I cannot possibly know what the actual HTTP response that produced this stack trace was, or why this key is not present in the response, hence the KeyError: 'd'. Is the response, for example, instead of an OData payload actually something like an HTML "not authorized" error page?
3) Since pyodata is by design networking-library agnostic, it (similarly to the lxml parsing package) defers to the documentation of the package you are actually using for the details of usage. In your case that is the Requests library - https://requests.readthedocs.io/en/latest/
But I will save you some time and paste a snippet for your script that enables logging of HTTP traffic (the requests generated by the Requests library and the responses that are passed to return self._handler(response)). From the logging of the actual HTTP requests and responses in the console you should be able to investigate where the problem is.
import logging
import requests
from http.client import HTTPConnection

HTTPConnection.debuglevel = 1
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
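Alternatively, before wiring in the logging, a quick manual check with plain Requests can already tell you whether the payload is OData v2 JSON at all. A rough sketch - the entity set URL and credentials are assumptions based on your earlier messages:

import requests

session = requests.Session()
session.auth = ('user', 'password')   # placeholders
session.verify = False                # matching your current setup

resp = session.get('https://lms-odata.aternity.com/aternity.odata/v2.0/BUSINESS_ACTIVITIES_RAW',
                   headers={'Accept': 'application/json'})
print(resp.status_code, resp.headers.get('Content-Type'))
print(resp.text[:500])   # an OData v2 JSON payload should be rooted at the "d" key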
Last but not least... if you are not hard-tied to Python, there is also a sister library as a Node.js package - https://github.com/SAP/odata-library. They are a bit ahead, for example by having support for OData protocol V4 - not something you need right now, just saying to promote colleagues :)
@phanak-sap
Let me do the footwork; by all rights I should be doing the work and heavy lifting to figure this out. Thanks greatly for the tip on how to get the logging going. Yes, I am more comfortable in Node than Python right now, but this needs to be a Python project.
Ahh, the old "d" key (I have not done any OData in a few years, so there are lots of things cluttering my brain that are not related to OData).
It is late here, so give me about 12 hours to dive into the logs. I suspect that will get to the heart of the problem.
The OData service I am trying to contact is provided by a company called Riverbed, and the application is called Aternity (it aggregates network performance data from thousands of computers). The API documentation is at: https://help.aternity.com/bundle/console_user_guide_12_3_server_local/page/console/topics/console_api_odata_overview.html
There is definitely something fishy here, since I did try using the Excel OData feed (Power Query) with the same username and password, and I am able to query (and filter) the BUSINESS_ACTIVITIES_RAW entities with no problems. But then, I was going directly to the URL (not using metadata):
= OData.Feed("https://lms-odata.aternity.com/aternity.odata/BUSINESS_ACTIVITIES_RAW?$filter=relative_time(last_5_days)", null, [Implementation="2.0"])
If I am still stuck, let's consider taking this conversation off-line.
Here are the results of the logging. As I said in the private message, it is clear they are not OData compliant. It is giving JSON, but no "d" key.
I broke out the JSON response. They are not following the "d" convention, but rather rooting the JSON tree at "value".
Hi @DrYSG This reminds me of something - #131.
We both started with the expectation that you are working with OData v2, but even I ignored the edmx element attributes. You are not working with OData v2. You are working with the Aternity REST API v2, which returns OData protocol version 4 - which we do not yet support in pyodata (hopefully coming, but slowly and with no real release schedule).
It is even stated in the help you provided, https://help.aternity.com/bundle/console_user_guide_12_3_server_local/page/console/topics/console_api_odata_overview.html, cite: "by entering the URL in the format of OData version 4".
See your metadata: <edmx:Edmx xmlns:edmx="http://docs.oasis-open.org/odata/ns/edmx" xmlns:riverbed="www.riverbed.com/api/contract" Version="4.0" riverbed:aternity_api_version="2.0">
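For illustration, a quick way to check which protocol version a saved $metadata document declares (assuming it is stored locally as metadata.xml):

from lxml import etree

# the Version attribute sits on the edmx:Edmx root element
root = etree.parse('metadata.xml').getroot()
print(root.get('Version'))   # prints "4.0" here, i.e. OData v4, not v2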
Options I see at the moment:
1) Patch your local copy of pyodata (a forked copy / monkey patch) so that the v2 service looks not for the 'd' key but for your 'value' key. If you will only read data from the OData v4 API, you may perhaps get this patched version working; at first glance the specific Aternity API seems to be quite compatible with OData v2 - no backward-incompatible things from OData v4 (for example Edm.Date) seem to be used, and the API also does not work with Function Imports (v2) or, later, Actions (v4). If you will just read the data and process it further, it may work with such a patched current pyodata 1.11 version.
If I were in your position, based on what I know so far, I would also consider:
@phanak-sap
However, I do not see a version pyodata 1.11 (only 1.10.1). If you don't have a patched version that does a try/catch for "d" versus "value" then we are probably stuck here.
That is, I cannot see burdening our customer with a forked copy carrying a monkey patch, especially since you are headed (sometime) to V4 support.
NodeJS is not approved for our networks, so that is not going to help.
Obviously, the fallback is to use the raw Requests library directly for the OData V4 queries.
However, I do not see a version pyodata 1.11 (only 1.10.1)
Sorry, my typo. Anyway, do not expect big changes in 1.11.
documentation at the odata site, which listed your library as V4
Which OData site, please? We are listed as V2-only on https://www.odata.org/libraries/
V4 - you can see from #39 that it is a long-wanted feature. Sadly, expect no quick release of it, even if the V4 support were at first just a "v2-equivalent feature set". Not something you could plan your project around.
To soften the burn a bit - if I understand correctly, the server pagination you mentioned was a quite isolated PR from the community, see https://github.com/SAP/python-pyodata/pull/188/files. It may be possible to reproduce it by just parsing the JSON grabbed by the Requests library; if _next
is present in the JSON payload, then the URL is also provided in the payload.
As a really ugly, last-ditch solution - for basically the subset of V4 APIs that are defined in a way that is 100 percent compatible with OData V2 - I guess we could have a new class under vendors that would basically cover this problem (JSON returned with a different key than 'd', or without a nested JSON at all). No other V4 feature would be there, and it would have a pydoc explicitly describing the use case. No support would be expected for V4 APIs that do not work correctly with such a vendor; it would be up to the user to check the metadata against the v2 specification. Such an ugly vendor would be easy to deprecate in the future once v4 support is added. Some edge cases could be covered by this, until proper V4 support becomes part of a 1.x / 2.x pyodata version.
Changes to the v2 service for "configurable" things that are outside the OData V2 specification will definitely not be accepted.
But a vendor class, however ugly, would be part of an official release, not a patched fork. If you want to work on a pull request for such a "hacky" vendor, I guess it could be a compromise between functionality and time; proper v4 support is the answer, but at the moment I cannot provide a date when it will really be released.
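To illustrate the kind of translation such a vendor class (or a local patch) would have to do, here is a rough, untested sketch using a Requests response hook. Re-wrapping into the v2 verbose {"d": {"results": [...]}} shape is an assumption about the minimum needed for plain reads; setting resp._content is a hack, and other v2/v4 differences may still break it:

import json
import requests

def v4_to_v2_payload(resp, *args, **kwargs):
    # If the body is a v4-style {"value": [...]} document, re-wrap it into the
    # v2 verbose shape {"d": {"results": [...]}} before pyodata parses it.
    if 'json' not in resp.headers.get('Content-Type', ''):
        return resp
    try:
        body = resp.json()
    except ValueError:
        return resp
    if isinstance(body, dict) and 'value' in body and 'd' not in body:
        resp._content = json.dumps({'d': {'results': body['value']}}).encode('utf-8')
    return resp

session = requests.Session()
session.hooks['response'].append(v4_to_v2_payload)
# ...then pass this session to pyodata.Client as usual.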
@phanak-sap
I whipped this up and it does paged reads of all data for one entity. It also keeps a high-water mark in a persistent store, so that next time it only fetches the new data. Data is saved to a JSON file, which is what I need to send to our data lake (which will be the next phase).
import requests
import shelve
import json
import datetime
import logging
from http.client import HTTPConnection
ServiceURL = 'https://lms-odata.aternity.com/aternity.odata/v2.0/'
BizEntity = "BUSINESS_ACTIVITIES_RAW"
HighWaterMark = '' # Latest Time Stamp for data
DEBUG = False
PageSize = 20
Session = requests.Session()
# Store - Shelve permanent storage of key/value pairs
def setup():
    global HighWaterMark
    global Store
    Store = shelve.open("settings")
    HighWaterMark = Store['highWaterMark']
    Session.auth = (Store['username'], Store['password'])
    Session.verify = False
    if (DEBUG):
        HTTPConnection.debuglevel = 1
        logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
        logging.getLogger().setLevel(logging.DEBUG)
        requests_log = logging.getLogger("requests.packages.urllib3")
        requests_log.setLevel(logging.DEBUG)
        requests_log.propagate = True

def writer(data):
    jData = json.dumps({"data": data}, indent=2)
    with open('chunk.json', 'w') as chunkFile:
        chunkFile.write(jData)

def flatten(l):
    return [item for sublist in l for item in sublist]

def fetch():
    allChunks = []
    pages = 0
    params = f"?$page_size={PageSize}&$orderby=TIMEFRAME asc&$filter=TIMEFRAME gt {HighWaterMark}"
    url = ServiceURL + BizEntity + params
    while True:
        if not url:
            break
        resp = Session.get(url)
        if resp.status_code == 200:
            pages += 1
            chunk = json.loads(resp.text)
            allChunks.append(chunk["value"])
            url = chunk.get("@odata.nextLink", False)  # Fetch next link
        else:
            print(f"url: {url}")
            exit("OData Call Failed....")
    writer(flatten(allChunks))
    print(f"PageCount: {pages}")
    flat = flatten(allChunks)
    if (len(flat) > 0):
        last = flat[-1]
        ts = last.get("TIMEFRAME")
        print(f"time: {ts}")
        Store['highWaterMark'] = ts

setup()
fetch()
I am using the python-pyodata package and getting parse issues when the metadata is returned from the root OData location. I don't own the site, so the OData interface is controlled by Riverbed software.
The error is below (it looks like it is looking for the DataService tag).
My code is below as well (I tried to tell it that it is a non-standard namespace with a custom namespace tag).
The start of the metadata file looks like this: