dr-leo / pandaSDMX

Python interface to SDMX
Apache License 2.0

datastructure() and codelist() functions raise XMLParseError in Bundesbank (BBK) Source #217

Closed ngalanin closed 2 years ago

ngalanin commented 3 years ago

Hello, I am not sure why, but the datastructure() and codelist() functions raise XMLParseError when I use Bundesbank (BBK) as a source. Also, the dsd.components.components have empty values. The code is attached as a screenshot. Am I doing something wrong or missing some parameters? Or is something wrong with the XML? Thank you in advance!

Code

Here is the Error Message in datastructure() function:

--- SS without DSD ---
[False]
--- <class 'pandasdmx.message.StructureMessage'> ---
[
id: 'IREF1628957005970' prepared: '2021-08-14T18:03:25.970000+02:00' receiver: sender: source: test: False]
--- ---
[]
--- Name ---
[('en', 'Deutsche Bundesbank, Insurance Corporations and Pension Undertakings Statistics'), ('de', 'Deutsche Bundesbank, Statistik der Versicherungen und Pensionseinrichtungen')]
--- ---
[, , , , , , , , , , , , ]
--- ---
[, , , , , , , , , , , , , ]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in read_message(self, source, dsd)
    247 # Parse the element
--> 248 result = func(self, element)
    249 self.push(result)

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in _cl(reader, elem)
    912
--> 913 cl = reader.identifiable(cls, elem, **args)
    914

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in identifiable(self, cls, elem, **kwargs)
    431 setdefault_attrib(kwargs, elem, "id", "urn", "uri")
--> 432 return self.annotable(cls, elem, **kwargs)
    433

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in annotable(self, cls, elem, **kwargs)
    426 kwargs["annotations"].extend(self.pop_all(model.Annotation))
--> 427 return cls(**kwargs)
    428

~\anaconda3\lib\site-packages\pandasdmx\model.py in __init__(self, *args, **kwargs)
    241 if self.id not in (self.urn_group["item_id"] or self.urn_group["id"]):
--> 242 raise ValueError(f"ID {self.id} does not match URN {self.urn}")
    243 except KeyError:

ValueError: ID DimensionDescriptor does not match URN urn:sdmx:org.sdmx.infomodel.datastructure.DimensionDescriptor=BBK:BBK_ACIP(1.0).BBK_ACIP

The above exception was the direct cause of the following exception:

XMLParseError Traceback (most recent call last)
in
----> 1 meta=bb.datastructure()

~\anaconda3\lib\site-packages\pandasdmx\api.py in get(self, resource_type, resource_id, tofile, use_cache, dry_run, **kwargs)
    474
    475 # Parse the message, using any provided or auto-queried DSD
--> 476 msg = reader.read_message(response_content, dsd=kwargs.get("dsd", None))
    477
    478 # Store the HTTP response with the message

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in read_message(self, source, dsd)
    256 self._dump()
    257 print(etree.tostring(element, pretty_print=True).decode())
--> 258 raise XMLParseError from exc
    259
    260 # Parsing complete

XMLParseError: ValueError: ID DimensionDescriptor does not match URN urn:sdmx:org.sdmx.infomodel.datastructure.DimensionDescriptor=BBK:BBK_ACIP(1.0).BBK_ACIP

**Here is the Error Message in codelist() function:**

![code2](https://user-images.githubusercontent.com/58103657/129452929-6989b1b4-617a-4434-a68b-7fa3d2c89cf1.png)

--- SS without DSD ---
[False]
--- ---
[
id: 'IREF1628957531488' prepared: '2021-08-14T18:12:11.488000+02:00' receiver: sender: source: test: False]
--- Name ---
[('en', 'Collection indicator code list (BBk)'), ('de', 'Erlaubte Angaben für die Umrechnungsart')]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in read_message(self, source, dsd)
    247 # Parse the element
--> 248 result = func(self, element)
    249 self.push(result)

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in _itemscheme(reader, elem)
    801
--> 802 return reader.maintainable(cls, elem, items=items)
    803

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in maintainable(self, cls, elem, **kwargs)
    467 # Create a candidate object
--> 468 obj = self.nameable(cls, elem, **kwargs)
    469

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in nameable(self, cls, elem, **kwargs)
    439 """
--> 440 obj = self.identifiable(cls, elem, **kwargs)
    441 if elem is not None:

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in identifiable(self, cls, elem, **kwargs)
    431 setdefault_attrib(kwargs, elem, "id", "urn", "uri")
--> 432 return self.annotable(cls, elem, **kwargs)
    433

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in annotable(self, cls, elem, **kwargs)
    426 kwargs["annotations"].extend(self.pop_all(model.Annotation))
--> 427 return cls(**kwargs)
    428

~\anaconda3\lib\site-packages\pandasdmx\model.py in __init__(self, **kwargs)
    386 def __init__(self, **kwargs):
--> 387 super().__init__(**kwargs)
    388 try:

~\anaconda3\lib\site-packages\pandasdmx\model.py in __init__(self, **kwargs)
    336 def __init__(self, **kwargs):
--> 337 super().__init__(**kwargs)
    338 try:

~\anaconda3\lib\site-packages\pandasdmx\model.py in __init__(self, *args, **kwargs)
    237
--> 238 self.urn_group = pandasdmx.urn.match(self.urn)
    239

~\anaconda3\lib\site-packages\pandasdmx\urn.py in match(string)
     43 def match(string):
---> 44 return URN.match(string).groupdict()

AttributeError: 'NoneType' object has no attribute 'groupdict'

The above exception was the direct cause of the following exception:

XMLParseError Traceback (most recent call last)
in
----> 1 bb.codelist()

~\anaconda3\lib\site-packages\pandasdmx\api.py in get(self, resource_type, resource_id, tofile, use_cache, dry_run, **kwargs)
    474
    475 # Parse the message, using any provided or auto-queried DSD
--> 476 msg = reader.read_message(response_content, dsd=kwargs.get("dsd", None))
    477
    478 # Store the HTTP response with the message

~\anaconda3\lib\site-packages\pandasdmx\reader\sdmxml.py in read_message(self, source, dsd)
    256 self._dump()
    257 print(etree.tostring(element, pretty_print=True).decode())
--> 258 raise XMLParseError from exc
    259
    260 # Parsing complete

XMLParseError: AttributeError: 'NoneType' object has no attribute 'groupdict'

dr-leo commented 3 years ago

That's weird. That said, a data source does not necessarily have to support 'codelist'. 'datastructure' should yield clean XML, though. To support BBK, I had to change a couple of things besides the odd stuff you've encountered.

As a first attempt to find a workaround, try a request for dataflow specifying a dataflow ID:

flow_msg = Request('bbk').dataflow()

This should give you the specified dataflow artefact together with its referenced artefacts, such as the datastructure definition, codelists etc., without explicitly querying a specific datastructure.
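The workaround can be sketched roughly as follows. This is a minimal sketch, not a tested recipe for BBK: `fetch_flow_structure`, `summarize`, and the `request_cls` parameter are hypothetical helpers introduced here for illustration, and the dataflow ID is left as a placeholder because the thread does not name one. It assumes the dataflow ID is passed as the argument to `dataflow()` and that the returned `StructureMessage` exposes the `dataflow`, `structure`, and `codelist` collections, as in pandaSDMX 1.x.

```python
def fetch_flow_structure(flow_id, source="BBK", request_cls=None):
    """Query one dataflow and return the resulting StructureMessage.

    flow_id is a placeholder for a real dataflow ID of the source.
    request_cls is a hypothetical hook so the function can be exercised
    offline with a stand-in class; by default pandasdmx.Request is used.
    """
    if request_cls is None:
        # Imported lazily so the sketch loads even without pandaSDMX installed.
        from pandasdmx import Request
        request_cls = Request

    req = request_cls(source)
    # Request offers one method per resource type; the argument is the resource ID.
    return req.dataflow(flow_id)


def summarize(msg):
    """Count the artefacts that came back together with the dataflow."""
    return {
        "dataflows": len(msg.dataflow),
        "structures": len(msg.structure),
        "codelists": len(msg.codelist),
    }


# Example (needs network access; replace the placeholder with a real ID):
# msg = fetch_flow_structure("YOUR_FLOW_ID")
# print(summarize(msg))
```

If the summary reports at least one structure and some codelists, the datastructure definition can be taken from `msg.structure` instead of calling `datastructure()` directly. Whether the source actually returns the referenced artefacts depends on the data provider.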

Please let me know if this works for you.

PS: The 10-line code example in the docs should thus be simplified as well.

dr-leo commented 2 years ago

Fixed in v1.8.0.