dr-leo / pandaSDMX

Python interface to SDMX
Apache License 2.0

AttributeError while parsing StructureSpecificData #197

Closed wolkiewiczk closed 2 years ago

wolkiewiczk commented 3 years ago

Hi, I downloaded an XML data file from the Fusion Registry instance that we use in our company for development purposes. I tried to open it with the read_sdmx function as follows:

from pandasdmx import read_sdmx
sdmx = read_sdmx('/home/wolkiewicz/fruit.xml')

What I got was an AttributeError. Here is the traceback, along with the content that was printed:

2020-12-16 15:14:45,834 pandasdmx.reader.sdmxml - WARNING: sdmxml.Reader got no dsd=… argument for StructureSpecificData

--- SS without DSD ---
[True]

--- DataSetClass ---
[<class 'pandasdmx.model.StructureSpecificDataSet'>]

--- <class 'pandasdmx.message.DataMessage'> ---
[<pandasdmx.DataMessage>
  <Header>
    extracted: '2020-12-16T14:14:09'
    id: 'IREF280407'
    prepared: '2020-12-16T14:14:09+00:00'
    reporting_begin: '2017-01-01T00:00:00'
    reporting_end: '2019-01-01T00:00:00'
    receiver: <Agency ANONYMOUS>
    sender: <Agency GCC_STAT>
    source: 
    test: False
  dataflow: <DataflowDefinition (missing id)>
  observation_dimension: <Dimension TIME_PERIOD>]

--- <class 'pandasdmx.model.DataStructureDefinition'> ---
[<DataStructureDefinition SDMX:FRUIT_RAW(1.0)>, <DataStructureDefinition SDMX:FRUIT_RAW(1.0)>]

--- SDMX_FRUIT_RAW_1_0 ---
[<DataStructureDefinition SDMX:FRUIT_RAW(1.0)>]

--- DataSet ---
[StructureSpecificDataSet(annotations=[], action=None, attrib=DictLike(), valid_from=None, structured_by=<DataStructureDefinition SDMX:FRUIT_RAW(1.0)>, obs=[], series=DictLike(), group=DictLike())]

<Obs xmlns:ss="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/structurespecific" xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer" xmlns:ns1="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=SDMX:FRUIT_RAW(1.0):ObsLevelDim:TIME_PERIOD" xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" TIME_PERIOD="2017" OBS_VALUE="412"/>

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/.virtualenvs/etl/lib/python3.8/site-packages/pandasdmx/reader/sdmxml.py in read_message(self, source, dsd)
    243                     # Parse the element
--> 244                     result =    func(self, element)
    245                     self.push(result)

~/.virtualenvs/etl/lib/python3.8/site-packages/pandasdmx/reader/sdmxml.py in _obs_ss(reader, elem)
   1257     # Extend the DSD if the user failed to provide it
-> 1258     key = dsd.make_key(model.Key, attrib, extend=reader.peek("SS without DSD"))
   1259 

AttributeError: 'NoneType' object has no attribute 'make_key'

The above exception was the direct cause of the following exception:

XMLParseError                             Traceback (most recent call last)
<ipython-input-6-e7dfa2011745> in <module>
----> 1 sdmx = read_sdmx('/home/wolkiewicz/fruit.xml')

~/.virtualenvs/etl/lib/python3.8/site-packages/pandasdmx/reader/__init__.py in read_sdmx(filename_or_obj, format, **kwargs)
    162         return reader().read_message(obj, dsd=dsd)
    163     else:
--> 164         return reader().read_message(obj)
    165 

~/.virtualenvs/etl/lib/python3.8/site-packages/pandasdmx/reader/sdmxml.py in read_message(self, source, dsd)
    252             self._dump()
    253             print(etree.tostring(element, pretty_print=True).decode())
--> 254             raise XMLParseError from exc
    255 
    256         # Parsing complete

XMLParseError: AttributeError: 'NoneType' object has no attribute 'make_key'

I attach the XML file. What I discovered is that when I change the common:StructureUsage tag into common:Structure, it works perfectly. So I am not sure whether the problem is with the file or with your library, as I am not that familiar with the SDMX format. The error message could definitely be clearer, though.

Thank you for help in advance.

fruit.zip
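
For readers who want to reproduce the manual edit described above, here is a minimal sketch that applies it at the text level. It assumes the file uses the `common` prefix for the SDMX common namespace, as the tag name quoted above suggests; the function name `rename_structure_usage` is made up for illustration. It only mimics the workaround and says nothing about whether the original file is valid.

```python
import re


def rename_structure_usage(xml_text: str) -> str:
    """Rename common:StructureUsage tags to common:Structure.

    Works on the raw text so that namespace prefixes and attribute
    values elsewhere in the message are left untouched.
    Assumption: the common namespace is bound to the prefix "common".
    """
    # Matches both the opening (<common:...) and closing (</common:...) tag.
    return re.sub(r"(</?common:)StructureUsage\b", r"\1Structure", xml_text)
```

The patched text could then be written to a new file and passed to read_sdmx in place of the original.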

dr-leo commented 3 years ago

I consider this a bug on the data source's side. The header of a DataMessage cannot contain a StructureUsage artifact. Your intuition to replace it with Structure is correct: it is a reference to the DSD structuring the dataset.

Please re-open this as needed.

wolkiewiczk commented 3 years ago

Thank you for the response. I contacted the developers of the data source and they disagreed with your point of view. They claim that the Structure tag can contain three types of tags, depending on the reference type:

  1. Structure tag for DataStructureReferenceType,
  2. StructureUsage tag for DataflowReferenceType,
  3. ProvisionAgreement tag for ProvisionAgreementReferenceType.

So I think that it is a bug on the library side, as the DataflowReferenceType is not parsed correctly. I am not sure how the ProvisionAgreementReferenceType behaves, as I did not test it.

Please re-open the issue as I cannot do it myself.
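
To check which of the three variants listed above a given file actually uses, something like the following sketch may help. It relies only on the standard library. The two namespace URIs are taken from the Obs element printed in the traceback; the assumption that the header's Structure element lives in the message namespace, and the helper name `header_reference_types`, are mine.

```python
import xml.etree.ElementTree as ET

COMMON_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
MESSAGE_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"

# Child tag of the header's Structure element -> reference type,
# as listed by the data source's developers.
REFERENCE_TYPES = {
    "Structure": "DataStructureReferenceType",
    "StructureUsage": "DataflowReferenceType",
    "ProvisionAgreement": "ProvisionAgreementReferenceType",
}


def header_reference_types(xml_text):
    """Return (child tag, reference type) pairs found under message:Structure."""
    root = ET.fromstring(xml_text)
    found = []
    for structure in root.iter(f"{{{MESSAGE_NS}}}Structure"):
        for child in structure:
            # ElementTree tags look like "{namespace-uri}LocalName".
            namespace, _, local = child.tag.rpartition("}")
            if namespace == "{" + COMMON_NS and local in REFERENCE_TYPES:
                found.append((local, REFERENCE_TYPES[local]))
    return found
```

For a file like the attached one, the expectation would be a single StructureUsage entry.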

dr-leo commented 3 years ago

Thanks. Your perseverance is much appreciated. I will look into this asap. This needs to be fixed. My resources are very limited at the moment though.

Needless to say, a PR would be welcome.

Which data source did you contact? A link would be helpful.

wolkiewiczk commented 3 years ago

We are using Fusion Registry version 10.5.10. You can find out more about it here: https://metadatatechnology.com/software/FR10.php

To generate an SDMX data file in Fusion Registry, you need to click "Web service" > "Data", and a form with filters will show up. Unfortunately I cannot give you access to our data store now, but I hope the file that I provided will be enough. If you really need access to the data store, please contact me by private message.

I would appreciate it if you reopened the issue so that it does not get lost among the closed ones. GitHub does not allow me to do it myself. Thanks for all your help, and I hope the issue will be resolved soon.

dr-leo commented 3 years ago

Thanks. Access to further data won't be necessary. I'll see what I can do.

dr-leo commented 3 years ago

The file does not validate against sdmxmlmessage.xsd. The error_log shows an error in the Structure element of the header. This error does not seem to be related to the AttributeError. Still, I recommend ensuring that the header complies with the XSD.

On a separate note, it is impossible to parse a structure-specific data message without providing a CSD. If you request the data from an agency, pandasdmx attempts to download the CSD on the fly, if not provided by the user. But read_sdmx cannot do this as it does not know which agency to connect to.
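
A rough sketch of supplying the structure by hand when reading from a local file follows. The `dsd=` keyword is the one named in the warning and the read_sdmx traceback above. Looking the definition up via `structure_msg.structure[...]` is my assumption about the structure message API, and the file names in the usage note are hypothetical.

```python
def read_with_dsd(data_path, structure_path, dsd_id):
    """Read a structure-specific data file using a locally stored DSD.

    Assumptions: structure_path points to an SDMX-ML structure message
    that contains the data structure definition with id dsd_id.
    """
    # Imported lazily so this sketch can be loaded without pandasdmx installed.
    from pandasdmx import read_sdmx

    structure_msg = read_sdmx(structure_path)
    # Assumed API: the structure message exposes its DSDs keyed by id.
    dsd = structure_msg.structure[dsd_id]
    return read_sdmx(data_path, dsd=dsd)
```

A call might look like `read_with_dsd("fruit.xml", "fruit_structure.xml", "FRUIT_RAW")`, with the structure file exported separately from the registry.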

That said, pandasdmx should raise a more meaningful exception in such cases. And it should properly parse a correct header, including all the structural metadata provided or referenced in the header.
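
Purely as an illustration of what a more meaningful exception could look like, here is a hypothetical guard. Neither `MissingStructureError` nor `require_dsd` exists in pandasdmx; the sketch only shows the idea of failing early with an actionable message instead of an AttributeError on None.

```python
class MissingStructureError(ValueError):
    """Hypothetical error: no data structure definition is available."""


def require_dsd(dsd, element_name="Obs"):
    """Return dsd unchanged, or fail with an explanatory message if it is None."""
    if dsd is None:
        raise MissingStructureError(
            f"Cannot parse <{element_name}>: no data structure definition "
            "could be resolved from the message header. "
            "Pass one explicitly, e.g. read_sdmx(path, dsd=...)."
        )
    return dsd
```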

Please let me know if this helps.

wolkiewiczk commented 3 years ago

I am afraid that I don't understand what you mean. What is sdmxmlmessage.xsd? Is it an official XSD? Where can I get it from? I also don't know what a CSD is. I would appreciate an explanation. Do I understand correctly that you suspect the data to be wrong? I am not sure how I should interpret your response.

dr-leo commented 2 years ago

Please report if the error persists in pandasdmx v1.8.0.