Attol8 / istatapi

Python API for ISTAT (The Italian National Institute of Statistics)
https://attol8.github.io/istatapi/
Apache License 2.0
27 stars, 8 forks

Cannot download 22_315 'DCIS_POPORESBIL1' neither with pandasdmx #27

Open ppfranco opened 3 months ago

ppfranco commented 3 months ago

Hello, I'm writing in the hope that you can help me understand whether this is a problem with the dataset or with the server configuration.

I cannot download dataflow 22_315 with pandasdmx, which fails with a 404 error, nor with istatapi, where the failure is even stranger:

>>> pop = istatapi.discovery.DataSet(dataflow_identifier='22_315')

Traceback (most recent call last):

  File ~/Devnos/futuretense/www/env/lib/python3.11/site-packages/IPython/core/interactiveshell.py:3577 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[70], line 1
    pop = istatapi.discovery.DataSet(dataflow_identifier='22_315')

  File <string>:5 in __init__

  File ~/Devnos/futuretense/www/env/lib/python3.11/site-packages/istatapi/discovery.py:81 in __post_init__
    self.available_values = self.get_available_values()

  File ~/Devnos/futuretense/www/env/lib/python3.11/site-packages/istatapi/discovery.py:191 in get_available_values
    strip_ns(tree)

  File ~/Devnos/futuretense/www/env/lib/python3.11/site-packages/istatapi/utils.py:18 in strip_ns
    for _, el in tree:

  File /usr/lib/python3.11/xml/etree/ElementTree.py:1251 in iterator
    yield from pullparser.read_events()

  File /usr/lib/python3.11/xml/etree/ElementTree.py:1327 in read_events
    raise event

  File /usr/lib/python3.11/xml/etree/ElementTree.py:1299 in feed
    self._parser.feed(data)

  File <string>
ParseError: syntax error: line 1, column 0
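One possible reading of this ParseError, not confirmed anywhere in this thread, is that the server answered with a body that is not XML (for example a plain-text or HTML error page), and the parser failed on its very first character. The sketch below is only an illustration of guarding the parse step; `looks_like_xml` and `iter_elements` are hypothetical helpers and are not part of istatapi.

```python
import io
import xml.etree.ElementTree as ET


def looks_like_xml(body: bytes) -> bool:
    """Cheap heuristic: after an optional BOM and whitespace, XML starts with '<'."""
    return body.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<")


def iter_elements(body: bytes):
    """Yield parsed elements, failing early with a readable message.

    Hypothetical helper: it only illustrates checking the body before
    handing it to ElementTree.iterparse.
    """
    if not looks_like_xml(body):
        # Show the start of the body so the user can see what came back.
        preview = body[:80].decode("utf-8", errors="replace")
        raise ValueError(f"Response body is not XML, it starts with: {preview!r}")
    # Default iterparse events are "end" events, so children come first.
    for _, el in ET.iterparse(io.BytesIO(body)):
        yield el
```

With a guard like this, a non-XML response would surface as a message showing the start of the body instead of a bare syntax error at line 1, column 0.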

With pandasdmx, exploring the dataflows and the DSD works fine, but downloading the dataset returns a 404:

>>> pop_data = istat.data('22_315', key={ 'FREQ': 'M' }, params={ 'startPeriod': '2024-01-01'}, dsd=dsd)

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[39], line 1
----> 1 pop_data = istat.data('22_315', key={ 'FREQ': 'M' }, params={ 'startPeriod': '2024-01-01'}, dsd=dsd)

File ~/Devnos/futuretense/www/env/lib/python3.11/site-packages/pandasdmx/api.py:478, in Request.get(self, resource_type, resource_id, tofile, use_cache, dry_run, **kwargs)
    476 try:
    477     response = self.session.send(req, timeout=self.timeout)
--> 478     response.raise_for_status()
    479 except requests.exceptions.ConnectionError as e:
    480     raise e from None

File ~/Devnos/futuretense/www/env/lib/python3.11/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
   1019     http_error_msg = (
   1020         f"{self.status_code} Server Error: {reason} for url: {self.url}"
   1021     )
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: Not Found for url: https://sdmx.istat.it/SDMXWS/rest/data/22_315/M...?startPeriod=2024-01-01

danieleongari commented 2 months ago

Unfortunately, they moved this database behind a paywall: http://dati.istat.it/Index.aspx?DataSetCode=DCIS_POPORESBIL1

However, I'm a bit confused, because I was able to download the database on June 11. Since this issue was opened before that date and the dataset is no longer available, I wonder whether I was just lucky and they made it available on that date by mistake.

A subscription to OECD iLibrary is pretty expensive: https://issuu.com/oecd.publishing/docs/pricing_list2024?fr=xKAE9_zU1NQ

Another dataset that falls into this category is DCIS_RICSTAT.

@Attol8, I wonder if there is a way to warn the user about this more clearly than

ValueError: No available data found for the requested query (dataset 283_138

provided that this availability information can be obtained somewhere from the requests.
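As a rough sketch of what such a warning could look like, assuming the only signals available are the HTTP status code and the response body (the 404 in the pandasdmx trace above is the only one visible in this thread): `DataflowUnavailableError` and `check_sdmx_response` are hypothetical names, not part of istatapi, and a 404 on its own cannot distinguish a restricted dataset from a query that simply matched no data.

```python
class DataflowUnavailableError(RuntimeError):
    """Hypothetical exception: the endpoint returned no usable data for a dataflow."""


def check_sdmx_response(status_code: int, body: bytes, dataflow_id: str) -> None:
    """Raise a descriptive error if the response cannot contain SDMX data.

    Assumption: the caller already has the status code and raw body,
    for example from a requests.Response (resp.status_code, resp.content).
    """
    if status_code == 404:
        # A 404 is ambiguous, so the message lists the plausible causes.
        raise DataflowUnavailableError(
            f"Dataflow {dataflow_id!r} returned HTTP 404: it may have been "
            "removed, renamed, or restricted, or the query matched no data."
        )
    if status_code >= 400:
        raise DataflowUnavailableError(
            f"Request for dataflow {dataflow_id!r} failed with HTTP {status_code}."
        )
    if not body.lstrip().startswith(b"<"):
        # Catches non-XML bodies before they reach the XML parser.
        raise DataflowUnavailableError(
            f"Dataflow {dataflow_id!r} returned a body that is not XML."
        )
```

Called right after the HTTP request, e.g. `check_sdmx_response(resp.status_code, resp.content, '22_315')`, this would at least name the dataflow and the status code in the error instead of failing later inside the parser.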