bug for retrieving dataset 151_914, probably coming from the way time is handled

Batchounet commented 5 months ago

As I could not retrieve filtered data from dataset 151_914 (unemployment rate), I tried retrieval.get_data('151_914') and got an error:

Traceback (most recent call last):

Cell In[64], line 1 unemployment_df_new = retrieval.get_data(ds_new)

File ~\anaconda3\Lib\site-packages\istatapi\retrieval.py:29 in get_data df["TIME_PERIOD"] = pd.to_datetime(

File ~\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py:1108 in to_datetime cache_array = _maybe_cache(arg, format, cache, convert_listlike)

File ~\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py:254 in _maybe_cache cache_dates = convert_listlike(unique_dates, format)

File ~\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py:488 in _convert_listlike_datetimes return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)

File ~\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py:519 in _array_strptime_with_fallback result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)

File strptime.pyx:534 in pandas._libs.tslibs.strptime.array_strptime

File strptime.pyx:359 in pandas._libs.tslibs.strptime.array_strptime

ValueError: unconverted data remains when parsing with format "%Y": "-Q1", at position 20. You might want to try:

passing format if your strings have a consistent format;
passing format='ISO8601' if your strings are all ISO8601 but not necessarily in exactly the same format;
passing format='mixed', and the format will be inferred for each element individually. You might want to use dayfirst alongside this.

It seems that the problem comes from the way time is handled. It worked fine for dataset 151_1193, so I guess it fails when Istat uses an unusual time format.

[edit]: I tried removing

```
if "TIME_PERIOD" in df.columns:
    df["TIME_PERIOD"] = pd.to_datetime(
        df["TIME_PERIOD"].astype(str)
    )
    df = df.sort_values(by=["TIME_PERIOD"])
```

from the retrieval function, but it did not solve the issue.
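
For reference, the traceback shows `pd.to_datetime` working with a `"%Y"` format and then meeting a quarterly label (`"-Q1"`). Below is a minimal sketch of a parser that tolerates yearly, quarterly, and monthly labels in the same column. It uses only the standard library. `parse_time_period` is a hypothetical helper, not part of istatapi, and mapping each period to its first day is an assumption made for illustration.

```python
import re
from datetime import date

# Hypothetical helper (not part of istatapi): maps a period label to the
# first day of that period. Only the label shapes below are handled.
_YEAR = re.compile(r"^(\d{4})$")              # e.g. "2020"
_QUARTER = re.compile(r"^(\d{4})-Q([1-4])$")  # e.g. "2020-Q1"
_MONTH = re.compile(r"^(\d{4})-(\d{2})$")     # e.g. "2020-03"


def parse_time_period(value: str) -> date:
    """Return the first day of the period described by `value`."""
    value = value.strip()
    if m := _YEAR.match(value):
        return date(int(m.group(1)), 1, 1)
    if m := _QUARTER.match(value):
        year, quarter = int(m.group(1)), int(m.group(2))
        # Q1 -> January, Q2 -> April, Q3 -> July, Q4 -> October
        return date(year, 3 * (quarter - 1) + 1, 1)
    if m := _MONTH.match(value):
        return date(int(m.group(1)), int(m.group(2)), 1)
    # Fall back to a full ISO date such as "2020-03-31";
    # anything else raises ValueError.
    return date.fromisoformat(value)
```

With pandas, something like `df["TIME_PERIOD"].astype(str).map(parse_time_period)` could then replace the direct `pd.to_datetime` call, though whether that suits every Istat dataset is untested here.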

Batchounet commented 5 months ago

This also does not work, and I have no idea why:

    freq = 'A'
    branca_attiv_rev2 = '_T'
    adjustment = 'N'
    val = 'V'
    dccn_coicop_cofog = 'Z'
    edi = '2020M5'
    prodotti1 = 'Z'
    tipprez = 'B'
    tipo_aggr = 'B1GQ_B_W2_S1_R_POP'
    # Load the dataset with national counts 
    ds = load_ds('93_500')

    ds.set_filters(
        freq=freq,
        branca_attiv_rev2=branca_attiv_rev2,
        adjustment=adjustment,
        val=val,
        dccn_coicop_cofog=dccn_coicop_cofog,
        edi=edi,
        prodotti1=prodotti1,
        tipprez=tipprez,
        tipo_aggr=tipo_aggr
    )
    # Set the filters using the filter arguments
    ds.set_filters()
    gdp_per_capita_df = retrieval.get_data(ds)

Batchounet commented 5 months ago

I am adding another bug:

ds = discovery.DataSet(dataflow_identifier='729_1050')

Traceback (most recent call last):

File ~\anaconda3\Lib\site-packages\IPython\core\interactiveshell.py:3505 in run_code exec(code_obj, self.user_global_ns, self.user_ns)

Cell In[114], line 1 ds = discovery.DataSet(dataflow_identifier='729_1050')

File <string>:5 in __init__

File ~\anaconda3\Lib\site-packages\istatapi\discovery.py:81 in __post_init__ self.available_values = self.get_available_values()

File ~\anaconda3\Lib\site-packages\istatapi\discovery.py:191 in get_available_values strip_ns(tree)

File ~\anaconda3\Lib\site-packages\istatapi\utils.py:18 in strip_ns for _, el in tree:

File ~\anaconda3\Lib\xml\etree\ElementTree.py:1249 in iterator yield from pullparser.read_events()

File ~\anaconda3\Lib\xml\etree\ElementTree.py:1320 in read_events raise event

File ~\anaconda3\Lib\xml\etree\ElementTree.py:1292 in feed self._parser.feed(data)

File <string>

ParseError: syntax error: line 1, column 0

Attol8 commented 4 months ago

Hi @Batchounet, thanks a lot for raising these issues, and sorry for the late reply. I have made changes in #26 to address them.

Dataset 151_914 -> it now loads fine.
Dataset 729_1050 -> the dataset does not exist, and the API returns a "No available data found for the requested query" message, which istatapi now displays to users.
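
For readers hitting the same `ParseError: syntax error: line 1, column 0`: that is what `xml.etree.ElementTree` raises when the body it is given is plain text rather than XML. Below is a minimal sketch of how such a reply could be turned into a readable error. This is not the code from #26. `NoDataError` and `parse_response` are hypothetical names, and the assumption is that the response body is already available as a string.

```python
import xml.etree.ElementTree as ET


class NoDataError(Exception):
    """Hypothetical exception for replies that are not valid XML."""


def parse_response(body: str) -> ET.Element:
    """Parse an XML body, or raise NoDataError carrying the raw message."""
    try:
        return ET.fromstring(body)
    except ET.ParseError as exc:
        # The endpoint may answer with a plain-text message instead of XML;
        # surface a truncated copy of it rather than the parser's error.
        message = body.strip()[:200]
        raise NoDataError(f"API did not return XML: {message!r}") from exc
```

Calling `parse_response("No available data found for the requested query")` raises `NoDataError` with the API's message embedded, while a well-formed XML body is returned as an `Element`.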

Attol8 / istatapi

bug for retrieving dataset 151_914, probably coming from the way time is handled #24