CorrelAid / datenguide-python

short and long description in get_statistics() not containing correct content #71

Closed awakenting closed 4 years ago

awakenting commented 4 years ago

Description

I wanted to try out the use case about car accidents. When trying to get the relevant statistics, I noticed that the columns short_description and long_description are not filled with actual descriptions but with the code of the corresponding statistic (see below).

What I Did

First I installed datenguidepy with

pip install datenguidepy

Then I tried the following steps in a notebook (here together with outputs):

Note how the column short_description has the same values as the column statistics.

import pandas as pd

from datenguidepy import Query
from datenguidepy.query_helper import get_all_regions, get_statistics
import datenguidepy
datenguidepy.__version__
'0.1.1'
pd.__version__
'0.25.1'
!python --version
Python 3.7.4
region_codes = get_all_regions().query('level == "nuts1"').name
region_codes
id
10                  Saarland
11                    Berlin
12               Brandenburg
13    Mecklenburg-Vorpommern
14                   Sachsen
15            Sachsen-Anhalt
16                 Thüringen
01        Schleswig-Holstein
02                   Hamburg
03             Niedersachsen
04                    Bremen
05       Nordrhein-Westfalen
06                    Hessen
07           Rheinland-Pfalz
08         Baden-Württemberg
09                    Bayern
Name: name, dtype: object
get_statistics().head()
statistics short_description long_description
0 FLC006 FLC006 **FLC006**\n*aus GENESIS-Statistik "Feststellu...
1 GEM001 GEM001 **GEM001**\n*aus GENESIS-Statistik "Feststellu...
2 BEVZ20 BEVZ20 **BEVZ20**\n*aus GENESIS-Statistik "Zensus 201...
3 BEVZ15 BEVZ15 **BEVZ15**\n*aus GENESIS-Statistik "Zensus 201...
4 BEVZ22 BEVZ22 **BEVZ22**\n*aus GENESIS-Statistik "Zensus 201...
from datenguidepy import get_all_regions, get_statistics
get_statistics().head()
statistics short_description long_description
0 FLC006 FLC006 **FLC006**\n*aus GENESIS-Statistik "Feststellu...
1 GEM001 GEM001 **GEM001**\n*aus GENESIS-Statistik "Feststellu...
2 BEVZ20 BEVZ20 **BEVZ20**\n*aus GENESIS-Statistik "Zensus 201...
3 BEVZ15 BEVZ15 **BEVZ15**\n*aus GENESIS-Statistik "Zensus 201...
4 BEVZ22 BEVZ22 **BEVZ22**\n*aus GENESIS-Statistik "Zensus 201...
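The symptom shown above (every short description merely repeating the statistic code) can be checked programmatically. The following is a minimal sketch, not part of datenguidepy: the helper name `descriptions_are_placeholders` is hypothetical, and the column names in the comment are taken from the output above.

```python
def descriptions_are_placeholders(rows):
    """Return True if every short description just repeats the statistic code.

    `rows` is any iterable of (statistic_code, short_description) pairs.
    Hypothetical helper for illustration; not part of datenguidepy.
    """
    rows = list(rows)
    # An empty table is not treated as "broken", so require at least one row.
    return bool(rows) and all(code == short for code, short in rows)


# The first rows of the output above, as (code, short_description) pairs.
sample = [("FLC006", "FLC006"), ("GEM001", "GEM001"), ("BEVZ20", "BEVZ20")]
print(descriptions_are_placeholders(sample))

# With the DataFrame from get_statistics(), assuming the column names shown
# in the output above, the pairs could be built like this:
#   stats = get_statistics()
#   descriptions_are_placeholders(zip(stats["statistics"], stats["short_description"]))
```

If the check returns True for the full table, the installed version is probably affected by the problem described in this issue.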
KonradUdoHannes commented 4 years ago

Thank you for the feedback.

The issue is due to a change in the API structure on the side of the datenguide project. That change was not backward compatible and caused the statistics information to be essentially missing in 0.1.1. We fixed this in version 0.2.0, which so far is only available as source from the GitHub master branch. The plan is to create the corresponding PyPI release within the next week.

KonradUdoHannes commented 4 years ago

We released 0.2.1 on PyPI, which fixes the issue. Updating datenguidepy with

pip install -U datenguidepy

should therefore solve the problem. There is still a separate issue: datenguidepy.__version__ currently does not display the correct version number. Nonetheless, the pip upgrade will install the newest version, which has the statistics description issue fixed.
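Because `datenguidepy.__version__` may be unreliable here, the installed version can also be read from the package metadata. The following is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library); on Python 3.7, `pip show datenguidepy` reports the same information. The helper names `installed_version` and `parse_version` are hypothetical, and the parser assumes plain numeric versions such as `0.2.2`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def parse_version(text):
    """Turn a plain numeric version like '0.2.2' into a comparable tuple.

    Assumption: no suffixes such as 'rc1' or '.dev0'; those would raise ValueError.
    """
    return tuple(int(part) for part in text.split("."))


found = installed_version("datenguidepy")
if found is None:
    print("datenguidepy is not installed")
else:
    # Read from package metadata rather than datenguidepy.__version__.
    print("installed datenguidepy version:", found)
```

The tuple form makes it easy to compare the result against a specific release, for example `parse_version(found) >= (0, 2, 0)`.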

KonradUdoHannes commented 4 years ago

@awakenting Unfortunately, it turned out that a critical bug got past all the quality assurance for 0.2.1, so that build does not actually work. Hence the issue was not really resolved, at least not with a working PyPI release. When we noticed, we released version 0.2.2 on PyPI, which fixes the critical bug and still has functioning statistic descriptions. It also displays the package version number correctly again. Although this release should work, further feedback is, as before, always welcome.

awakenting commented 3 years ago

Hey @KonradUdoHannes, thank you for notifying me about this. I tried it now with version 0.4.0 and it all works so I can play around a bit with the data :)