CorrelAid / pystatis

MIT License

Add "AGS" (Amtlicher Gemeindeschlüssel) to Regionaldatenbank and Zensus Tables #108

Closed pmayd closed 5 months ago

pmayd commented 6 months ago

For Regionalstatistik and Zensus (we have to check this!) there is always a unique identifying AGS.

We should make sure that users have access to this information as it is valuable for different use cases using these databases.

pmayd commented 6 months ago

We should support this for all databases, even Genesis, whenever it is available.

pmayd commented 6 months ago

The AGS should always be contained in the first variable.

sjockers commented 6 months ago

AGS is the official way to uniquely identify municipalities (Gemeinden) in Germany. It is usually an 8-digit number and can contain leading zeros, so it should be handled using a string datatype.

Prefixes of the 8-digit AGS are commonly used to identify larger administrative regions (e.g. Bundesland = leading 2 digits; Landkreis = leading 5 digits). This is also what is used in GENESIS, with the addition of the "DG" code, which refers to "all of Germany". I think it makes sense to always add a column with this info.
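
To illustrate why the string datatype matters and how the prefixes work, here is a minimal sketch. The helper name `ags_prefixes` is hypothetical and not part of pystatis; it only handles full 8-digit codes and does not cover the "DG" code.

```python
def ags_prefixes(ags: str) -> dict:
    """Split a full 8-digit AGS into the prefixes described above.

    Hypothetical helper for illustration only, not pystatis API.
    """
    if not (isinstance(ags, str) and len(ags) == 8 and ags.isdigit()):
        raise ValueError(f"expected an 8-digit AGS string, got {ags!r}")
    return {
        "bundesland": ags[:2],  # leading 2 digits
        "kreis": ags[:5],       # leading 5 digits
        "gemeinde": ags,        # full 8 digits
    }


# Converting to int silently drops the leading zero, so keep the AGS as a string.
assert str(int("01001000")) == "1001000"

print(ags_prefixes("01001000"))
# {'bundesland': '01', 'kreis': '01001', 'gemeinde': '01001000'}
```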

Here's some background about the way the AGS is constructed: https://datengui.de/statistik-erklaert/ags

pmayd commented 5 months ago

Here is another explanation from the Genesis database: https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/_inhalt.html

I have checked all three databases. Genesis and Regionalstatistik use the same attribute codes, with Regionalstatistik using more of them (Gemeinden, Regierungsbezirk, Kreise): ["DLAND", "REGBEZ", "KREISE", "GEMEIN"]

However, Zensus does not use the AGS but the ARS, which can have up to 12 digits and has its own codes: "GEOBL1", "GEOBL3", "GEODL1", "GEODL3", "GEOGM1", "GEOGM2", "GEOGM3", "GEOLK1", "GEOLK3", "GEORB1", "GEORB3", "GEOVB1", "GEOVB3"

In all databases and all my samples, this regional dimension was always the first attribute after the time attribute, so I implemented the logic with this in mind:

  1. Check if the first attribute code column contains one of these codes
  2. If so, extract this column and add it to the final table
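
The two steps above could be sketched roughly as follows. This is only an illustration under assumptions: the column names `1_variable_code` and `1_variable_attribute_code` and the function name are hypothetical and may not match the actual layout of the downloaded tables or the pystatis implementation. Only the code lists are taken from this thread.

```python
from typing import Optional

import pandas as pd

# Codes listed in this thread.
AGS_CODES = {"DLAND", "REGBEZ", "KREISE", "GEMEIN"}
ARS_CODES = {
    "GEOBL1", "GEOBL3", "GEODL1", "GEODL3", "GEOGM1", "GEOGM2", "GEOGM3",
    "GEOLK1", "GEOLK3", "GEORB1", "GEORB3", "GEOVB1", "GEOVB3",
}
REGIONAL_CODES = AGS_CODES | ARS_CODES


def region_from_first_variable(raw: pd.DataFrame) -> Optional[pd.Series]:
    """Return the region codes if the first variable is a regional one.

    Assumes hypothetical columns "1_variable_code" (e.g. "KREISE") and
    "1_variable_attribute_code" (the AGS/ARS value itself).
    """
    # Step 1: does the first attribute code column contain a regional code?
    if not raw["1_variable_code"].isin(REGIONAL_CODES).any():
        return None
    # Step 2: extract the column, kept as strings to preserve leading zeros.
    return raw["1_variable_attribute_code"].astype("string")


raw = pd.DataFrame(
    {
        "time": ["2022", "2022"],
        "1_variable_code": ["KREISE", "KREISE"],
        "1_variable_attribute_code": ["01001", "09162"],
        "value": [1, 2],
    }
)
print(region_from_first_variable(raw).tolist())  # ['01001', '09162']
```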

pmayd commented 5 months ago

It is not always the first column, so I changed the logic to find the column instead and use it plus the next two columns to get the information independent of its position.
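
A position-independent lookup of this kind might look like the sketch below. It is an assumption-laden illustration, not the actual pystatis code: the function name, the example column names, and the interpretation of "plus the next two columns" as a slice of three adjacent columns are all hypothetical.

```python
from typing import Optional

import pandas as pd

# AGS and ARS variable codes listed earlier in this thread.
REGIONAL_CODES = {
    "DLAND", "REGBEZ", "KREISE", "GEMEIN",
    "GEOBL1", "GEOBL3", "GEODL1", "GEODL3", "GEOGM1", "GEOGM2", "GEOGM3",
    "GEOLK1", "GEOLK3", "GEORB1", "GEORB3", "GEOVB1", "GEOVB3",
}


def find_region_columns(raw: pd.DataFrame, width: int = 3) -> Optional[pd.DataFrame]:
    """Find the first column holding a regional code, wherever it is.

    Returns that column plus the following `width - 1` columns,
    or None if no column contains a regional code.
    """
    for pos in range(raw.shape[1]):
        # Positional access avoids problems with duplicate column names.
        if raw.iloc[:, pos].isin(REGIONAL_CODES).any():
            return raw.iloc[:, pos:pos + width]
    return None


# Hypothetical layout in which the regional variable is not the first one.
raw = pd.DataFrame(
    {
        "time": ["2022", "2022"],
        "other_code": ["GES", "GES"],
        "region_variable_code": ["GEMEIN", "GEMEIN"],
        "region_variable_label": ["Gemeinden", "Gemeinden"],
        "region_code": ["01001000", "09162000"],
        "value": [1, 2],
    }
)
print(list(find_region_columns(raw).columns))
# ['region_variable_code', 'region_variable_label', 'region_code']
```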