pmayd closed this 5 months ago
We should support this for all databases, even Genesis, when it is available.
The column used should always be the first variable that contains the AGS.
AGS is the official way to uniquely identify municipalities (Gemeinden) in Germany. It is usually an 8-digit number and can contain leading zeros, so it should be handled using a string datatype.
Prefixes of the 8-digit AGS are commonly used to identify larger administrative regions (e.g. Bundesland = leading 2 digits; Landkreis = leading 5 digits). This is also what GENESIS uses, with the addition of the "DG" code, which refers to "all of Germany". I think it makes sense to always add a column with this info.
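To make the prefix idea concrete, here is a minimal sketch of how the larger regions could be derived from a full AGS. The function name `ags_prefixes` and the dictionary keys are made up for illustration and are not part of any existing package API; only the 2-digit and 5-digit prefixes mentioned above are used.

```python
def ags_prefixes(ags: str) -> dict[str, str]:
    """Split a full 8-digit AGS into the prefixes described above.

    Hypothetical helper for illustration only.
    """
    # The AGS must stay a string: converting "01001000" to int would
    # silently drop the leading zero and shift every prefix.
    if len(ags) != 8 or not (ags.isascii() and ags.isdigit()):
        raise ValueError(f"expected an 8-digit AGS string, got {ags!r}")
    return {
        "bundesland": ags[:2],  # leading 2 digits
        "landkreis": ags[:5],   # leading 5 digits
        "gemeinde": ags,        # full 8-digit key
    }


print(ags_prefixes("01001000"))
```

Keeping the value as `str` end to end is the design choice that matters here; the validation step is just a guard against keys that were already read as numbers upstream.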
Here's some background about the way the AGS is constructed: https://datengui.de/statistik-erklaert/ags
Here is another explanation from the Genesis database: https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/_inhalt.html
I have checked all three databases. Genesis and Regionalstatistik use the same attribute codes, with Regionalstatistik using more of them (Gemeinden, Regierungsbezirk, Kreise): ["DLAND", "REGBEZ", "KREISE", "GEMEIN"]
However, Zensus does not use the AGS but the ARS, which can have up to 12 digits and has its own codes:
"GEOBL1", "GEOBL3", "GEODL1", "GEODL3", "GEOGM1", "GEOGM2", "GEOGM3", "GEOLK1", "GEOLK3", "GEORB1", "GEORB3", "GEOVB1", "GEOVB3"
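The two code lists above could be kept as lookup sets so that an attribute code tells us which kind of regional key to expect. This is only a sketch: the names `AGS_CODES`, `ARS_CODES` and `regional_key_type` are hypothetical, and the sets contain exactly the codes listed in this thread, which may not be exhaustive.

```python
from typing import Optional

# Attribute codes seen in Genesis and Regionalstatistik (AGS-based).
AGS_CODES = {"DLAND", "REGBEZ", "KREISE", "GEMEIN"}

# Attribute codes seen in Zensus (ARS-based).
ARS_CODES = {
    "GEOBL1", "GEOBL3", "GEODL1", "GEODL3", "GEOGM1", "GEOGM2", "GEOGM3",
    "GEOLK1", "GEOLK3", "GEORB1", "GEORB3", "GEOVB1", "GEOVB3",
}


def regional_key_type(attribute_code: str) -> Optional[str]:
    """Return "AGS", "ARS", or None if the code is not a known regional code."""
    if attribute_code in AGS_CODES:
        return "AGS"
    if attribute_code in ARS_CODES:
        return "ARS"
    return None


print(regional_key_type("KREISE"))  # AGS
print(regional_key_type("GEOGM1"))  # ARS
```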
In all databases and all my samples, this regional dimension was always the first attribute after the time attribute, so I implemented the logic with this in mind:
It is not always the first column, so I changed the logic to find the column instead and use it plus the next 2 columns to get the information independent of the position.
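A position-independent lookup along these lines could look roughly like the sketch below. It is not the actual implementation: the function name `regional_columns`, the plain-list row representation, and the sample row are all assumptions made for illustration, and what the two following columns contain depends on the real table layout.

```python
from typing import Optional

# Regional attribute codes from this thread (AGS-based and ARS-based).
REGIONAL_CODES = {
    "DLAND", "REGBEZ", "KREISE", "GEMEIN",
    "GEOBL1", "GEOBL3", "GEODL1", "GEODL3", "GEOGM1", "GEOGM2", "GEOGM3",
    "GEOLK1", "GEOLK3", "GEORB1", "GEORB3", "GEOVB1", "GEOVB3",
}


def regional_columns(row: list[str], width: int = 3) -> Optional[list[str]]:
    """Find the first cell holding a regional code; return it plus the next two cells.

    Hypothetical helper: searches by value instead of assuming a fixed position.
    """
    for i, cell in enumerate(row):
        if cell in REGIONAL_CODES:
            return row[i:i + width]
    return None  # no regional dimension in this row


# Made-up sample row, only to show the lookup; not real database output.
sample = ["JAHR", "2020", "KREISE", "Kreise", "01001", "42"]
print(regional_columns(sample))  # ['KREISE', 'Kreise', '01001']
```

Searching by code rather than by index means the lookup keeps working if another attribute is inserted before the regional one.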
For Regionalstatistik and Zensus (we have to check this!) there is always a unique identifying AGS.
We should make sure that users have access to this information as it is valuable for different use cases using these databases.