CorrelAid / restatis

R API Client for the German Federal Statistical Office Database
https://correlaid.github.io/restatis/

Fetch all variables as 'character' format #26

Closed yannikbuhl closed 3 months ago

yannikbuhl commented 9 months ago

As of now, we let {readr} guess each column's type based on the CSV that we get from the API.

However, it would be better to read all variables as 'character', since the German system of unique identifiers for municipalities, Länder, etc. will in many cases use a leading zero, and that zero is lost as soon as a column is parsed as numeric. Users will have a small amount of extra work converting variables into the format they want, but the result will be less error-prone.
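A minimal sketch of what reading everything as character could look like with {readr}. The CSV text is made up for illustration and is not real API output, the semicolon delimiter is an assumption, and `I()` for literal data assumes readr 2.0 or later:

```r
library(readr)

# Made-up, semicolon-separated sample standing in for an API response.
csv_text <- "ags;name;value\n01001000;Flensburg;10\n11000000;Berlin;20\n"

# Read every column as character so identifiers keep their leading zeros.
df <- read_delim(
  I(csv_text),
  delim = ";",
  col_types = cols(.default = col_character())
)

# The user converts only the columns they know to be numeric.
df$value <- as.numeric(df$value)
```

With this approach the identifier column is never touched by type guessing, and the conversion step is explicit and under the user's control.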

ColdCactus commented 7 months ago

Maybe provide an option that downloads datasets with the AGS value as character, left-padded with leading zeros?
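For illustration, left-padding could be sketched in base R as follows. `pad_code()` is a hypothetical helper, not part of {restatis}, and the default width of 8 assumes the eight-digit municipality form of the AGS (other levels, such as Länder, use shorter codes):

```r
# Hypothetical helper: turn numeric codes back into zero-padded strings.
# format = "d" avoids scientific notation such as "1.1e+07".
pad_code <- function(x, width = 8) {
  formatC(as.numeric(x), width = width, format = "d", flag = "0")
}

pad_code(c(1001000, 11000000))
# "01001000" "11000000"
```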

Or, and this is a bit of a hack, the code could check (some of) the unique values of a numeric variable against the list of valid AGS/GVIS municipality IDs. If >90% of the values are valid AGS/GVIS codes and, conversely, 90% (for country-wide data) of all valid AGS/GVIS codes appear in the numeric variable, it seems plausible that this variable is a municipality code. Then (only) this variable could be kept as character and all other ones converted to numeric.

If performance becomes a problem, all variables with decimal values could be excluded from this check, since no valid AGS/GVIS code contains a comma or decimal point.
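A rough sketch of that check, under stated assumptions: `looks_like_ags()` is a hypothetical helper, `valid_codes` is a character vector of zero-padded codes that the caller has to supply and maintain, and the toy list below is for illustration only, not a real reference list.

```r
# Hypothetical heuristic: does a numeric column look like a municipality code?
looks_like_ags <- function(x, valid_codes, threshold = 0.9, width = 8) {
  if (!is.numeric(x)) return(FALSE)
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0) return(FALSE)
  # Cheap pre-filter: columns with fractional values cannot be IDs.
  if (any(vals != floor(vals))) return(FALSE)
  # Restore the leading zeros before comparing against the reference list.
  candidates <- formatC(vals, width = width, format = "d", flag = "0")
  share_valid <- mean(candidates %in% valid_codes)    # values that are valid codes
  share_covered <- mean(valid_codes %in% candidates)  # valid codes that appear
  share_valid > threshold && share_covered >= threshold
}

# Toy reference list, for illustration only.
valid_codes <- c("01001000", "01002000", "11000000")

looks_like_ags(c(1001000, 1002000, 11000000), valid_codes)  # TRUE
looks_like_ags(c(1.5, 2, 3), valid_codes)                   # FALSE
looks_like_ags(c(10, 20, 30), valid_codes)                  # FALSE
```

The coverage condition only makes sense for country-wide data, as noted above, so a real implementation would probably need to relax it for regional subsets.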

yannikbuhl commented 7 months ago

Thank you very much, @ColdCactus, for your comments. We'll consider this when implementing a solution. Keeping a list of valid AGS/GVIS codes introduces some maintenance burden, since changes occur more or less frequently, although it is probably a more elegant solution.