CorrelAid / restatis

R API Client for the German Federal Statistical Office Database
https://correlaid.github.io/restatis/

gen_table() fails to parse results in some cases for language = "en" (Zensus 2022 data base) #33

Status: Open (opened by yannikbuhl 3 months ago)

yannikbuhl commented 3 months ago

For the Zensus 2022 database, parsing the CSV files that are delivered inside a ZIP archive fails in some cases: values end up in the wrong columns throughout the returned data frame. The exact cause is not yet clear. The error occurs only with language = "en" (the restatis default) and can be avoided by setting language = "de". Example:

gen_table("1000A-0000", database = "zensus", language = "en")

Warning:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)

yannikbuhl commented 3 months ago

Update: The error originates on the side of the Zensus 2022 database API. In some cases, the municipality names are simply not delivered: the cell is left out, so the rows of the CSV have unequal numbers of fields and the values are shifted into the wrong columns.
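To check whether a particular file is affected before parsing it, one can count the fields on each line. The following is a minimal base-R sketch, not part of restatis: `find_ragged_rows()` is a hypothetical helper, it assumes a semicolon-separated file, and the toy lines only stand in for real Zensus output, imitating a row whose municipality name was not delivered.

```r
# Hypothetical helper (not restatis API): return the line numbers whose
# field count differs from the header line. Assumes ';' as separator.
find_ragged_rows <- function(lines, sep = ";") {
  con <- textConnection(lines)
  on.exit(close(con))
  # count.fields() reports the number of fields on each line
  n_fields <- count.fields(con, sep = sep, quote = "\"", comment.char = "")
  expected <- n_fields[1]  # the header line defines the expected width
  which(n_fields != expected)
}

# Toy input: the third line is missing its name field
csv_lines <- c(
  "id;name;value",
  "1;Alpha;10",
  "2;20"
)

find_ragged_rows(csv_lines)
# returns 3: the third line has only 2 fields
```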

For now, we recommend using language = "de". We will contact the API maintainers to get a fix.

We might also work around the problem by inserting NAs whenever a row's length does not match the number of columns, but a well-formed CSV should never contain rows of varying length in the first place.
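The NA-padding workaround mentioned above could be sketched as follows. This is base R only and not restatis code; `pad_missing_field()` is a hypothetical helper that assumes exactly one known column (here the name column) can be omitted, so a row that is one field short gets NA re-inserted at that column's position rather than at the end of the row. It does not handle quoted fields or trailing empty fields (which `strsplit()` drops).

```r
# Hypothetical helper (not restatis API): split ';'-separated lines and,
# for rows exactly one field short, re-insert NA at the position of the
# column assumed to be the only one that can go missing.
pad_missing_field <- function(lines, missing_col, sep = ";") {
  fields <- strsplit(lines, sep, fixed = TRUE)
  header <- fields[[1]]
  n_cols <- length(header)
  col_pos <- match(missing_col, header)
  if (is.na(col_pos)) stop("column not found in header: ", missing_col)

  rows <- lapply(fields[-1], function(row) {
    if (length(row) == n_cols - 1) {
      # Put the missing field back as NA where it belongs
      row <- append(row, NA_character_, after = col_pos - 1)
    } else if (length(row) != n_cols) {
      stop("unexpected number of fields in row: ", paste(row, collapse = sep))
    }
    row
  })

  # Keep every column as character; type conversion is left to the caller
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- header
  out
}

csv_lines <- c(
  "id;name;value",
  "1;Alpha;10",
  "2;20"  # name field missing
)

df <- pad_missing_field(csv_lines, missing_col = "name")
df
```

All columns stay character so that codes with leading zeros are not silently turned into numbers. Padding at the end of the row would be simpler, but when the missing field sits in the middle it would leave the remaining values shifted, which is exactly the symptom reported here.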

yannikbuhl commented 3 months ago

Note: This also happens with table 3000G-1008.