edwindj / cbsodataR

Statistics Netherlands (CBS) OpenData API Client for R
https://edwindj.github.io/cbsodataR

Column labels #3

Closed J535D165 closed 7 years ago

J535D165 commented 7 years ago

Why is there a number behind the colnames? Is this how you receive the data from CBS?

library(cbsodataR)

# Get the data (doodsoorzaken = causes of death)
doodsoorzaken <- get_data('81452NED')
# Inspect the column names as delivered
colnames(doodsoorzaken)
 [1] "ID"                                     "Geslacht"                              
 [3] "Leeftijd"                               "Perioden"                              
 [5] "TotaalDodelijkeOngevallen_1"            "TotaalDodelijkeVervoersongevallen_2"   
 [7] "Voetganger_3"                           "Fiets_4"                               
 [9] "BromEnSnorfietsEnBrommobiel_5"          "GemotInvalidenvoertuigScootmobiel_6"   
[11] "Motorfiets_7"                           "Personenauto_8"                        
[13] "BestelautoVrachtauto_9"                 "OverigOnbekend_10"                     
[15] "AccidenteleVal_11"                      "AccidenteleVerdrinking_12"             
[17] "TotaalAccidenteleVergiftiging_13"       "Medicijnen_14"                         
[19] "Drugs_15"                               "Alcohol_16"                            
[21] "OverigOnbekend_17"                      "TotaalOverigeDodelijkeOngevallen_18"   
[23] "MechanischEffect_19"                    "RookVuurEnVlammen_20"                  
[25] "Verstikking_21"                         "OverigInclLaatGevolg_22"               
[27] "TotaalDodelijkeOngevallen_23"           "DodelijkeVervoersongevallen_24"        
[29] "AccidenteleVal_25"                      "AccidenteleVerdrinking_26"             
[31] "AccidenteleVergiftigingInclOpzetOnb_27" "OverigeDodelijkeOngevallen_28"  

I am not sure what the recode argument does, but it doesn't seem to change anything.

edwindj commented 7 years ago

You are correct: these are the column names I receive from the CBS. The human-readable names of the columns are in meta$DataProperties.

The recode argument currently recodes category codes into category names. It is on my TODO list that recode also returns "human-readable" column names. Note, however, that these labels can be very long.
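A minimal sketch of how the keys could be swapped for their titles by hand, added here for illustration. `label_columns` is a hypothetical helper, not part of cbsodataR, and the `Key` and `Title` column names of meta$DataProperties are an assumption. The toy data frames and their titles are made up so the example runs offline.

```r
# Hypothetical helper: replace column keys by their titles, given a data
# frame shaped like meta$DataProperties (assumed columns: Key, Title).
label_columns <- function(data, data_properties) {
  idx <- match(colnames(data), data_properties$Key)
  titles <- data_properties$Title[idx]
  # Keep the original key where no title is found
  colnames(data) <- ifelse(is.na(titles), colnames(data), titles)
  data
}

# Toy stand-ins; the titles are invented for this example
props <- data.frame(
  Key   = c("Voetganger_3", "Fiets_4"),
  Title = c("Voetganger", "Fiets"),
  stringsAsFactors = FALSE
)
dat <- data.frame(ID = 1:2, Voetganger_3 = c(10, 12), Fiets_4 = c(20, 25))

labelled <- label_columns(dat, props)
print(colnames(labelled))
```

With real data, the same helper would presumably be called as `label_columns(doodsoorzaken, meta$DataProperties)`, provided the metadata really has those two columns.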

J535D165 commented 7 years ago

Hello Edwin, thanks for your updates on this point. I like the use_column_title argument. Dropping the numbers after the underscore may be a solution as well. I haven't found a situation with duplicate column names when dropping the numbers. I'm not sure what happens when there are duplicate column names.
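The duplicate question can be checked against the column names posted at the top of this thread. The sketch below strips the trailing `_<number>` with a regular expression (one possible approach, not something cbsodataR does) and lists the names that collide. `make.unique()` from base R is shown as one way to disambiguate them afterwards.

```r
# Column names copied from the colnames() output earlier in this thread
cols <- c(
  "ID", "Geslacht", "Leeftijd", "Perioden",
  "TotaalDodelijkeOngevallen_1", "TotaalDodelijkeVervoersongevallen_2",
  "Voetganger_3", "Fiets_4",
  "BromEnSnorfietsEnBrommobiel_5", "GemotInvalidenvoertuigScootmobiel_6",
  "Motorfiets_7", "Personenauto_8",
  "BestelautoVrachtauto_9", "OverigOnbekend_10",
  "AccidenteleVal_11", "AccidenteleVerdrinking_12",
  "TotaalAccidenteleVergiftiging_13", "Medicijnen_14",
  "Drugs_15", "Alcohol_16",
  "OverigOnbekend_17", "TotaalOverigeDodelijkeOngevallen_18",
  "MechanischEffect_19", "RookVuurEnVlammen_20",
  "Verstikking_21", "OverigInclLaatGevolg_22",
  "TotaalDodelijkeOngevallen_23", "DodelijkeVervoersongevallen_24",
  "AccidenteleVal_25", "AccidenteleVerdrinking_26",
  "AccidenteleVergiftigingInclOpzetOnb_27", "OverigeDodelijkeOngevallen_28"
)

# Drop a trailing "_<digits>" suffix
stripped <- sub("_[0-9]+$", "", cols)

# Names that occur more than once after stripping
dupes <- unique(stripped[duplicated(stripped)])
print(dupes)

# make.unique() appends .1, .2, ... to repeated names
safe <- make.unique(stripped)
```

For this table, four names collide once the numbers are gone, so the suffix cannot be dropped blindly.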

A few weeks ago, I made a CBS open data client for Python inspired by this project. It links back to your R version. https://github.com/J535D165/cbsodata

edwindj commented 7 years ago

Hi Jonathan,

Thanks for your suggestions! Regarding dropping the numbers: most of the time this is not a problem, however:

I came across your Python implementation by accident. I was looking at "rijksdriehoek" by Jan van der Laan (who is a close colleague), noticed your Python version, and from there found your Python implementation of cbsodata :-) . I will add a link to the README to reference your Python version.

Best regards,

Edwin

J535D165 commented 7 years ago

Clear answer. I'm a bit afraid that the numbers will change over time (different column order?). But it's good to leave it to the end user. And the use_column_title argument is enough for most users.
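For scripts that should survive a renumbering, one option is to look columns up by their name stem instead of hard-coding the suffix. This is only a sketch: `find_column` is a hypothetical helper, not part of cbsodataR, and it deliberately fails when a stem matches more than one column.

```r
# Hypothetical helper: find a column by its name stem, ignoring "_<n>"
find_column <- function(data, stem) {
  pattern <- paste0("^", stem, "_[0-9]+$")
  hits <- grep(pattern, colnames(data), value = TRUE)
  if (length(hits) != 1) {
    stop("expected exactly one column for '", stem, "', found ", length(hits))
  }
  hits
}

# Toy data frame using column names from this thread
dat <- data.frame(
  ID = 1:2,
  Fiets_4 = c(20, 25),
  OverigOnbekend_10 = 1:2,
  OverigOnbekend_17 = 3:4
)

fiets_col <- find_column(dat, "Fiets")
print(fiets_col)

# An ambiguous stem raises an error instead of silently picking one
ambiguous <- try(find_column(dat, "OverigOnbekend"), silent = TRUE)
```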

In my opinion, the issue can be closed.