kletts closed this issue 4 months ago.
Hi Christian (@kletts), thanks very much for your kind comments and for opening this issue. I agree that the problem you've identified needs to be fixed. I also agree with the opinion you express at the end: it's better not to coerce to numeric, even if this creates extra work for the user in some circumstances, than to coerce to numeric when that's not appropriate.
I have limited capacity to fix this right now. I'd be happy to review a PR; otherwise I'll get to this when I can.
Cool, I've raised a PR with a proposed change. I had thought the coercion happened in the `abs_api_label_data` function, but it turns out to happen further upstream, when `read.csv` parses the raw download.
Thanks so much, @kletts!
This is in master now, @kletts. Thanks again!
Thanks, Matt, for the prompt update.
Thanks for a great package; in combination with the Data Explorer, it has massively improved how I work with ABS data.
The labelling step in `read_api()` coerces numeric-looking codes to numeric, and this appears to result in labels not being applied. This is a particular problem with ABS classification structures such as ANZSIC, where a code with a leading zero, such as `01`, refers to a specific category: in this case, the Agriculture subdivision of Agriculture, Forestry and Fishing. A reproducible example is the following extract, where labels are missing for industries `01` and `02` but provided for `12`. The code values have been converted to 1, 2 and 12:
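To make the failure mode concrete, here is a minimal Python sketch (not the package's R code) that mimics type guessing on a raw download and then looks each code up in a code list. The CSV text, `CODELIST` and the helper functions are hypothetical; the only point is that once `01` becomes the integer 1, its string form `"1"` no longer matches the key `"01"`, while `12` survives the round trip unchanged.

```python
import csv
import io

# Hypothetical code list: ANZSIC subdivision codes (as strings) mapped to labels.
CODELIST = {
    "01": "Agriculture",
    "02": "Aquaculture",
    "12": "Beverage and Tobacco Product Manufacturing",
}

# Hypothetical raw API download with an INDUSTRY code column.
RAW_CSV = "INDUSTRY,OBS_VALUE\n01,100\n02,200\n12,300\n"


def coerce(value: str):
    """Mimic type guessing: anything that parses as an int becomes an int."""
    try:
        return int(value)
    except ValueError:
        return value


def read_rows(text: str, guess_types: bool):
    """Parse the CSV; optionally apply type guessing to every field."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if guess_types:
        rows = [{k: coerce(v) for k, v in row.items()} for row in rows]
    return rows


def label_rows(rows):
    """Look up each INDUSTRY code by its string form; None when no label matches."""
    return [CODELIST.get(str(row["INDUSTRY"])) for row in rows]


guessed = read_rows(RAW_CSV, guess_types=True)
print([row["INDUSTRY"] for row in guessed])  # [1, 2, 12]
print(label_rows(guessed))  # [None, None, 'Beverage and Tobacco Product Manufacturing']
```

Calling `read_rows(RAW_CSV, guess_types=False)` instead keeps the codes as `"01"`, `"02"` and `"12"`, and all three labels are found.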
The question worth discussing is whether numbers in code lists should ever be coerced to numeric. It seems reasonable that, in general, codes are not numbers, but there are cases where the ABS uses codes as numbers, for example `unit_mult`. My personal opinion is that the safer practice is to keep all codes as character: coercion by the user, where required, is easy to perform, but reversing an erroneous conversion is a big and messy job.
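A minimal Python sketch of the "keep everything as character, coerce on request" approach, again as an illustration rather than the package's R code. The column names, the CSV text and `CODELIST` are hypothetical, and it assumes, as in SDMX, that the unit multiplier holds a power-of-ten exponent (3 meaning thousands).

```python
import csv
import io

# Hypothetical code list keyed by string codes, leading zeros intact.
CODELIST = {
    "01": "Agriculture",
    "02": "Aquaculture",
    "12": "Beverage and Tobacco Product Manufacturing",
}

# Hypothetical raw download: INDUSTRY is a classification code; UNIT_MULT is
# a code used as a number (assumed power-of-ten exponent, as in SDMX).
RAW_CSV = "INDUSTRY,UNIT_MULT,OBS_VALUE\n01,3,100\n02,3,200\n12,3,300\n"


def read_as_text(text: str):
    """Read every column as a string; csv.DictReader does no type guessing."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_as_text(RAW_CSV)
for row in rows:
    # Labels match because codes were never converted.
    row["INDUSTRY_LABEL"] = CODELIST.get(row["INDUSTRY"])
    # Explicit, opt-in conversion for the one code that really is numeric.
    row["UNIT_MULT"] = int(row["UNIT_MULT"])
    row["VALUE"] = float(row["OBS_VALUE"]) * 10 ** row["UNIT_MULT"]

print([(r["INDUSTRY"], r["INDUSTRY_LABEL"]) for r in rows])
```

Reading as text is lossless and the single `int()` call is cheap, whereas recovering `"01"` from `1` would require knowing each classification's code width.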
Christian