chrisvwn / Rnightlights

R package to extract data from satellite nightlights.
GNU General Public License v3.0

Text encoding of radiance readings in the csv #36

Closed nreguera closed 5 years ago

nreguera commented 5 years ago

Hi,

I am trying to clean the downloaded data and I have seen that there are some weird characters in both the column names and the values:

image

For the column names it's fine, as I can change them easily, but for the values it would take more time. I am wondering if there is any way to avoid this. I have loaded the data in R using the following code:

nl <- read.csv(file="Data/NL_DATA_KHM_ADM4_GADM-3.6.csv", header=TRUE, sep=",", fileEncoding="UTF-8-BOM")

Thanks!

Natxo.
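
For reference, a minimal sketch of the other way `read.csv` can be told about a file's encoding, assuming the csv really is UTF-8. `fileEncoding` re-encodes the text into the session's native encoding while reading, whereas `encoding = "UTF-8"` only marks the strings as UTF-8 and leaves the bytes alone. The temporary file and its contents are made up for illustration and are not Rnightlights output.

```r
# Write a tiny UTF-8 csv to a temporary file (illustrative data only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,value", "Ex\u00e1mple,1.5"), tmp, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8 instead of re-encoding them
# into the native (e.g. Windows-1252) encoding.
nl <- read.csv(tmp, header = TRUE, sep = ",", encoding = "UTF-8",
               stringsAsFactors = FALSE)

# Reports the declared encoding of each value in the column.
Encoding(nl$name)
```

If the file starts with a byte-order mark, as the `UTF-8-BOM` setting above suggests, the first column name may still need cleaning with this approach.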

chrisvwn commented 5 years ago

Hi Natxo,

I have a feeling this could be down to an RStudio setting. I have the data looking like this for me:

Screenshot from 2019-07-20 11-39-03

Could you check in RStudio under Tools -> "Project Options" what you have for encoding? I have mine set to UTF-8.

Screenshot from 2019-07-20 11-20-56

nreguera commented 5 years ago

Hi Chris,

I checked what you said and changed it, but it still was not working. Then I checked the csv file and realized that the characters are wrong there as well. You can see:

image

What could have happened?

chrisvwn commented 5 years ago

Hmm. Could you check the output of sessionInfo()? Particularly the locale: section. For example my locale is:

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
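
A narrower check than the full `sessionInfo()` output is sketched below. `Sys.getlocale()` and `l10n_info()` are base R; the helper name `locale_summary` is made up for this example.

```r
# Summarise the parts of the locale that matter for character encoding.
# locale_summary is a hypothetical helper, not part of Rnightlights.
locale_summary <- function() {
  list(
    # For example "en_US.UTF-8" or "Spanish_Spain.1252".
    ctype = Sys.getlocale("LC_CTYPE"),
    # TRUE when the session is running in a UTF-8 locale.
    utf8 = isTRUE(l10n_info()[["UTF-8"]])
  )
}

str(locale_summary())
```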

nreguera commented 5 years ago

I have checked and it shows this:

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252

When downloading the rasters and the data I set this parameter to make it work:

Sys.setlocale("LC_TIME", "English")

chrisvwn commented 5 years ago

I think the 1252 may be the problem. Please look at this stackoverflow question. Windows-1252 is a single-byte encoding that covers only a limited set of accented characters, so UTF-8 text can come out garbled under it. Could you try setting the locale to UTF-8? Try something like: Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
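
Not every system accepts a given locale name. `Sys.setlocale()` returns an empty string, with a warning, when the request cannot be honored, so it can help to test first. A minimal sketch, where `locale_available` is a hypothetical helper and the original setting is restored afterwards:

```r
# Check whether the OS accepts a locale name without leaving it switched on.
# locale_available is a hypothetical helper, not part of Rnightlights.
locale_available <- function(name, category = "LC_CTYPE") {
  old <- Sys.getlocale(category)
  # Restore the previous setting when the function exits.
  on.exit(suppressWarnings(Sys.setlocale(category, old)), add = TRUE)
  # Sys.setlocale() returns "" when the request cannot be honored.
  result <- suppressWarnings(Sys.setlocale(category, name))
  nzchar(result)
}

locale_available("en_US.UTF-8")
```

If this returns FALSE, the name is not accepted by the OS under that spelling and a different locale name would have to be tried.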

nreguera commented 5 years ago

I got the message: OS reports request to set locale to "ES.UTF-8" cannot be honored, and the call returned [1] "". Anyway, I think the data is already downloaded, so I can't change it when reading the csv. I will manage by updating the names manually for each province; it shouldn't be too time consuming. Thanks anyway.
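
As an alternative to renaming by hand, the sketch below tries to undo one specific kind of garbling. It assumes the values are the classic pattern where UTF-8 bytes were decoded as Windows-1252, which is only a guess since the screenshots are not reproduced here. `repair_mojibake` is a hypothetical helper and a heuristic: it leaves a value unchanged whenever the round trip does not yield valid UTF-8.

```r
# Heuristic repair for text whose UTF-8 bytes were misread as Windows-1252.
# repair_mojibake is a hypothetical helper, not part of Rnightlights.
repair_mojibake <- function(x) {
  x <- enc2utf8(x)
  # Encode back to Windows-1252 to recover the bytes as originally written.
  candidate <- iconv(x, from = "UTF-8", to = "windows-1252")
  # Accept the result only if those bytes form valid UTF-8.
  ok <- !is.na(candidate) & validUTF8(candidate)
  Encoding(candidate) <- "UTF-8"
  x[ok] <- candidate[ok]
  x
}

# Build a garbled value on purpose, then repair it.
garbled <- iconv("\u00e9", from = "windows-1252", to = "UTF-8")
repair_mojibake(c(garbled, "abc"))
```

Applied to a data frame this would look like `nl$name <- repair_mojibake(nl$name)`, where the column name is an assumption. The result should be spot-checked, because a value that legitimately contains such character pairs would also be changed.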

chrisvwn commented 5 years ago

Noted. If you are able to set UTF-8 in the locale and want to try it, you can delete or rename the data file(s) and re-run the commands to get the data. If you haven't deleted the outputrasters it should be real quick, since most of the time is usually spent downloading the tiles and cropping/masking them.