chrisvwn / Rnightlights

R package to extract data from satellite nightlights.
GNU General Public License v3.0

Duplicate observations with distinct light values #62

Closed DavidRMcCoy closed 4 years ago

DavidRMcCoy commented 4 years ago

Great work with this package.

We are extracting mean annual light readings for Brazilian municipalities as a proxy for economic development. We were somewhat successful combining the light data with other data that includes the IBGE codes (national municipality id code in public data), using the states and municipality names. In doing so, we noticed that there are at least 9 municipalities that have alternative spellings in the data (both spellings can't exist in the same state), and these alternative spellings have their own distinct values for light and area. It is unclear which is correct. Is there a way to choose?

Here is the list of the cities with multiple entries, in case that helps.

The format is IBGE_code: spelling1 = spelling2

1504505: Melgaco = Melgaço
2112506: Tutoia = Tutóia
2902302: Aratuipe = Aratuípe
2908309: Conceicao do Almeida = Conceição do Almeida
3155504: Rio Paranaiba = Rio Paranaiba = Rio Paranaíba
3538907: Pirajui = Pirajuí
4101101: Andira = Andirá
4110607: Iporá = Iporã
4302154: Boa Vista das Misses = Boa Vista das Missões
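For anyone wanting to find such pairs programmatically rather than by eye, here is a minimal sketch in base R. It assumes a table with one row per polygon; the column names `state` and `municipality`, the state abbreviations, and the sample rows are hypothetical and are not the actual Rnightlights output. It strips diacritics with an explicit character map and flags any state/name key that appears under more than one spelling. Pairs that differ by more than an accent (for example a dropped letter) will not be caught this way.

```r
# Hypothetical sample; column names and state values are illustrative only.
# Non-ASCII letters are written as \u escapes so the script does not depend
# on the file encoding or locale.
lights <- data.frame(
  state = c("PA", "PA", "MA", "MA", "PR", "PR", "SP"),
  municipality = c("Melgaco", "Melga\u00e7o", "Tutoia", "Tut\u00f3ia",
                   "Ipor\u00e1", "Ipor\u00e3", "Pirajui"),
  stringsAsFactors = FALSE
)

# Replace the accented letters common in Portuguese with plain ASCII letters.
strip_accents <- function(x) {
  old <- paste0(
    "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00ed\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7",
    "\u00c1\u00c0\u00c2\u00c3\u00c9\u00ca\u00cd\u00d3\u00d4\u00d5\u00da\u00dc\u00c7"
  )
  new <- "aaaaeeiooouucAAAAEEIOOOUUC"
  chartr(old, new, x)
}

# Comparison key: state plus accent-free, lower-case name.
lights$key <- paste(lights$state,
                    tolower(strip_accents(lights$municipality)),
                    sep = "|")

# Count distinct spellings per key and keep keys with more than one.
n_spellings <- tapply(lights$municipality, lights$key,
                      function(v) length(unique(v)))
dup_keys <- names(n_spellings)[n_spellings > 1]

# Rows that deserve a manual look.
suspects <- unique(lights[lights$key %in% dup_keys,
                          c("state", "municipality", "key")])
print(suspects)
```

An explicit map is used here instead of `iconv(..., to = "ASCII//TRANSLIT")` because the transliteration result can differ between platforms.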

This is the code we used to extract:

BRA_lights <- Rnightlights:::getCtryNlData(
  ctryCode = "BRA",
  admLevel = "gadm36_BRA_2",
  nlTypes = "VIIRS.M",
  nlPeriods = Rnightlights:::nlRange("201301", "201812"),
  ignoreMissing = FALSE
)

This is what our extracted data looks like before trying to merge/join it with the rest of the data:

brazilian_municipalities.xlsx

chrisvwn commented 4 years ago

Thank you!

It does seem strange that such similar names exist in the same region. The duplicates I have seen before have been identical names in different regions. The maps are taken from GADM, so this is out of our hands. But you could try loading the map in a GIS program like QGIS and visually inspecting the duplicates to see if they are anomalies.

I have managed to look at Melgaço/Melgaco and Rio Paranaiba. In both cases the two entries seem to be adjacent polygons. I am not sure if this is an error, so you could check whether both are real names and, if not, raise it with the guys over at GADM.

Screenshot from 2020-05-29 12-56-42
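If inspection suggests that two adjacent polygons are really parts of one municipality, one possible way to get a single value is an area-weighted mean of the per-polygon means. The sketch below is only an illustration under that assumption: the column names `ibge_code`, `area_km2` and `mean_light` and all numbers are made up, and it treats each polygon's mean as representative of its whole area. Whether merging is appropriate at all depends on what GADM or IBGE say about the boundaries.

```r
# Hypothetical rows: the first two polygons are assumed to belong to the
# same municipality (same code); the third stands alone.
parts <- data.frame(
  ibge_code = c("1504505", "1504505", "2112506"),
  area_km2 = c(300, 100, 50),
  mean_light = c(0.2, 0.6, 0.25),
  stringsAsFactors = FALSE
)

# Sum of (mean * area) and total area per municipality code.
weighted_sum <- tapply(parts$mean_light * parts$area_km2, parts$ibge_code, sum)
total_area <- tapply(parts$area_km2, parts$ibge_code, sum)

# One row per code: combined area and area-weighted mean light.
combined <- data.frame(
  ibge_code = names(total_area),
  area_km2 = as.vector(total_area),
  mean_light = as.vector(weighted_sum / total_area),
  stringsAsFactors = FALSE
)
print(combined)
```

For the two-polygon case above the combined mean is (0.2 * 300 + 0.6 * 100) / 400 = 0.3, which sits closer to the larger polygon's value, as expected.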

chrisvwn commented 4 years ago

Closing the issue for now. Please reopen if you would like to pursue it further.