anelda closed this issue 3 years ago
The following facilities are missing from the SF afrihealthsites import because they don't have coordinates in the original Excel spreadsheet:
Thank you @anelda
You are right, facilities without coordinates are missing from the stored data. This is related to #4: the data are currently stored in the package as an sf
object, which cannot hold items without coordinates. I'll look into storing the data as a dataframe instead.
For reference, the reproducible code to download and store the data is in the data-raw folder of the package here 👍 https://github.com/afrimapr/afrihealthsites/blob/6432b5ac9aa49a9ec802c6926cb015cb653994be/data-raw/sf_who_sites.R
Thanks! It makes sense that an SF object will not contain observations without coordinates. I notice that the KEMRI data contains 2350 observations without coordinate details.
Maybe it makes more sense for afrihealthsites to import as a data table by default and have a function to convert to sf, with a very clear indication of the observations that are lost in the conversion? People may want to do non-map-related analysis, or combine the data with other table-like datasets.
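A minimal sketch of what such a conversion step could look like, in case it helps the discussion. The helper names `split_by_coords()` and `df_to_sf_report()` are hypothetical and not part of afrihealthsites, and the coordinate column names `Long` and `Lat` are an assumption about the table layout. Only `sf::st_as_sf()` is an existing sf function.

```r
# Hypothetical helpers, not part of afrihealthsites.
# Assumes coordinate columns named "Long" and "Lat"; adjust to your data.

# Split a data frame into rows with and without complete coordinates.
split_by_coords <- function(df, lon_col = "Long", lat_col = "Lat") {
  has_coords <- !is.na(df[[lon_col]]) & !is.na(df[[lat_col]])
  list(
    with_coords = df[has_coords, , drop = FALSE],
    without_coords = df[!has_coords, , drop = FALSE]
  )
}

# Convert to sf, reporting how many rows are dropped for lack of coordinates.
df_to_sf_report <- function(df, lon_col = "Long", lat_col = "Lat", crs = 4326) {
  parts <- split_by_coords(df, lon_col, lat_col)
  n_lost <- nrow(parts$without_coords)
  if (n_lost > 0) {
    message(n_lost, " of ", nrow(df), " rows have no coordinates and are dropped")
  }
  # Build point geometries from the two coordinate columns (requires the sf package)
  sf::st_as_sf(parts$with_coords, coords = c(lon_col, lat_col), crs = crs)
}

# Toy data, made up for illustration only
toy <- data.frame(
  name = c("A", "B", "C"),
  Long = c(18.8, NA, 28.1),
  Lat = c(-33.9, -26.1, NA)
)
parts <- split_by_coords(toy)
nrow(parts$without_coords)
```

Keeping the split separate from the conversion means the rows without coordinates stay available for non-spatial analysis instead of being discarded silently.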
I'm wondering if there's an opportunity here to help people to improve the data and push back to healthsites.io or other sources from the package?
Can you check that this now does what you would expect?
# to return raw dataframe for WHO data including any rows with no coordinates
dfzaf <- afrihealthsites("south africa", datasource='who', plot=FALSE, returnclass='dataframe')
I have kept the default to return as sf because mostly we are interested in doing spatial things.
Also it gets a bit tricky because other sources, e.g. healthsites.io from rhealthsites,
are already sf.
We can revisit if needed.
This is perfect! Thanks!
I also ran this on the healthsites.io data, but it returns a geometry column instead of two columns for lat and long:
> dfzaf_healthsites <- afrihealthsites("south africa", datasource='healthsites', plot=FALSE, returnclass='dataframe')
> select(dfzaf_healthsites, geometry)
Simple feature collection with 2064 features and 0 fields
geometry type: POINT
dimension: XY
bbox: xmin: 17.06561 ymin: -34.59043 xmax: 32.75507 ymax: -22.34141
geographic CRS: WGS 84
# A tibble: 2,064 x 1
geometry
<POINT [°]>
1 (18.84201 -33.97814)
2 (28.15224 -26.16084)
3 (27.92697 -26.10441)
4 (31.03788 -23.92632)
5 (18.50614 -33.86543)
6 (28.267 -25.76763)
7 (25.1132 -30.71224)
"Also it gets a bit tricky because other sources, e.g. healthsites.io from rhealthsites, are already sf."
I suppose that's because healthsites.io provide their data as a shapefile, which means they will only provide data that definitely have lat/long information?
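If separate longitude and latitude columns are wanted from an sf result like the one printed above, one possible approach is sketched here. The helper name `sf_points_to_df()` and the output column names `Long` and `Lat` are hypothetical; `sf::st_coordinates()` and `sf::st_drop_geometry()` are existing sf functions. The sketch assumes every geometry is a POINT.

```r
# Hypothetical helper, not part of afrihealthsites.
# Turns an sf object with POINT geometries into a plain table
# with separate Long and Lat columns.
sf_points_to_df <- function(x) {
  coords <- sf::st_coordinates(x)   # matrix with columns X and Y for points
  df <- sf::st_drop_geometry(x)     # attribute table without the geometry column
  df$Long <- coords[, "X"]
  df$Lat <- coords[, "Y"]
  df
}
```

Usage would be along the lines of `sf_points_to_df(dfzaf_healthsites)`, which should give a table without the geometry column.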
This was fixed by implementing the option to import the data as a dataframe. We can probably close this issue.
Which dataset
KEMRI/WHO
Short description of the error or suggestion
When I import the original spreadsheet with read_excel and filter Country for 'South Africa', there are 4303 observations, but when I import the same dataset via afrihealthsites I find 4288 observations.
Suggested actions
I'm trying to figure out what is going on and will report back here.
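One way to track down a count mismatch like this is to list the rows that are in the spreadsheet but not in the imported data. The sketch below uses base R only. The helper `rows_missing_from()`, the `facility` key column, and the toy data are all made up for illustration, and the approach assumes the key uniquely identifies a facility, which may not hold for real facility names.

```r
# Hypothetical helper: rows of `full` whose key value does not appear in `subset_df`.
rows_missing_from <- function(full, subset_df, key) {
  full[!(full[[key]] %in% subset_df[[key]]), , drop = FALSE]
}

# Toy stand-ins for the spreadsheet and the package import (made-up values)
spreadsheet <- data.frame(
  facility = c("Clinic A", "Clinic B", "Clinic C"),
  Lat = c(-33.9, NA, -26.1),
  Long = c(18.8, NA, 28.1)
)
# Mimic an import that keeps only rows with complete coordinates
imported <- spreadsheet[!is.na(spreadsheet$Lat) & !is.na(spreadsheet$Long), ]

# Difference in row counts, then the rows that account for it
nrow(spreadsheet) - nrow(imported)
missing_rows <- rows_missing_from(spreadsheet, imported, key = "facility")
missing_rows
```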