Closed mpadge closed 5 years ago
Thanks for the kind words! I've also poked around dodgr
and will be making use of it in some upcoming projects. (I attended your UseR talk btw)
The first round-trip is because I don't maintain column order (by design, for speed). See the first property
in the respective outputs
library(geojsonsf)
s <- geojson_sf (geo_melbourne) # My former home city!
g <- sf_geojson (s)
identical (g, geo_melbourne) # FALSE
s2 <- geojson_sf (g)
identical (s, s2) # FALSE
str( geo_melbourne )
# chr "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"SA2_NAME\":\"Abbotsford\""| __truncated__
str( g )
# chr "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"AREASQKM\":1.7405,\"SA2_N"| __truncated__
However, going back to s2
we can see all the values in the columns are in fact still the same.
all(s$AREASQKM == s2$AREASQKM)
# [1] TRUE
all(s$SA2_NAME == s2$SA2_NAME)
# [1] TRUE
all(s$SA3_NAME == s2$SA3_NAME)
# [1] TRUE
all(s$fillColor == s2$fillColor)
# [1] TRUE
all(s$polygonId == s2$polygonId)
# [1] TRUE
all(s$strokeColor == s2$strokeColor)
# [1] TRUE
all(s$strokeWeight == s2$strokeWeight)
# [1] TRUE
Although this is not true for the geometry
column. I haven't chased the differences in geometries to any great detail; but all the tests I do suggest it's a precision or rounding error.
all(round(s$geometry[[1]][[1]], 6) == round(s2$geometry[[1]][[1]], 6))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 7) == round(s2$geometry[[1]][[1]], 7))
# [1] FALSE
And the plots 'look' identical
plot(s)
plot(s2)
The follow-up question then becomes, should I be concerned with the differences in geometries? Happy to take your thoughts on this too.
I feel your pain when constructing the full sf
object, but I don't think I can avoid it as GeoJSON allows the properties, so I have to handle it. I just need to keep an eye on sf
updates.
The non-identical round-trip back to geojson
is fair enough, and I suspected that. For dodgr
I use loads of std::unordered_map
objects to maintain indices so everything can be reassembled in the same order. Those are effectively cost-free and should impose no efficiency penalty at all. (I also noticed you use a few std::map
s which look like they could maybe be replaced by more efficient unordered_map
equivalents?)
The difference in geometries is more interesting - i'll check it out and get back to you on that.
Finally, sorry that you attended my worst-ever public performance, and a shame we didn't get a chance to chat afterwards. Hopefully that'll happen one day soon!
could maybe be replaced by more efficient unordered_map equivalents
Thanks for the tip - I'll have a go at implementing them in the near future
Hopefully that'll happen one day soon!
yes, absolutely!
I was hunting the unordered columns over here (where I've implemented unordered_set
in place of set
)
I don't think it's something I can completely solve, so I'm going to leave it. And as long as these tests pass I'm OK with it.
js <- '{"type":"FeatureCollection","features":[
{"type":"Feature","properties":{"id":3.0},"geometry":{"type":"MultiPoint","coordinates":[[0.0,0.0],[1.1,1.1]]}},
{"type":"Feature","properties":{"id":1.0},"geometry":null},
{"type":"Feature","properties":{"id":2.0},"geometry":{"type":"Point","coordinates":[0.0,0.0]}}]}'
sf <- geojson_sf( js )
js2 <- sf_geojson( sf )
expect_true( as.character( js2 ) == gsub("\\t|\\r|\\n", "", js) )
sf2 <- geojson_sf( js2 )
expect_equal(sf, sf2)
s <- geojson_sf( geo_melbourne )
g <- sf_geojson( s )
#identical(as.character( g ), geo_melbourne ) # FALSE
s2 <- geojson_sf( g )
#identical (s, s2) # FALSE
expect_true( all( names(s) == names(s2) ) )
expect_true( all(s$AREASQKM == s2$AREASQKM) )
expect_true( all(s$SA2_NAME == s2$SA2_NAME) )
expect_true( all(s$SA3_NAME == s2$SA3_NAME) )
expect_true( all(s$fillColor == s2$fillColor) )
expect_true( all(s$polygonId == s2$polygonId) )
expect_true( all(s$strokeColor == s2$strokeColor) )
expect_true( all(s$strokeWeight == s2$strokeWeight) )
And given the latest updates the rounding appears more accurate too
all(round(s$geometry[[1]][[1]], 3) == round(s2$geometry[[1]][[1]], 3))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 4) == round(s2$geometry[[1]][[1]], 4))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 5) == round(s2$geometry[[1]][[1]], 5))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 6) == round(s2$geometry[[1]][[1]], 6))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 7) == round(s2$geometry[[1]][[1]], 7))
# [1] TRUE
Thanks @SymbolixAU for the awesome work here! This is by way of starting a more intimate exchange between our work, coz there is likely a lot to be gained. In particular, I've got a bunch of strongly aligned code in my
dodgr
package, which also does fullsf
construction in thesrc
code. One of this difficulties I had was ensuring round trip conversions worked. In your case, this currently failsOne of this issues I had was your current #20 - this aspect of
sf
construction changed several times between 0.5 -> 0.6, and I gave up trying to chase the moving target and now do only as far assfc
. Fullsf
requires always retaining the strict ordering of the AGR attributes and stuff like that that is arbitrary internalsf
-stuff. So I now only go as far as testing round trips to and fromsfc
, but these are pretty important in flagging potential inconsistency with futuresf
changes.Any insights into why the round trip fails here?