SymbolixAU / geojsonsf

Conversion between sf and geojson
Other
81 stars 7 forks source link

Round trip fails #30

Closed mpadge closed 5 years ago

mpadge commented 6 years ago

Thanks @SymbolixAU for the awesome work here! This is by way of starting a more intimate exchange between our work, coz there is likely a lot to be gained. In particular, I've got a bunch of strongly aligned code in my dodgr package, which also does full sf construction in the src code. One of this difficulties I had was ensuring round trip conversions worked. In your case, this currently fails

s <- geojson_sf (geo_melbourne) # My former home city!
g <- sf_geojson (s)
identical (g, geo_melbourne) # FALSE
s2 <- geojson_sf (g)
identical (s, s2) # FALSE

One of this issues I had was your current #20 - this aspect of sf construction changed several times between 0.5 -> 0.6, and I gave up trying to chase the moving target and now do only as far as sfc. Full sf requires always retaining the strict ordering of the AGR attributes and stuff like that that is arbitrary internal sf-stuff. So I now only go as far as testing round trips to and from sfc, but these are pretty important in flagging potential inconsistency with future sf changes.

Any insights into why the round trip fails here?

SymbolixAU commented 6 years ago

Thanks for the kind words! I've also poked around dodgr and will be making use of it in some upcoming projects. (I attended your UseR talk btw)

The first round-trip is because I don't maintain column order (by design, for speed). See the first property in the respective outputs

library(geojsonsf)
s <- geojson_sf (geo_melbourne) # My former home city!
g <- sf_geojson (s)
identical (g, geo_melbourne) # FALSE
s2 <- geojson_sf (g)
identical (s, s2) # FALSE

str( geo_melbourne )
# chr "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"SA2_NAME\":\"Abbotsford\""| __truncated__
str( g )
# chr "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"AREASQKM\":1.7405,\"SA2_N"| __truncated__

However, going back to s2 we can see all the values in the columns are in fact still the same.

all(s$AREASQKM == s2$AREASQKM)
# [1] TRUE
all(s$SA2_NAME == s2$SA2_NAME)
# [1] TRUE
all(s$SA3_NAME == s2$SA3_NAME)
# [1] TRUE
all(s$fillColor == s2$fillColor)
# [1] TRUE
all(s$polygonId == s2$polygonId)
# [1] TRUE
all(s$strokeColor == s2$strokeColor)
# [1] TRUE
all(s$strokeWeight == s2$strokeWeight)
# [1] TRUE

Although this is not true for the geometry column. I haven't chased the differences in geometries to any great detail; but all the tests I do suggest it's a precision or rounding error.

all(round(s$geometry[[1]][[1]], 6) == round(s2$geometry[[1]][[1]], 6))
# [1] TRUE

all(round(s$geometry[[1]][[1]], 7) == round(s2$geometry[[1]][[1]], 7))
# [1] FALSE

And the plots 'look' identical

plot(s)

screen shot 2018-09-19 at 6 48 20 pm

plot(s2)

screen shot 2018-09-19 at 6 48 53 pm


The follow-up question then becomes, should I be concerned with the differences in geometries? Happy to take your thoughts on this too.


I feel your pain when constructing the full sf object, but I don't think I can avoid it as GeoJSON allows the properties, so I have to handle it. I just need to keep an eye on sf updates.

mpadge commented 6 years ago

The non-identical round-trip back to geojson is fair enough, and I suspected that. For dodgr I use loads of std::unordered_map objects to maintain indices so everything can be reassembled in the same order. Those are effectively cost-free and should impose no efficiency penalty at all. (I also noticed you use a few std::maps which look like they could maybe be replaced by more efficient unordered_map equivalents?)

The difference in geometries is more interesting - i'll check it out and get back to you on that.

Finally, sorry that you attended my worst-ever public performance, and a shame we didn't get a chance to chat afterwards. Hopefully that'll happen one day soon!

SymbolixAU commented 6 years ago

could maybe be replaced by more efficient unordered_map equivalents

Thanks for the tip - I'll have a go at implementing them in the near future

Hopefully that'll happen one day soon!

yes, absolutely!

SymbolixAU commented 5 years ago

I was hunting the unordered columns over here (where I've implemented unordered_set in place of set)

I don't think it's something I can completely solve, so I'm going to leave it. And as long as these tests pass I'm OK with it.

js <- '{"type":"FeatureCollection","features":[
{"type":"Feature","properties":{"id":3.0},"geometry":{"type":"MultiPoint","coordinates":[[0.0,0.0],[1.1,1.1]]}},
{"type":"Feature","properties":{"id":1.0},"geometry":null},
{"type":"Feature","properties":{"id":2.0},"geometry":{"type":"Point","coordinates":[0.0,0.0]}}]}'

sf <- geojson_sf( js )
js2 <- sf_geojson( sf )
expect_true( as.character( js2 ) == gsub("\\t|\\r|\\n", "", js) )
sf2 <- geojson_sf( js2 )
expect_equal(sf, sf2)

s <- geojson_sf( geo_melbourne )
g <- sf_geojson( s )

#identical(as.character( g ), geo_melbourne ) # FALSE
s2 <- geojson_sf( g )
#identical (s, s2) # FALSE

expect_true( all( names(s) == names(s2) ) )
expect_true( all(s$AREASQKM == s2$AREASQKM) )
expect_true( all(s$SA2_NAME == s2$SA2_NAME) )
expect_true( all(s$SA3_NAME == s2$SA3_NAME) )
expect_true( all(s$fillColor == s2$fillColor) )
expect_true( all(s$polygonId == s2$polygonId) )
expect_true( all(s$strokeColor == s2$strokeColor) )
expect_true( all(s$strokeWeight == s2$strokeWeight) )
SymbolixAU commented 5 years ago

And given the latest updates the rounding appears more accurate too

all(round(s$geometry[[1]][[1]], 3) == round(s2$geometry[[1]][[1]], 3))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 4) == round(s2$geometry[[1]][[1]], 4))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 5) == round(s2$geometry[[1]][[1]], 5))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 6) == round(s2$geometry[[1]][[1]], 6))
# [1] TRUE
all(round(s$geometry[[1]][[1]], 7) == round(s2$geometry[[1]][[1]], 7))
# [1] TRUE