MakieOrg / GeoMakie.jl

Geographical plotting utilities for Makie.jl
https://geo.makie.org
MIT License
167 stars 24 forks source link

Jeremiahpslewis jpsl/geojson 0.6 #125

Closed SimonDanisch closed 2 years ago

SimonDanisch commented 2 years ago

Fixes https://github.com/MakieOrg/GeoMakie.jl/pull/123

SimonDanisch commented 2 years ago

I'm running into a pretty big performance regression in the geo interface implementation:

using GeoJSON, GeoInterface, GeometryBasics
using Downloads: download

const trait_type_dict = Dict(
    PolygonTrait() => Polygon,
    MultiPolygonTrait() => MultiPolygon,
    LineStringTrait() => LineString,
    PointTrait() => Point,
)

# Convert all geoJSON objects to the appropriate GeometryBasics type based on trait
function geoJSONtraitParse(feature::GeoJSON.Feature)
    geometry = GeoInterface.geometry(feature)
    trait = GeoInterface.trait(geometry)
    if !haskey(trait_type_dict, trait)
        @warn "GeoMakie.geoJSONtraitParse: Unknown geometry type $(GeoInterface.trait(geometry))"
        return nothing
    end
    GEOM = trait_type_dict[trait]
    @show GEOM
    return GeoInterface.convert(GEOM, geometry)
end

function geoJSONtraitParse(featureCollection::GeoJSON.FeatureCollection)
    return [geoJSONtraitParse(f) for f in featureCollection]
end

countries_file = download("https://datahub.io/core/geo-countries/r/countries.geojson")
countries = GeoJSON.read(read(countries_file, String))

feature = first(countries)

geometry = GeoInterface.geometry(feature)
trait = GeoInterface.trait(geometry)
xx = @time GeoInterface.convert(GEOM, geometry)

This takes ~5s to compute for a simple 1500 point polygon. All of the countries take so long, that it seems like it's just frozen. I think we should resolve that first before merging. Also, if I see correctly, the abstract GeoInterface types are gone in favor of having only traits. This is pretty inconvenient, since now we can't just overload the Makie conversion pipeline to work with all packages using GeoInterface, which seems pretty bad considering the purpose of GeoInterface? That would bring us back to depend on any package implementing GeoInterface, to actually support plotting their types.

Does anyone have an idea how to bring back "real" support for any geo package? CC: @jeremiahpslewis @visr @evetion

rafaqz commented 2 years ago

For Plots.jl we added a macro to make plotting support relatively easy, in the GeoInterfacePlots.jl subpackage.

We just run the @enable_geo_plots MyAbstractGeomType for the type you want to have plot recipes.

But yes I can see how it will be more annoying for Makie, this is the downside of using traits. But I'm not sure how we could enforce inheritance from a GeoInterface.jl abstract type, seeing that we want e.g. GeometryBasics types to work in the rest of the interface.

I can look at the performance regression.

SimonDanisch commented 2 years ago

Well, we could still have an optional abstract type to depend on? And packages that can afford to inherit from that, can live an easier life? :D

rafaqz commented 2 years ago

Yeah, I would also kind of like that too. I proposed keeping a default hierarchy at some stage but we ended up not for simplicity. But that doesn't seem to be working out in this case.

Probably also the packages that benefit most from plot recipes are the ones that can inherit from GeoInterface.jl types (GeometryBasics/Meshes etc wont need them).

@evetion @visr what do you think about adding at least AbstractGeometry and possibly the whole tree back to GeoInterface.jl? It should simplify some of our code as well.

jeremiahpslewis commented 2 years ago

@rafaqz Here's an MWE which doesn't depend on GeoMakie. Feel like this should be filed as a GeoJSON performance issue? Or GeometryBasics?

Semi-related...would it make sense to include performance tests in GeoJSON.jl to catch relevant regressions?

using GeoJSON
using Downloads
using GeoInterface
using GeometryBasics

countries = GeoJSON.read(read(Downloads.download("https://datahub.io/core/geo-countries/r/countries.geojson"), String))

@time GeoInterface.convert(MultiPolygon, GeoInterface.geometry(countries[3]));
julia> countries = GeoJSON.read(read(Downloads.download("https://datahub.io/core/geo-countries/r/countries.geojson"), String))
FeatureCollection with 255 Features

julia> @time GeoInterface.convert(MultiPolygon, GeoInterface.geometry(countries[3]));
  7.940734 seconds (2.71 M allocations: 157.558 MiB, 1.77% gc time, 12.60% compilation time)
  [cf35fbd7] GeoInterface v1.0.1
  [61d90e0f] GeoJSON v0.6.1
  [5c1252a2] GeometryBasics v0.4.4
evetion commented 2 years ago

It seems the implementation of GeoInterface convert in GeometryBasics is very slow, due to the nested conversion (MultiPolygon goes to Polygon, then LineString, then Point conversion), at each stage also requesting the subgeometry from GeoJSON. So that's a lot of conversion and I guess some type instabilities also occur in the meantime.

On the GeoJSON side things seem fast enough (here given with compile times included).

julia> @time GeoInterface.geometry(countries[3])
  0.000019 seconds (12 allocations: 1.250 KiB)
MultiPolygon

julia> @time GeoInterface.coordinates(GeoInterface.geometry(countries[3]))
  0.047426 seconds (208.72 k allocations: 11.183 MiB, 86.95% compilation time)
3-element Vector{JSON3.Array}:
 JSON3.Array[JSON3.Array[[11.73751945100014, -16.692577982999836], [11.738506688000115, -16.70582207599999], [11.71579354500011, -16.685006365999968], [11.701973057000117, -16.65567360999991], [11.669394143000147, -16.557209578999945], [11.67136

julia> @time collect(GeoInterface.getgeom(GeoInterface.geometry(countries[3])))
  0.030355 seconds (215.86 k allocations: 11.546 MiB, 99.29% compilation time)
3-element Vector{GeoJSON.Polygon{NamedTuple{(:type, :coordinates), Tuple{String, JSON3.Array{JSON3.Array, Base.CodeUnits{UInt8, String}, SubArray{UInt64, 1, Vector{UInt64}, Tuple{UnitRange{Int64}}, true}}}}}}:
 Polygon
 Polygon
 Polygon
evetion commented 2 years ago

It's actually a bit of both, it seems, the JSON3.Array is also not helping.

Changing this in GeometryBasics nets you:

 function GeoInterface.convert(::Type{LineString}, type::LineStringTrait, geom)
     dim = Int(ncoord(geom))
-    return LineString([Point{dim, Float64}(GeoInterface.coordinates(p)) for p in getgeom(geom)])
+    return LineString(Point{dim,Float64}.(copy.(GeoInterface.coordinates(geom))))
 end
julia> @time GeoInterface.convert(MultiPolygon, MultiPolygonTrait(), GeoInterface.geometry(countries[3]));
  0.016911 seconds (19.31 k allocations: 1004.312 KiB)

So we might want to see what adding copy does to the GeoInterface implementation over at GeoJSON.

SimonDanisch commented 2 years ago

Great :) I'll merge and tag this for now, and then we can figure out the improvements in peace ;)

rafaqz commented 2 years ago

Beet me to it. But yes, I knew that lazy loading JSON3 had to have a downside.

So GeoJSON.jl needs some specialised convert methods, and we need to see where materialising the JSON3 string makes more sense than keeping it lazy.