JuliaGeo / GeoParquet.jl

Geospatial data in Parquet files
MIT License

Add support for `GI.crs` #24

Open eliascarv opened 1 day ago

eliascarv commented 1 day ago

The GeoParquet.read function returns a DataFrame, which I imagine is what prevents the GI.crs function from being implemented, as that would be type piracy. Perhaps returning a wrapper type that implements the Tables.jl interface would be a viable solution?
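A minimal sketch of what such a wrapper could look like, assuming a hypothetical `GeoTable` type (not part of GeoParquet.jl) that stores the underlying table together with a CRS value. The CRS here is a placeholder string; a real implementation would more likely hold a GeoFormatTypes object built from the file's PROJJSON.

```julia
import Tables
import GeoInterface as GI

# Hypothetical wrapper: any Tables.jl-compatible source plus the CRS
# taken from the file's "geo" metadata. The name GeoTable is illustrative only.
struct GeoTable{T,C}
    table::T
    crs::C
end

# Forward the Tables.jl interface to the wrapped table.
Tables.istable(::Type{<:GeoTable}) = true
Tables.columnaccess(::Type{<:GeoTable}) = true
Tables.columns(t::GeoTable) = Tables.columns(t.table)
Tables.schema(t::GeoTable) = Tables.schema(t.table)

# The package defining GeoTable owns the type, so this method is not type piracy.
GI.crs(t::GeoTable) = t.crs

# Usage with a NamedTuple of vectors standing in for the Parquet data.
gt = GeoTable((name = ["Fiji"], pop_est = [889953.0]), "EPSG:4326")
GI.crs(gt)
```

Since the wrapper only forwards `Tables.columns` and `Tables.schema`, it can still be passed to `DataFrame(gt)` or any other Tables.jl sink when a DataFrame is wanted.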

Here is an example of a file that has a CRS in its metadata, which the GI.crs function does not return:

julia> using GeoParquet

julia> using Parquet2

julia> using JSON3

julia> import GeoInterface as GI

julia> df = GeoParquet.read("example.parquet")
5×6 DataFrame
 Row │ pop_est         continent      name                      iso_a3   gdp_md_est  geometry                          
     │ Float64?        String?        String?                   String?  Int64?      WellKnow…                         
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 889953.0        Oceania        Fiji                      FJI            5496  WellKnownBinary{Geom, Vector{UIn…
  ⋮  │       ⋮               ⋮                   ⋮                 ⋮         ⋮                       ⋮
                                                                                                         4 rows omitted

julia> GI.crs(df)

julia> ds = Parquet2.Dataset("example.parquet");

julia> meta = Parquet2.metadata(ds)["geo"];

julia> json = JSON3.read(meta)
JSON3.Object{Base.CodeUnits{UInt8, String}, Vector{UInt64}} with 3 entries:
  :version        => "1.0.0"
  :primary_column => "geometry"
  :columns        => {…

julia> json.columns.geometry.crs
JSON3.Object{Base.CodeUnits{UInt8, String}, SubArray{UInt64, 1, Vector{UInt64}, Tuple{UnitRange{Int64}}, true}} with 9 entries:
  Symbol("\$schema") => "https://proj.org/schemas/v0.6/projjson.schema.json"
  :type              => "GeographicCRS"
  :name              => "WGS 84 (CRS84)"
  :datum_ensemble    => {…
  :coordinate_system => {…
  ⋮                  => ⋮

Link to download the example.parquet file: https://github.com/opengeospatial/geoparquet/raw/v1.0.0/examples/example.parquet

rafaqz commented 1 day ago

Yes, we are in the middle of fixing this globally using DataAPI metadata. See:

https://github.com/JuliaGeo/GeoInterface.jl/pull/161

Maybe jump on to that PR if you have suggestions, or want to finish it with some tests and docs.

rafaqz commented 1 day ago

You could also PR here to actually attach metadata to the DataFrame, following the standard in that PR.
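For illustration, attaching table-level metadata to a DataFrame might look roughly like this. The key names `"GEOINTERFACE:crs"` and `"GEOINTERFACE:geometrycolumns"` are an assumption about the convention in the linked PR and should be checked against it; the CRS value and the geometry column contents are placeholders.

```julia
using DataFrames

# Stand-in for the table returned by GeoParquet.read.
df = DataFrame(name = ["Fiji"], geometry = ["placeholder geometry"])

# Assumed key names; verify against the GeoInterface PR before relying on them.
# style = :note asks DataFrames to propagate the metadata through operations
# that support it.
metadata!(df, "GEOINTERFACE:crs", "EPSG:4326"; style = :note)
metadata!(df, "GEOINTERFACE:geometrycolumns", (:geometry,); style = :note)

# Read the value back through the DataAPI metadata interface.
metadata(df, "GEOINTERFACE:crs")
```

In GeoParquet.jl the CRS value would come from the `"geo"` entry of the Parquet file metadata, as in the REPL session earlier in this thread, rather than from a hard-coded string.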

eliascarv commented 1 day ago

Got it, I'll wait for the GeoInterface PR to be merged so I can make a PR adding the CRS to the metadata.

eliascarv commented 1 day ago

It seems that the GeoInterface PR is already well underway, so I will make the PR adding the CRS to the metadata.

asinghvi17 commented 1 day ago

Yeah, the standard is pretty set, so feel free to add it. Loading GeoDataFrames will make this just work even now.

visr commented 1 day ago

Regarding the other question by @eliascarv:

Perhaps returning a wrapper type that implements the Tables.jl interface would be a viable solution?

What is the reason to depend on DataFrames.jl for this package? Parquet2.Dataset already implements the Tables interface as well. Such an approach would also be more consistent with GeoJSON.jl and Shapefile.jl, right?

rafaqz commented 1 day ago

That sounds like a better approach to me too.

We need the metadata either way, but it should be attached to the Dataset.