Alexander-Barth / NCDatasets.jl

Load and create NetCDF files in Julia
MIT License
147 stars 29 forks source link

add spatial reference #184

Open visr opened 1 year ago

visr commented 1 year ago

I wrote some code to add a coordinate reference system to a NCDataset, that might be useful for others, and perhaps can start a discussion on integrating a form of this into the package if desired.

In the CF conventions projections is discussed here: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#grid-mappings-and-projections. For a while CF has had it's own CRS system of grid mappings identified by grid_mapping_name, with examples in https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#appendix-grid-mappings. Now there is also the option to use Well-known text (WKT2). Since we have code in the ecosystem for the latter but not the former, I only used WKT2. Older software may only support the grid_mapping_name, pyproj has code that supports creating these. Additionally I also added the epsg attribute, since MDAL, which is used for loading ustructured grids (UGrid) in QGIS, recognizes that, but not yet WKT2. For raster data, GDAL does recognize WKT2.

I found https://corteva.github.io/rioxarray/stable/getting_started/crs_management.html to be a good reference. Note this the implementation below hardcodes the dummy variable name to spatial_ref, we could make this variable.

using GeoFormatTypes: GeoFormat, EPSG, WellKnownText2

crs_val(crs::Union{WellKnownText2, AbstractString}) = String(crs)
crs_val(crs::Union{EPSG, Integer}) = Int32(crs)

function create_spatial_ref!(
    ds;
    wkt::Union{WellKnownText2,String,Nothing} = nothing,
    epsg::Union{EPSG,Integer,Nothing} = nothing,
)
    attrib = Pair{String,Union{Int32,String}}[]
    if wkt !== nothing
        wkt_val = crs_val(wkt)
        # https://corteva.github.io/rioxarray/stable/getting_started/crs_management.html
        # Only write the CF convention version for now, can enable more if that allows it
        # to work with more applications.
        # push!(attrib, "spatial_ref" => wkt_val)
        push!(attrib, "crs_wkt" => wkt_val)
        # push!(attrib, "crs" => wkt_val)
    end
    if epsg !== nothing
        epsg_val = crs_val(epsg)
        push!(attrib, "epsg" => epsg_val)  # MDAL/QGIS needs this
    end

    defVar(ds, "spatial_ref", Int32(0), (); attrib)
end

function create_spatial_ref!(ds, crs::Union{GeoFormat,Nothing} = nothing)
    if crs === nothing
        # not sure what's wise here, should we add it at all?
        create_spatial_ref!(ds)
    else
        # need to have loaded ArchGDAL for this to work
        wkt = convert(WellKnownText2, crs)
        epsg = convert(EPSG, crs)
        create_spatial_ref!(ds; wkt, epsg)
    end
end

Now to use this, call create_spatial_ref!, and for each variable that uses that projection, add the grid_mapping = "spatial_ref" attribute.

create_spatial_ref!(ds; epsg=28992)
defVar(
    ds,
    "data",
    data,
    ("dim1", "dim2"),
    attrib = Pair{String,String}["grid_mapping" => "spatial_ref"]
)

Note: grid_mapping axis order has to match WKT defined axis order. WKT must be https://www.ogc.org/standards/wkt-crs OGC document 12-063. 1st May 2015.

visr commented 1 year ago

cc @rafaqz

Alexander-Barth commented 1 year ago

I have to say, that I am not knowledgeable at all on this topic. The EPSG codes, WKT and the grid mapping attributes are all redundant way to express the same information? From WKT could you deduce for example the other? Are some representations more flexible than others? (it seems to me that not all reference system that you can express in WKT can also be expressed in EPSG codes).

It seems that GeoFormatTypes.jl does not have any external dependencies. It seems to be a design goal to avoid large dependencies (I have already enough with NetCDF_jll :-)).

So, yes I think it would be a useful functionality and it is fine for me to focus on WKT2. Maybe a PR can put this code in a separate file specific to spatial references?

As in NetCDF you can define variables on different grids, it would be good to be able to use different CRS system in the same file. I guess it would be sufficient to make the variable name "spatial_ref" configurable by the user.

visr commented 1 year ago

The EPSG codes, WKT and the grid mapping attributes are all redundant way to express the same information? From WKT could you deduce for example the other? Are some representations more flexible than others? (it seems to me that not all reference system that you can express in WKT can also be expressed in EPSG codes).

Yes all of that is right. WKT is the most flexible, since you can create any of your own projections and use them. EPSG only applies if that projection is in the database. I'm not really familiar with the grid mapping attributes, it is specific to CF only. You can convert WKT to EPSG, but that would need PROJ.

Indeed GeoFormatTypes.jl is intended to stay light.

As in NetCDF you can define variables on different grids, it would be good to be able to use different CRS system in the same file.

I haven't tried this yet, would be interesting to see if this works in QGIS for example.

rafaqz commented 1 year ago

There are a few obscure things that dont convert from wkt to netcdf grid mapping. But otherwise they're interchangeable.

This is a relevant CF thread about it: https://github.com/cf-convention/cf-conventions/issues/222

Writing grid mapping attributes would be good to have at some point, but the conversion is pretty complicated.

@visr you dont need to access val for Int/string conversion, convert works on all geoformat wrappers.

rafaqz commented 1 year ago

Just thinking it would also be good to also read grid_mapping and add support for GeoInterface.crs for a Dataset and CFVariable. This would return nothing if it wasn't found, and WellKnownText2 if it was.

Alexander-Barth commented 1 year ago

Can somebody prepare a PR?

felixcremer commented 2 months ago

Is this something that should be put into CommonDataModel? See https://github.com/JuliaGeo/CommonDataModel.jl/issues/19

I am currently working on geozarr compatibility and loading ZarrDatasets in Rasters and for that we would need the crs handling in CommonDataModel.

rafaqz commented 2 months ago

CommonDataModel seems the best place now. Its just waiting for someone who needs it enough to write it up...

I can help review if that helps