canada-ca / OS-Advisory_Conseil-SO

Open Source Advisory Board - Conseil consultatif du logiciel libre
Apache License 2.0

On OSS Geospatial #88

Open dbuijs opened 5 years ago

dbuijs commented 5 years ago

Some relevant open source geospatial tools:

Potential concerns that could be discussed a bit more:

dbuijs commented 5 years ago

Also important to note that datasets and tools should not be limited to Canadian geography. Many regulatory applications require geospatial systems that are suitable for global locations.

gcharest commented 5 years ago

@dbuijs Thanks! Let's see if we can integrate these in the reference architecture document.

@goatsweater Do you think we can align them with the current document structure?

goatsweater commented 5 years ago

A bunch of these make sense, and can be added in at one or more places. The reference to GDAL also brought Fiona, Shapely, geopandas, and rasterio to mind. I'll put those in as well.

@dbuijs Will Nominatim work with non-OSM data? The GC will have lots of its own data, and in some cases is legally responsible for defining geospatial boundaries, so we need something that works with our own data as well. Offering geocoding would be a major boost for everyone though; I'd like to see a system in place where we offer an "official" system everyone can rely on using official names for places.

I think addressing the IP issue around postal codes and other data types needs to be worked on with the rules group. To date, the best departments will offer is forward sortation areas (FSA); anything better would require Canada Post permission (departments are licensing Canada Post data today).
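
For readers unfamiliar with the term, an FSA is the first three characters of a Canadian postal code. A minimal sketch of deriving it is below; the function name `fsa_from_postal_code` is hypothetical, and the pattern is a simplified assumption that checks only the letter-digit-letter shape, not which letters Canada Post actually uses.

```python
import re

# Simplified shape check for "A1A 1A1"; this is an assumption for illustration
# and does not exclude letters that never appear in real postal codes.
_POSTAL_CODE = re.compile(r"^([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$")


def fsa_from_postal_code(postal_code: str) -> str:
    """Return the forward sortation area (first three characters)."""
    match = _POSTAL_CODE.match(postal_code.strip().upper())
    if match is None:
        raise ValueError(f"not a Canadian postal code: {postal_code!r}")
    return match.group(1)
```

Truncating to the FSA like this is what limits the spatial precision departments can publish without the full postal code data.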

I'm not sure what you mean about ESRI coordinate reference systems. Departments do use ESRI for mapping, but NRCan (Surveyor General) is the responsible party for defining the official coordinate reference system for Canada. A lot of provinces use their own which is more specific to them, so we'd need a system to work between the various groups. Linking to relevant docs would be useful at certain parts of the requirements, since we do have to adhere to them.

jvanulde commented 5 years ago

@dbuijs @goatsweater are we going to centralize the OSS Geospatial reference architecture somewhere? I see this which has lots of useful information.

BTW - GeoConnections is looking after geospatial policies and standards which are maintained here. We could/should take these as the baseline and expand.

goatsweater commented 5 years ago

@jvanulde We discussed at the last meeting splitting the geo_architecture.md file into multiple files based on the tiers. It's going to get unwieldy if we don't. For now though, adding things to that file is good enough.

We do need to follow the GeoConnections policies and standards. I reached out to them earlier in the week and they are apparently in the midst of a bit of a refresh, but their current page is a solid start. We can't really go wrong if we stick with OGC/IETF standards, and they said they don't prescribe a technology stack. We're on their radar now, so I think they'll steer us a bit if we start to deviate from what they have coming down the pipe.

dbuijs commented 5 years ago

@goatsweater I'm having trouble thinking of geospatial data (just the shapes/locations) that could not easily become OSM data. Obviously the associated meta-data about why we're interested in specific places might have some level of sensitivity, but those details would be kept in separate internal databases such as PostGIS.

The value of Nominatim is really as a drop-in web appliance that can rapidly do both geocoding and reverse-geocoding. Almost all of the boundaries/jurisdictions we care about would already be in the OSM data set.
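
As a rough illustration of the two operations mentioned above, the sketch below builds request URLs for Nominatim's `/search` (geocoding) and `/reverse` (reverse geocoding) endpoints. The base URL is a hypothetical placeholder for a self-hosted instance, not a real GC service, and the helper names are made up for this example. No request is sent.

```python
from urllib.parse import urlencode

# Hypothetical address of a self-hosted Nominatim instance (an assumption).
BASE_URL = "https://nominatim.example.org"


def build_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Geocoding: free-text place name to candidate locations."""
    params = {"q": query, "format": "jsonv2"}
    return f"{base_url}/search?{urlencode(params)}"


def build_reverse_url(lat: float, lon: float, base_url: str = BASE_URL) -> str:
    """Reverse geocoding: coordinates to the nearest named place."""
    params = {"lat": lat, "lon": lon, "format": "jsonv2"}
    return f"{base_url}/reverse?{urlencode(params)}"
```

Because only the base URL changes, client code written against the public OSM instance should carry over to an internal one with little modification.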

rwarren2 commented 5 years ago

As a follow-up to @dbuijs's comments, I'd like to add proj4, which is a coordinate conversion toolkit originally written by the USGS that has been ported to pretty much everything (JavaScript, Perl, Java, etc.). I note that PROJ4 config strings end up being what everyone uses to define their coordinate system, including EPSG and so on.
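
For anyone who has not seen one, a PROJ4 config string is a space-separated list of `+key=value` parameters. The sketch below only parses such a string into a dictionary to show its structure; it does no coordinate conversion, and both the helper name `parse_proj_string` and the sample string (UTM zone 18 on NAD83, chosen purely for illustration) are assumptions of this example.

```python
def parse_proj_string(proj_string: str) -> dict:
    """Split a PROJ4-style string into a {parameter: value} dictionary.

    Parameters without a value (e.g. "+no_defs") are stored as True.
    """
    params = {}
    for token in proj_string.split():
        if not token.startswith("+"):
            raise ValueError(f"unexpected token: {token!r}")
        key, sep, value = token[1:].partition("=")
        params[key] = value if sep else True
    return params


# Illustrative definition only: UTM zone 18 on the NAD83 datum, in metres.
UTM_18_NAD83 = "+proj=utm +zone=18 +datum=NAD83 +units=m +no_defs"
```

In practice the string would be handed to a PROJ binding rather than parsed by hand; the point is just that the whole coordinate system definition travels as one portable string.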

I share @dbuijs's concern about the ESRI toolchain. Nominatim has its own internal stack that communicates with a PostGIS backend (OSM) and has hooks for multiple data sources, as well as the Overpass API, which allows fairly complex queries. I've had my own frustrations with these tools, but in terms of consumption, they provide access to a large ecosystem of people and tools for analysis that shouldn't be discounted. It also does not mean that we need to adopt it as an "authoritative tool" so much as a communication tool.

A concern that I have with ISO 19115:2003 (or its latest and greatest, for 200 Swiss francs) is that it has the same problems as the LOC MODS standards: it loves complexity and strings, which limits its use as the size of the system grows.

Forward Sortation Areas still seem to be distributed by StatCan instead of Canada Post. See this suggestion under review on open.canada.ca for postal codes. Canada Post has settled its lawsuit with geocoder.ca, but the terms of the settlement have not been made public, and Canada Post still licenses its postal code data. This makes FSAs problematic too, since even those are claimed to be restricted by Canada Post.

I am particularly keen on GeoSPARQL, which marries the OGC standard to RDF features, geometries, and labels, providing a really convenient way to deal with multiple labels and geometries over time. The changing waterfront of Toronto / York over time is a typical use case.

goatsweater commented 5 years ago

I do see how the geography can easily become OSM data, and in fact in many instances it does already. The GC isn't normally the one pushing it to OSM, but there are a lot of individuals who do take the outputs and push them to OSM today. There'd likely be value in making that even easier. My concern isn't really about the geometries though, it's about the attributes. There may be geometries that don't fit the OSM model, but off the top of my head I can't think of one. OSM format is not the native format of any GC dataset I've seen though, and so it would mean a constant reformatting of data just to let it be queried by a geocoder. I think we may get a bigger bang for the buck by adding a new backend to something like Pelias, but that's just spitballing.

FSA is distributed by StatCan, and NRCan has a geocoder that will geocode to FSA as well. It's always licensed from Canada Post though. I doubt those are the only two departments paying for it as well.

I'm right there with you on being locked into the ESRI toolchain. I can't read my own data without paying the license, which is why something like this is so valuable.

rwarren2 commented 5 years ago

I agree that the geometries are straightforward and that the attributes are going to be where the larger issues show up. Most of the GC sets I've seen are relational databases somehow attached to the DBF attributes of the shapefile.

I'm not suggesting we add to OSM so much as run our own instance as a means of pushing data out of the GC, which conveniently lets us support that entire ecosystem through APIs people already know and generate a base map according to an official "GC view". If people decide to cross-walk it to OSM proper, then I think it's a win, since the attributes will be pre-populated according to whatever spec the GC thinks works.

That said, OSM isn't always ideal and I would look forward to multiple solutions being available.

Is there a section on attribute metadata repository?

dbuijs commented 5 years ago

I like the idea of Pelias, but we've been burned by third party plugins to Elasticsearch. Elastic just gets updated too fast, with breaking changes every major release.

Specifically, Kibana has geohashing baked in, which gets you pretty close.
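
For context, a geohash encodes a point as a short base-32 string by repeatedly halving the longitude and latitude ranges, so nearby points tend to share a prefix. The sketch below is a plain implementation of that general algorithm under the function name `geohash_encode` (chosen for this example); it is not Kibana's or Elasticsearch's actual code.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the range
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the range
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Shorter hashes cover larger cells, which is why truncating a geohash works as a cheap way to bucket points for map aggregation.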

Separately, I'm very disappointed that Canada Post is still selling postal codes and that some federal departments are still paying for them. Almost (but not quite) annoyed enough to stand up a mapping between census tracts and postal codes.

goatsweater commented 5 years ago

@rwarren2 I misunderstood about implementing the OSM model. Having our own internal OSM could work, although getting people to agree to the data model might be tricky. It would make using tools a lot easier though, since a lot of things want to interact with the OSM infrastructure.

I don't think we have anything for metadata yet, other than the part that says it needs to have a CSW in the app tier. Having tools for managing metadata is quite valuable, as the GC has a standard we need to follow. Making it easier to do that will surely be a win.

goatsweater commented 5 years ago

@dbuijs there are other people that I know are trying to get organizations away from using postal codes and start using census geography for their work. They make the very valid point that this also makes comparing against demographics easier, since the demographics are already tied to census geometry. Making this even easier for everyone through APIs would certainly help these organizations in adopting this strategy.

jvanulde commented 5 years ago

@goatsweater we have a metadata management tool that implements the GC standards already. Currently hosted here: https://gccode.ssc-spc.gc.ca/federal-geospatial-platform/fgp-geonetwork. Based on GeoNetwork Open Source. There are several departments that are working on the same code-base including DFO, NRCan, and ECCC.

@rwarren2 regarding OSM I believe that we already contribute base map data. We routinely provide data to private sector companies when asked for it. We have national basemap services already (https://www.nrcan.gc.ca/earth-sciences/geography/topographic-information) that are provided through Open Standard services (i.e. WMS) here: https://www.nrcan.gc.ca/earth-sciences/geography/topographic-information/web-services/17216

gcharest commented 5 years ago

Just wanted to note that I've merged the PR #91 and I'm cleaning up lint right now.

That's some really great and detailed information! If we have potential contributors who are not comfortable working with GitHub, please let us know so that we can support additional input.