biolab / orange3-geo

:tangerine: :earth_africa: Orange add-on for dealing with geography and geo-location
GNU General Public License v3.0

Propose to decouple geojson files of administrative boundaries from `orange3-geo` add-on and prepare for more flexibility #149

Open fititnt opened 2 years ago

fititnt commented 2 years ago

TL;DR: I'd like some way to let end users of orange3-geo customize the path of the geometries, plus a discussion about how someone (even if not you, then someone like me) could make the Choropleth Map (https://orangedatamining.com/widget-catalog/geo/choroplethmap/) accept administrative boundary codes (instead of latitude/longitude). The rest is mostly the reasoning behind it.


First, great project! Both for Orange itself and then this extension!

I've been drafting an extension to add HXL-awareness to Orange (idea here https://github.com/biolab/orange3/discussions/6092, non-production-ready draft here https://github.com/fititnt/orange3-hxl), and it turns out that time and place are likely the most common features any data exchange would have. Hence this discussion.

A bit of context on why allowing changes to the underlying layers matters for emergency response / humanitarian action

The Orange platform, in particular its appeal as easy-to-use interactive data analysis, is quite impressive. So I'm aware that it always has to stay easy to use for end users.

A TL;DR of how data is exchanged in the humanitarian sector

Adding deeper levels of detail would take far too much disk space (which alone could be a reason to allow search paths beyond what ships with the add-on), but in humanitarian contexts there is another very relevant issue: administrative borders are often disputed, tools avoid making claims about what is a sovereign country (or make this configurable), and they often allow redundancy (the same area appearing in two adm0, overlapping borders, etc.). Also, regardless of the nationality of the information manager (or think of someone taking screenshots from Orange, or explaining to teams on both sides of a conflict how to deal with sensitive data in smarter ways than using only Excel), hints hardcoded into the map interface could make the entire tool unusable for people who disagree with the borders. I mean, even stances such as "developers stand with name-of-group-in-conflict" would break humanitarian use for people helping that side.

Even trying a "de facto" point of view (the OpenStreetMap approach, for example, to whatever extent this is possible in practice) would work in very few situations. It would not work for local information managers, who need to exchange data with one or more sides that will always hold very strong views, nor, at the international level, for donors who are helping the same region but wouldn't be happy to see something like a city placed in the wrong country.

The international vs regional/local dichotomy (aka focused regions)

In general, a big difference from tools intended for commercial use (beyond avoiding hardcoding things in software that the average end user cannot change) is that **when humanitarian-use geometries for top-level administrative boundaries exist, they overlap** (and this can't be fixed in software, especially at higher zoom levels, because it would be a political decision). A "world level" view might have more than one way to find the same region, and the zoom level is unlikely to be noticeable (at least not in the way Orange is used, which is for data exploration, not for re-generating data based on the provided geojsons).

Very often the ideal audience for software such as Orange is focused on a single country/territory and has lots of data about that region. This audience would be fine with (or would even expect) borders that overlap with other regions. The overlap might not even be relevant, because they would present the data focused on one administrative region (or as a world-level overview). So some "problems" might not be problems given how users actually work, but users might need to specify the region they want to focus on.

This is another reason why, if geometries for humanitarian use exist, it is not necessary to expect users to download every region in detail (which also avoids overloading servers). This can become quite a big deal, because some geometries that go up to level 4 could slow down the rest of the tool.

TL;DR of close to neutral maps for humanitarian use vs educational/commercial

I don't think it would be possible to have a single version attempting to be "neutral". Even if, in theory, all geometries could eventually live 100% outside of orange3-geo, usage may vary by more than level of detail. Some features, like region names following "international" conventions (aka "UN-like versions"), might be overlong (in some cases they also state that the region is part of a sovereign state), and people might complain that they are accustomed to the names that appear in commercial-like maps, not what would appear on UN Term.

Another argument against a single version is licensing. The geometries shipped with orange3-geo have a flexible license (naturalearthdata is public domain), but the ones for humanitarian use might be restrictive, so I think no fewer than 2 "general versions" would be relevant.

One place that has more versions (including humanitarian / other, and also the idea of a "worldview" of administrative boundaries) is https://fieldmaps.io/maps and https://fieldmaps.io/data.

The point of this issue

With all that context given, here are my 3 points

1. Some strategy to allow configuring where the geometries are

Currently, I think the way to distribute the geometries to users is to commit them as part of the add-on (directory here <>). I'm not sure what such a configuration should look like from the user's point of view, but ideally it would at least allow searching a different directory, perhaps still using the files in the existing directory with lower priority.

How these files could be shipped is a different story. I've just learned how to create an Orange3 add-on, but this point could become an "add-on of an add-on", where the add-on would contain pretty much only geometries. However, some strategy for searching an additional path would already allow for testing in production.
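As a rough illustration of the "search a different directory, with the existing one at lower priority" idea, here is a minimal sketch. It is not how orange3-geo currently works: the environment variable name `ORANGE_GEO_GEOMETRY_PATH` and both helper functions are hypothetical, made up only for this example.

```python
import os
from pathlib import Path

# Hypothetical environment variable; orange3-geo does not define it.
ENV_VAR = "ORANGE_GEO_GEOMETRY_PATH"


def geometry_search_dirs(bundled_dir, env=None):
    """Return directories to search, user-supplied ones first."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "")
    user_dirs = [Path(p) for p in raw.split(os.pathsep) if p]
    # The directory shipped with the add-on comes last (lowest priority).
    return user_dirs + [Path(bundled_dir)]


def find_geometry(filename, bundled_dir, env=None):
    """Return the first existing file named `filename`, or None."""
    for directory in geometry_search_dirs(bundled_dir, env):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

With something like this, a file placed in a user directory would shadow the bundled file of the same name, while any file the user did not override would still come from the add-on.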

2. An additional way beyond latitude/longitude (e.g. administrative boundary standard codes)

Looking at the source code, the geometries already have some codes, so this could almost be done today. Considering point 3 below, I'm mentioning it as something I'm interested in seeing how we could do.

This is not yet the focus of this issue, but it is similar to the problem in https://github.com/biolab/orange3-geo/issues/118 (the use of latitude and longitude to infer the region): exchanged data is likely to already use some standard codes (likely P-Codes, with other sub-national codes as alternatives, which tend to carry some sort of prefix by country/territory). For such codes, in almost every case some provider will supply the geometries, and two codes can represent the same geometry (like a city) but a different political entity. Matching or converting between them can be done in other steps before orange3-geo, but alongside the latitude/longitude algorithm (which I think should stay, because it is still very useful), orange3-geo could eventually match a specific code in the input data (which the geometries would also contain) directly.

I'm not as sure about this point from the user interface point of view, but with some discussion or feedback I think I could implement it, even if in some hackish way in orange3-hxl (copy-pasting your code), and later contribute it directly to this add-on.
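To make the code-matching idea concrete, here is a minimal sketch using only the standard library. It assumes each GeoJSON feature carries the code in its `properties`; the property name `pcode`, the column name `adm1`, the sample codes, and both function names are made up for illustration and are not taken from orange3-geo or from any provider.

```python
import json


def index_features_by_code(geojson, code_property):
    """Map each feature's code (from its properties) to the feature."""
    index = {}
    for feature in geojson.get("features", []):
        code = (feature.get("properties") or {}).get(code_property)
        if code is not None:
            index[str(code)] = feature
    return index


def match_rows_to_regions(rows, code_column, geojson, code_property):
    """Split rows into (matched, unmatched) by comparing codes exactly."""
    index = index_features_by_code(geojson, code_property)
    matched, unmatched = [], []
    for row in rows:
        feature = index.get(str(row.get(code_column)))
        if feature is None:
            unmatched.append(row)
        else:
            matched.append((row, feature))
    return matched, unmatched


# Tiny made-up example; the codes and the "pcode" property are illustrative.
geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"pcode": "XX01"}, "geometry": null},
  {"type": "Feature", "properties": {"pcode": "XX02"}, "geometry": null}
]}
""")
rows = [{"adm1": "XX01", "cases": 12}, {"adm1": "XX99", "cases": 3}]
matched, unmatched = match_rows_to_regions(rows, "adm1", geojson, "pcode")
```

Returning the unmatched rows explicitly seems useful here, since a code that matches no geometry usually means the data and the geometries come from different providers or versions.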

3. (Out of scope, but likely relevant later) Decouple the server-side provider of ready-to-use geometries from Orange add-on releases

Just a quick ping to @UGA-ITOSHumanitarianGIS, @maxmalynowsky (e.g. @fieldmaps), @DanRunfola (e.g. wmgeolab/geoBoundaries). Feedback is not really necessary now, but might be later (e.g. the geojsons could be fetched from you, without additional intermediaries).

Most of the time (and I think even for emergency response) the geometries tend to change only once per year, and might have almost no changes or be compatible with older versions. But under special circumstances (far more common in emergency response) a change requires users to update fast. Also, especially for CODs, in addition to geometries users have other data (especially population statistics attached to the P-Codes; regions without a focused humanitarian response could use other codes, to stay compatible with other geometry providers) that is already compatible. The argument here is that because "most of the time" the data does not change often, end users might not update often; and if it depends on us fetching the sources and processing them, then unless this is fully automated, users could be left waiting, or a large audience of people on the frontline could end up editing files manually.

In any case, while with you here I'm focused on points 1 and 2 and on decoupling the data from package releases, later I might talk with other people about keeping at least some regions updated directly from them. To simplify the implementation of the Orange add-on, the geojsons need to have some extra metadata, so generic converters from shapefile would not be exactly what we need, but the converter scripts could be prepared to run close to the providers.

Why the idea of having external groups focused only on the geometries is relevant

Well, when users ask for features on biolab/orange3, most people tend to ask them to submit patches. Also, the documentation on how to contribute is quite impressive. So my point in trying to decouple the data, and later proposing that this could have some sort of, I don't know, at least an index file to look up new versions, makes sense. Most of their audiences would already be interested in Orange anyway. And issues such as the one pointed out by @robertcv in https://github.com/biolab/orange3-geo/issues/94 would get more focused attention.
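Just to illustrate what an "index file to look up new versions" could mean, here is a minimal sketch. The index layout (`datasets` entries with `id`, `version`, and `url`), the dataset ids, and the example.org URLs are all hypothetical; no provider mentioned here publishes this format, and the version check is a plain inequality, not an ordering.

```python
import json


def outdated_datasets(index, installed):
    """Return index entries that are installed locally at another version."""
    updates = []
    for entry in index.get("datasets", []):
        local_version = installed.get(entry["id"])
        # Regions the user never downloaded are skipped on purpose.
        if local_version is not None and local_version != entry["version"]:
            updates.append(entry)
    return updates


# Made-up index content; ids, versions, and URLs are placeholders.
index = json.loads("""
{"datasets": [
  {"id": "adm1-xx", "version": "2023-01", "url": "https://example.org/adm1-xx.geojson"},
  {"id": "adm1-yy", "version": "2022-06", "url": "https://example.org/adm1-yy.geojson"},
  {"id": "adm2-zz", "version": "2023-03", "url": "https://example.org/adm2-zz.geojson"}
]}
""")
installed = {"adm1-xx": "2022-01", "adm1-yy": "2022-06"}
updates = outdated_datasets(index, installed)
```

Only checking regions the user already has fits the point above that users should not be expected to download every region in detail.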

There is other referential data that becomes relevant (part of which I'm looking at in the HXL extension), but administrative boundaries are something it makes more sense to attach directly to this extension than to replicate the functionality elsewhere.

DanRunfola commented 2 years ago

A few brief thoughts here that may be of relevance (I am the maintainer over at geoBoundaries).

1) Most administrative boundary datasets are not well suited to inclusion into software packages of any kind due to licensure issues - i.e., humdata has a plethora of licenses with iffy cases. To my knowledge, geoBoundaries and Natural Earth are the only two sources that provide openly usable licenses across their full databases (and, in the case of gB, data lineage).

2) Most boundary repositories don't have reasonable APIs to hit against (humdata and geoboundaries.org are, to my knowledge, the only two exceptions to this). So you would exclude many datasets, for good or bad, unless a user has some means to upload manually.

3) geoBoundaries has a ready-built global surface going down to ADM2 that is updated ~yearly - see https://www.geoboundaries.org/downloadCGAZ.html

One thing my team might be willing to do (pending a hire :) ) is build a pipeline into Orange 3 to automate contributions when our dataset is updated. We already do this with HDX to update the boundary layers they have available. Not sure exactly how this might work, but one option to keep things fresh and moving.

janezd commented 1 year ago

Thank you for this very detailed description. It is obvious why this is important in humanitarian situations.

Add-ons usually do not have specific "Preferences", but they could, for instance, add a widget without inputs and outputs that contains general settings for all widgets. One could even write an add-on that changes the data (e.g. files) in another add-on. This goes in the direction of an add-on of an add-on. It is a bit of a hack, but the situation you are describing, although very important, is still a niche case.

A question, though, is where to get a different set of boundaries. As @DanRunfola wrote, there are probably just two useful sources.

We are spread rather thin over a large project. This add-on is maintained, but at this very moment not under very active development. If you'd be willing to implement this "add-on of an add-on" or something similar, we'd be happy to assist.

fititnt commented 1 year ago

Great! The idea would be welcome. How it would be done might require testing.

For now, realistically speaking, let's assume this issue can take at least several months because of the administrative boundaries pipeline (which might have more than one provider).