cccs-web / core

CCCS' customized django web application

How can we standardize GIS features across projects? #28

Open pwhipp opened 10 years ago

pwhipp commented 10 years ago

I am also curious about how we get around translating admin boundaries into English, and the mismatches in organizational hierarchies across countries with regard to their terminology. E.g. 'district' in Russia is like 'state' in America (which also means nation!). I wonder if we can / should define these on a per-country basis?

pwhipp commented 10 years ago

Django supports language translation for all elements, so word-for-word translation is not a problem when we are dealing with different languages. English variations like American and Australian can be handled as different languages.

The best way to think about the hierarchy issue is for us to have a 'canonical' term for each thing - this is essentially arbitrary and can probably reflect the accepted American word for the thing in the majority of cases.

We would use the canonical term in programming and within CCCS, as database field names, for example.

The canonical term can be translated into an appropriate language on presentation.

On input we will use synonyms (multiple things mapping to the same canonical term) - you can see an example of where I put this in place here, lines 238-247 (I wrongly assumed there would be more than just the one synonym column name across the CVs).

Of course this assumes the same underlying hierarchies in the different languages - I don't see any way out of this and we'll just need an agreed (and documented) way to map any different hierarchies into the one we select as our 'gold standard'.
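The synonym handling on input described above could look roughly like this. It is a minimal sketch in plain Python, not the actual import code: the canonical names, the synonym spellings and the `canonical_term` helper are all hypothetical placeholders.

```python
# Hypothetical canonical terms, each mapped to the input spellings we accept.
SYNONYMS = {
    "state": {"state", "province"},
    "country": {"country", "nation"},
}

# Invert once so each input term resolves with a single dict lookup.
CANONICAL = {
    synonym: canonical
    for canonical, synonyms in SYNONYMS.items()
    for synonym in synonyms
}


def canonical_term(raw):
    """Return the canonical term for an input label, or None if unknown."""
    return CANONICAL.get(raw.strip().lower())
```

An unknown label returns None rather than guessing, so the import can report it instead of silently creating a new synonym.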

cccs-ip commented 10 years ago

Hi, Paul, and thanks. I understand where you are coming from, but I am not sure we are yet on the same page. This is not merely a matter of translation, but a matter of standard variations in how words are used in different countries. In my example, the English word "district" in the Russian context has a meaning that is something akin to "state" in the USA. In Indonesia, this correlates roughly to "province". The term "district" in the USA is almost like a county-level term; "district" in Indonesia is sort of like "county" in USA (and not at all at the level of state). This is problematic because the English terms have very specific meanings within each national context, but across nations these specific terms signify very different values in terms of comparable administrative hierarchies.

Also, the 'synonym' you point to in your message need not be retained. We'll need to go through the CVs to resolve where these have crept in and aren't needed. Additionally, the CV template contains many fields that were necessary only to make the Google Spreadsheet work as I needed for one function or the next, but that would be unimportant for our purposes. More soon.

cccs-ip commented 10 years ago

.. one possible solution (but not one i am entirely happy with just yet) might be something like:

admin_level1 admin_level2 admin_level3

cccs-ip commented 10 years ago

hi again, Paul.

I just found this: http://en.wikipedia.org/wiki/Table_of_administrative_divisions_by_country

.. so it appears that 'level1' 'level2' etc might be the only way around this. In the table, you can quickly see how "region" and "province" are flip-flopped for many countries. Also interesting is that "parish" is a top-level designation for a country or two (but at the county / district level in the USA).

My concern with this approach, however, is that there are also synonyms within each level: "state" in USA is at the same level as "commonwealth" (Pennsylvania) and "parish" is like "county", etc. In Xian, China, at the same level (4 or 5 tiers under 'nation') are "sub-district" "town" and "town farm", and below them are village (equivalent to "town" in the USA; where by contrast village / town and city are hierarchically similar but distinguished by population size).

It's still hard for me to come at this from a conceptual standpoint to think about how to enable users to select from an appropriate range of options per-country that map to the right canonical "level".
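One way to picture the per-country selection is a lookup keyed by country and local term. This is only a rough Python sketch: the entries restate the examples given in this thread, the keys are ISO 3166-1 alpha-2 codes, and the names `LEVEL_TERMS`, `level_for` and `choices_for` are hypothetical.

```python
# country code -> {local term: canonical level number}
LEVEL_TERMS = {
    "US": {"state": 1, "commonwealth": 1, "county": 2, "parish": 2},
    "RU": {"district": 1},
    "ID": {"province": 1, "district": 2},
}


def level_for(country, term):
    """Canonical level for a local term, or None when the pair is unknown."""
    return LEVEL_TERMS.get(country, {}).get(term.strip().lower())


def choices_for(country, level):
    """Local terms a user could pick from for one level in one country."""
    terms = LEVEL_TERMS.get(country, {})
    return sorted(term for term, lvl in terms.items() if lvl == level)
```

With this shape, "district" resolves to a different level in Russia than in Indonesia, and synonyms within a level (such as "state" and "commonwealth") are simply several terms sharing one number.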

cccs-ip commented 10 years ago

On a somewhat related issue:

http://en.wikipedia.org/wiki/User:SiBr4/Comparison_of_ISO_and_FIPS_country_codes and http://en.wikipedia.org/wiki/ISO_3166-1

Per your suggestion, we could use the ISO codes as a base, and allow users an additional means of adding a range of 'defunct' states, or of otherwise 'disambiguating' in the few rare cases where one ISO code corresponds to multiple 'FIPS' countries.

pwhipp commented 10 years ago

CVs

I suggest that we abandon the spreadsheet CVs as soon as possible and maintain and use the POT CV data in the database. This will be much easier to edit and work with, and it does not risk the accidental introduction of other synonyms, missing countries etc. I can easily supply additional tools (such as the 'merge' I've already mentioned) to clean up the existing data, and enhance the edit form as required (it currently uses just the default edit methods, which are not always ideal). That said, additional imports are not a problem, and the current import behavior is to update the database if matching rows already exist.

Administration levels

Level 1 / level 2 etc. are fine as canonical terms for our administrative boundaries. We should probably include countries in this as our top level (level 0).

As I see it we are labeling the administration of a point on the globe. If we put the country at level 0 and have four incremental subsidiary levels, any given point will have a single designation - it need not use all of the levels but would have to fill its administration specification from level 0. We just have to make sure we support enough levels for all countries.
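The "fill from level 0" rule above can be stated as a small check. This is a sketch under assumptions: a designation is taken to be a list indexed by level, and `MAX_LEVEL` and `validate_designation` are hypothetical names.

```python
MAX_LEVEL = 4  # country at level 0 plus four subsidiary levels


def validate_designation(levels):
    """Check a designation given as a list indexed by level.

    levels[0] is the country; unused trailing levels are None.
    Returns the filled levels, or raises ValueError when a level is
    filled beneath an empty one.
    """
    if len(levels) != MAX_LEVEL + 1:
        raise ValueError(f"expected {MAX_LEVEL + 1} levels, got {len(levels)}")
    if not levels[0]:
        raise ValueError("level 0 (country) is required")
    seen_gap = False
    for depth, name in enumerate(levels):
        if not name:
            seen_gap = True
        elif seen_gap:
            raise ValueError(f"level {depth} is set but a level above it is empty")
    return [name for name in levels if name]
```

A point in Pennsylvania with nothing recorded below the state would be `["USA", "Pennsylvania", None, None, None]`, which passes; leaving level 1 empty while filling level 2 would be rejected.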

It is possible to create a system allowing an arbitrary number of levels: id, name, in, geom

'in' is an optional 'parent' where the parent poly would be expected to contain the level. A null 'in' would indicate a country (or an error). This is elegant but it faces complexity and performance issues.
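To make the arbitrary-depth idea concrete, here is a minimal in-memory sketch of the id / name / in / geom record. It is plain Python rather than a database model; `in` is a reserved word in Python, so the field is called `parent_id` here, and `Area` and `level_of` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Area:
    id: int
    name: str
    parent_id: Optional[int] = None  # the 'in' column; None marks a country
    geom: Optional[object] = None    # placeholder for the polygon


def level_of(area_id: int, areas: Dict[int, Area]) -> int:
    """Count the steps up to the root; a country is level 0."""
    level = 0
    current = areas[area_id]
    while current.parent_id is not None:
        current = areas[current.parent_id]
        level += 1
        if level > len(areas):
            # More steps than records means the parent links loop.
            raise ValueError("cycle in parent links")
    return level
```

Finding a level means one lookup per step up the chain, which gives a feel for the performance concern: against a database, each step is either a separate query or part of a recursive query.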

To decide on this complex modeling issue, we need to enumerate the uses for the data and then I'd need to review stuff like this: http://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL http://www.postgresql.org/docs/9.3/static/ltree.html http://django-mptt.github.io/django-mptt/

The mptt stuff looks to be spot on at a glance. It will just need a special lookup to identify each node's level, followed by a translation of that level into the appropriate term.
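For reference, the idea behind MPTT (modified preorder tree traversal) can be sketched in a few lines of plain Python. This is not django-mptt's API, only an illustration of the left/right numbering it is built on; as I understand it, django-mptt keeps comparable `lft`, `rght` and `level` values on each row. The tree contents and function names below are hypothetical.

```python
def number_tree(children, root):
    """Assign (lft, rght, level) to every node by a depth-first walk."""
    numbering = {}
    counter = 1

    def visit(node, level):
        nonlocal counter
        lft = counter
        counter += 1
        for child in children.get(node, []):
            visit(child, level + 1)
        # The right value is taken after all descendants are numbered.
        numbering[node] = (lft, counter, level)
        counter += 1

    visit(root, 0)
    return numbering


def descendants(numbering, node):
    """Everything whose interval lies strictly inside the node's interval."""
    lft, rght, _ = numbering[node]
    return sorted(
        name
        for name, (l, r, _) in numbering.items()
        if lft < l and r < rght
    )


tree = {"country": ["state A", "state B"], "state A": ["county A1"]}
numbers = number_tree(tree, "country")
```

Because the level is stored with each node, the per-country term for it becomes a simple lookup on (country, level), and "everything inside this area" is a range comparison rather than a walk up or down the tree.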

We can get something working and evolve it over time to a solution of appropriate detail.

Then we have translation. The translation will handle the cross over of province/region etc.

cccs-ip commented 10 years ago

thanks, Paul.

I am posting a few resources here in the attempt not to lose them:

http://www.lawa.org/uploadedFiles/LAXDev/Construction_Handbook/Archives/May2012/May%202012%20Archive%20%20%20%20GS%20%20GIS%20Data%20Standard.pdf