ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
geography cleanup: England #6062

Closed dustymc closed 1 week ago

dustymc commented 1 year ago

These are in Arctos and do not line up with GADM:

temp_funky_england.csv

These are in GADM but not Arctos - this really needs regenerated after ^^ that is cleaned, but it may help guide ^^ that (which will be all manual) so I'm including it here.

missing_england.csv

TODO:

  1. Go through Arctos, make everything in temp_funky_england match something in GADM
  2. Regenerate missing_england
  3. Hard to say, depends on (2)
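Steps 1 and 2 amount to two set differences on place names. A minimal sketch, assuming both sides have already been reduced to plain lists of names; `compare_names` and the sample values are illustrative only, not Arctos code or the contents of the attached CSVs:

```python
def compare_names(arctos_names, gadm_names):
    """Return (funky, missing) as sorted lists.

    funky:   names in Arctos with no exact GADM match
    missing: names in GADM with no exact Arctos match
    """
    arctos = {n.strip() for n in arctos_names}
    gadm = {n.strip() for n in gadm_names}
    return sorted(arctos - gadm), sorted(gadm - arctos)


# Made-up inputs, only to show the shape of the result.
funky, missing = compare_names(
    ["Bedford", "Yorkshire"],
    ["Bedford", "York"],
)
```

Regenerating the "missing" list is then a matter of rerunning the same comparison once every "funky" name has been matched or explained.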

@sharpphyl we have GADM2 by your request so - help??

sharpphyl commented 1 year ago

Let me know if this is what you want on the funky_england csv. I highlighted three entries that need to be removed from geography as they are lower level entities.

On the missing_england.csv do you just want the Wikipedia URL?

temp_funky_england.csv

dustymc commented 1 year ago

@sharpphyl I'm not sure what I need, but it must "do what GADM does."

"Bedford is burough within Bedfordshire and should be deleted then replaced by Bedfordshire in geography" does not make sense in that light: GADM contains a thing called "Bedford" and it looks https://arctos.database.museum/place.cfm?action=detail&geog_auth_rec_id=10000959.

Or, I do know what I need, I just don't know how I need it! WHAT is simply to make every existing Arctos entry correspond to a GADM entry. (Then whatever's not already in Arctos can easily be added.)

sharpphyl commented 1 year ago

But when I look at GADM online this is what I see - Bedfordshire but not Bedford.

Screenshot 2023-03-27 at 11 32 16 AM

Is there a mismatch between what codes you get and what I see on the GADM website? If there is a mismatch, which GADM is the one we use to "do what GADM does."

dustymc commented 1 year ago

BLARGH!

which GADM is the one we use

The only one I can use is "data" - https://gadm.org/download_country.html pick UK then Shapefile

Screenshot 2023-03-27 at 11 00 14 AM

I'm not sure what GADM uses for their maps, but I think it's been an older version of their own data when we've managed to find a clue.

genevieve-anderegg commented 1 year ago

Working on the missing_england, will send it in a few hours. You just want the wikipedia links, correct? I'm double checking the names against the shapes on the GADM website

dustymc commented 1 year ago

@genevieve-anderegg I really need the other thing resolved first, once Arctos is clean then any remaining additions should be easy.

It's more than wiki - I need to match Arctos to GADM (and the wiki should just help understand what that one shared THING is).

genevieve-anderegg commented 1 year ago

Ah gotcha. I’ll hold off then and wait for the first thing to be resolved, but I’m happy to help with matching all of this!

dustymc commented 1 year ago

Here's GADM1 and GADM2 as I get them, but without the geometry (which is huge). Hopefully this will be useful in mapping stuff.

temp_gadm2nogeom(1).csv temp_gadm1nogeom(1).csv

gadm2.name_2 is essentially the authoritative shape-name. Anything that's in Arctos will need changed to match, which of course can only be accomplished when there's a spelling variation - this would never be correct if whatever's in Arctos is in fact a reference to a different shape than what's in GADM. For those situations, some sort of migration will be necessary.
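One way to separate probable spelling variations from genuinely different shapes is a conservative fuzzy match. This is only a sketch using Python's standard `difflib`; the function name, the 0.85 cutoff, and the sample names are assumptions, and every suggestion would still need a human to confirm that both names refer to the same shape:

```python
import difflib


def suggest_matches(arctos_names, gadm_names, cutoff=0.85):
    """Map each unmatched Arctos name to close GADM spellings (possibly none)."""
    gadm = sorted(set(gadm_names))
    suggestions = {}
    for name in arctos_names:
        if name in gadm:
            continue  # already an exact match, nothing to do
        suggestions[name] = difflib.get_close_matches(
            name, gadm, n=3, cutoff=cutoff
        )
    return suggestions


# Made-up inputs: one misspelling and one name with no close counterpart.
result = suggest_matches(["Bedforshire", "Yorkshire"], ["Bedfordshire", "York"])
```

Names that come back with an empty list are the ones that probably need a migration rather than a rename. The high cutoff is deliberate: it keeps pairs such as "Bedford" and "Bedfordshire" from being suggested as the same thing.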

Or, if the data I get from GADM are not most current, we could delay this until GADM is refreshed (and hopefully find a way to influence that) - I'd really not like to jump through any unnecessary hoops!

@mkoo thoughts??

Jegelewicz commented 1 year ago

This begs the question for me - when Bedfordshire gets combined with Shropshire or their border changes in GADM, how does that affect Arctos, in particular things without coordinates? Will we just change shapes and suddenly some geography may be incorrect? Or are we planning to keep time-stamped GADM? Or maybe I am not thinking about things correctly.....

dustymc commented 1 year ago

border change

Border changes require a migration.

Big-picture, the approach of excluding the things that change most often for most of the world will hopefully mean that's a rare thing.

Specifically here, I'm hoping to get some help from the users who requested this extra layer of complexity, and if we can't find a way through it then I'll probably ask to drop the demonstrably-unmanageable data at some point. (I don't think that's where this is going, but in general SOMEONE needs to understand whatever we agree to manage!!)

time-stamped GADM?

No, The Plan calls for current. (Historic could be a FFF or whatever, but that's not really this.)

sharpphyl commented 1 year ago

While I was downloading shapefiles that I can't read and wouldn't understand anyway, you were uploading the needed csv. Genna and I will work on it to get Arctos to match it.

As for history, can we have any geography that isn't in GADM if it's an historic area that may be helpful for some collections? E.g. Yorkshire is an historic county that's been split into four counties.

sharpphyl commented 1 year ago

Just to make sure we understand, we take the missing list and add the SOURCE AUTHORITY.

We take the "funky" list and compare it to the entire GADM list above and figure out what to delete (like Isles of Scilly) and what just needs a GADM link like Bedford.

Correct?

dustymc commented 1 year ago

can we have any geography that isn't in GADM

Assertable: no.

As a finding aid (and/or whatever): Sure. (And maybe someday something like https://github.com/ArctosDB/arctos/issues/5597 will make them even more magical).

take the missing list and add the SOURCE AUTHORITY.

Yea, but not where I'd start - a bunch of those will be spelling variations or whatever of existing things, those on the funky list.

The funky list may include things that need deleted and things that need synced with GADM.

sharpphyl commented 1 year ago

I think we get it. First delete these two entries from Arctos England geography as noted on the attached csv.

Isles of Scilly - not in GADM - should be Specific Locality

Yorkshire - not in GADM - historic area. Note that the GADM link is for York, which will be added from the "missing" csv

Next we'll work on the missing ones to add to Arctos.

temp_funky_england.csv

dustymc commented 1 year ago

Isles of Scilly - not in GADM

Yes it is!

Screenshot 2023-03-28 at 8 54 52 AM Screenshot 2023-03-28 at 9 01 37 AM

York which will be added f

THAT is the essence of the problem. https://arctos.database.museum/place.cfm?action=detail&geog_auth_rec_id=10006248 is used by many

Screenshot 2023-03-28 at 8 57 15 AM

and I'd rather not lose that (certainly not until there's some clear picture of where this is going). I (clearly!) don't know enough about the place/history to suggest a migration path. Moving all those records to 'England' with an amended specloc and previous geography locality attribute is my fallback, what I'll do if all else fails, but I'd rather find a better way, if there's one to be found.
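The fallback described here can be sketched as a pure function over one locality record. The field names (`higher_geog`, `spec_locality`, `previous_geography`) and the way the old geography is folded into the specific locality are assumptions for illustration, not the actual Arctos schema or migration procedure:

```python
def fallback_migration(record, target_geography="England"):
    """Return a copy of a locality record moved up to a broader geography.

    The old geography is kept in a 'previous_geography' attribute and
    prepended to the specific locality so no information is dropped.
    """
    old = record["higher_geog"]
    spec = record.get("spec_locality") or ""
    migrated = dict(record)  # leave the caller's record untouched
    migrated["higher_geog"] = target_geography
    migrated["spec_locality"] = f"{old}; {spec}" if spec else old
    migrated["previous_geography"] = old
    return migrated


# Made-up record, only to show the before/after shape.
before = {"higher_geog": "Yorkshire", "spec_locality": "old mill"}
after = fallback_migration(before)
```

Because the original value is preserved in two places, the move could be reversed or refined later if a better migration path turns up.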

work on the missing ones to add to Arctos.

That is trivial after the existing are clean.

sharpphyl commented 1 year ago

Yes it is!

Isles of Scilly is not on the GADM list online, but we've learned that it's meaningless for us to look at that from here on. The list you're using has all the unitary authorities on it, which are not on the online GADM list (or on the Wikipedia list of counties).

Yes, Yorkshire is an historic area that was divided up into four counties. Users might be able to figure out which county if they have a more specific locality, but it makes sense to retain it. Not sure how you create spatial geography for it unless you consolidate the four counties.

Capture

dustymc commented 1 year ago

not on the GADM list online

Right. Do you have any sense if what's in the CSV I posted above (and so in GADM's shapefile) is current? I'm not completely opposed to waiting for next release if this is in flux, and I might reluctantly suggest we back out to GADM1 if it's constantly in flux (I think not but ???). I'd rather find a resolution if there is one, but if we can't - ??????????

I don't have tools to stitch things together - the only way I'm going to create shapes is if they're somehow provided to me. (And what I've got for "Yorkshire" at the moment is I suspect incorrect.)

sharpphyl commented 1 year ago

So now there's nothing that needs to be changed in Arctos except to not have any GADM spatial data for Yorkshire. Yes, what you have as the shape for Yorkshire is wrong. It's just York, which is a very small city area.

York

There are a lot of entries that say "In GADM4.1 maps, missing from data" but are in GADM. Will you magic in the spatial data?

sharpphyl commented 1 year ago

Do you have any sense if what's in the CSV I posted above is current

Not sure which csv you are referencing.

dustymc commented 1 year ago

https://github.com/ArctosDB/arctos/issues/6062#issuecomment-1485690243

Jegelewicz commented 1 year ago

@droberts49 @wellerjes can you guys weigh in? See https://github.com/ArctosDB/arctos/issues/6062#issuecomment-1487203244

sharpphyl commented 1 year ago

Do you have any sense if what's in the CSV I posted above (and so in GADM's shapefile) is current?

We have no knowledge of what the English administrative hierarchy is except what we find online. We found a better list of the units in England at https://en.wikipedia.org/wiki/Metropolitan_and_non-metropolitan_counties_of_England. It shows 83 entities plus the Isles of Scilly and London. The CSV shows only 49 with names which is just a few more than the 45 in Arctos. Your CSV includes Wales and Scotland and Northern Ireland which look to already be in Arctos.
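Counts like these are easier to compare when the CSV is split by country first. A rough sketch with the standard `csv` module, assuming the geometry-free export has `name_1` (country within the UK) and `name_2` (unit name) columns; the column names and the sample rows are assumptions, not the real file:

```python
import csv
import io
from collections import defaultdict


def names_by_level1(csv_text, level1_col="name_1", level2_col="name_2"):
    """Group distinct, non-empty level-2 names under their level-1 name."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(level2_col) or "").strip()
        if name:  # skip rows whose level-2 name is blank
            groups[row[level1_col]].add(name)
    return {k: sorted(v) for k, v in groups.items()}


# Made-up rows, only to show the grouping.
sample = """name_1,name_2
England,Bedford
England,York
England,
Wales,Cardiff
"""
by_country = names_by_level1(sample)
england_count = len(by_country["England"])
```

Running this against the real export would show how many named units fall under England alone, which is the number to compare against the Wikipedia list.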

Let's let the others weigh in on what's current and best to use in Arctos before we do any more work on this.

ewommack commented 1 year ago

Here is some official history: https://www.ons.gov.uk/methodology/geography/ukgeographies/administrativegeography/ourchanginggeography/localgovernmentrestructuring

York was created from North Yorkshire in 1996

mkoo commented 9 months ago

I see the issues in GADM's UK polygons now. Part of the problem is that there is more than just 'county' included in the gadm_2 because England has such a crazy history of 'ceremonial' and administrative counties, districts, townships, special 'constituencies' etc (thank the peerage for this crap.. dukes, earls, and whatnot aplenty and left us a legacy tangle!) GADM ended up treating them all equally even though they clearly are not. (eg lots of townships and city districts within a county or shire are separated out although Arctos has not treated municipalities as HG). Wikipedia more or less confirms this and treats the municipalities as part of the larger county for the ones I checked.

GIS will make it easy to reconcile and re-export but we should talk to GADM at the source too.

EG Nottingham/ Nottinghamshire (yes robin hood fame)

Screenshot 2023-12-01 at 12 12 55 PM

everything in yellow contains 'Nottingham' which is both a municipality and county but with clear spatial relationship; some parts are just smaller polygons clearly needing to be merged with a larger.

let me work on this and talk to Hijmans?

genevieve-anderegg commented 9 months ago

Ah I see! Thank you so much for working on this and communicating directly with GADM, Michelle, we really appreciate it!

mkoo commented 9 months ago

After reading this over and checking a few references, seems like we may want to have "https://en.wikipedia.org/wiki/Ceremonial_counties_of_England" as our list for admn_2 for the UK (but the confusion is mainly in England).

FYI, I created a ceremonial county for Gloucester and updated our geography. This matches what Arctos had before (and is now missing), matches Wikipedia, and is based on GADM. I will see what GADM wants to do about this, so no more work for now unless someone files an issue and has an urgent need.
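If ceremonial counties do become the target list, the name side of the reconciliation could be a hand-maintained lookup from each GADM level-2 name to its ceremonial county. The sketch below is hypothetical: the lookup holds only three example rows, the function name is made up, and merging the actual polygons would still have to happen in GIS:

```python
from collections import defaultdict

# Hand-maintained lookup from GADM level-2 name to ceremonial county.
# Three example rows only; a real table would need one row per GADM unit.
CEREMONIAL = {
    "Nottingham": "Nottinghamshire",
    "Nottinghamshire": "Nottinghamshire",
    "Bedford": "Bedfordshire",
}


def roll_up(gadm_names, lookup=CEREMONIAL):
    """Group GADM names by ceremonial county; unmapped names go under None."""
    groups = defaultdict(list)
    for name in gadm_names:
        groups[lookup.get(name)].append(name)
    return dict(groups)


grouped = roll_up(["Nottingham", "Nottinghamshire", "York"])
```

Anything that lands under `None` is a unit nobody has assigned yet, which gives a running list of what still needs a decision.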

dustymc commented 1 week ago

I am tabling this; it should be re-opened when something happens with https://github.com/ArctosDB/internal/issues/335.