ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Geography search results from Data Entry are incomplete #5063

Closed sharpphyl closed 2 years ago

sharpphyl commented 2 years ago

Describe the bug

When doing single-record data entry, selecting higher geography does not return all records.

To Reproduce

Example #1

In data entry, enter the needed term and press Tab. Example: Samana (a province in the Dominican Republic).

image

This returns 0 results.

image

However, a search for Samana in Geography search returns the province.

image

Now we get the Samana province.

image

Also, you can't choose just the Dominican Republic (no province) from the data entry screen, but it's in the higher geography when you search Find Geography.

image

Example #2

The same thing happened later today with Malaita, a province in the Solomon Islands.

Expected behavior

All available entries should appear.

Screenshots

See above.

Desktop (please complete the following information)

PC/Chrome

dustymc commented 2 years ago
Screen Shot 2022-09-15 at 4 38 20 PM

That default is intended to discourage using nonspatial data, which is going to result in some sort of possibly-painful cleanup. (And please keep filing issues, I'd prefer to concentrate on the things that are actually being used.) I added data for that one (and will try to get the rest of the Dominican ASAP), but I also dropped the island - the province includes several smaller islands as well - which might mess with your bulkloading. Let me know if you need any help with that.
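
To illustrate the behavior described here, a minimal sketch of a picker that hides nonspatial geography by default but lets the user turn that filter off. This is not Arctos code: the `Geography` class, the `has_spatial` flag, the `pick_geography` function, and the sample records are all hypothetical and only model the default discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class Geography:
    higher_geog: str
    has_spatial: bool  # hypothetical flag: does this record have spatial data?


def pick_geography(records, term, require_spatial=True):
    """Return records matching term; by default only those with spatial data."""
    term = term.lower()
    return [
        r for r in records
        if term in r.higher_geog.lower()
        and (r.has_spatial or not require_spatial)
    ]


# Made-up sample data for illustration only.
records = [
    Geography("Dominican Republic, Samana", has_spatial=False),
    Geography("Example Country, Example Province", has_spatial=True),
]

default_hits = pick_geography(records, "Samana")                     # default filter on
all_hits = pick_geography(records, "Samana", require_spatial=False)  # filter off
```

With the default in place the search for Samana comes back empty, which matches what the data entry screen showed; switching the filter off returns the nonspatial record.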

sharpphyl commented 2 years ago

Ah, I understand why they weren't showing up.

> I'd prefer to concentrate on the things that are actually being used.

The problem is, I have no idea what higher geography/localities are in our backlog, so I can't file an issue until a volunteer says that there's no entry in higher geography (if they search from the data entry screen) for some place they think really exists. This really slows down data entry and can result in inaccurate higher geography being used.

I realize that they can go to Search/Geography but getting data from there into the data entry form is way more convoluted for them than using the "pick" feature from the data entry screen.

Can we hold off on limiting the results during data entry to only places with spatial geography until we have spatial geography for every place in higher geography?

dustymc commented 2 years ago

It's just a default, they can change it.

You know better than anyone that I've been begging and pleading for a solution forever, and I still am. Give me some way to limit geography to things for which I can find spatial data and I'll make this easy (I hope!). Give me ANY defined pathway and I'll try to make it work. Or tell me you can't deal with even the suggestion/guidance yet and I can change the default, but know that from here that feels like trying to dig our way out of a hole (à la Homer Simpson).

Depending on the place and current data and previous transliteration attempts and recent politics and so on, adding spatial can be pretty fast - I was able to complete the Dominican last night - if that changes your outlook in any way.

sharpphyl commented 2 years ago

> It's just a default, they can change it.

Ah, I see how they can change this. Thanks.

> Give me ANY defined pathway and I'll try to make it work.

I honestly don't understand how this would work. Maybe you could help us understand what this would look like at Office Hours, at the Geography Committee meetings, or in this issue. Or maybe I'm just not connecting with how it would operate from reading the many other issues where you've been nudging us that way. Is this something that would happen during data entry, or would it just create reports that show whether the coordinates in our catalog records are in or out of the spatial geography?

dustymc commented 2 years ago

> don't understand how this would work

Well I'm still up for about anything, but if I get to pick I'd go with the "just use GADM (and such)" approach because I don't see anything else that looks viable. That would work out a lot like the Kuwait issue.

  1. We as a community would decide
    • shall we use {source} as geography, and if so
    • how exactly would we map {source} to our geography (That wouldn't be much of a discussion for things like GADM - it's clearly a widely-accepted source, I can't imagine any possible reason we'd not want to use it, and the mapping is clean.)
  2. I would create geography (maybe preemptively) according to (1). (So GADM-based stuff would never have island because GADM doesn't include islands. Other sources would have other data/mappings.)
  3. Clean up, somehow move existing mishmash data to source-based spatial data (mostly my problem, but I'd need some guidance from time to time)
  4. WOOHOO problem solved!
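
Step 1's "how exactly would we map {source} to our geography" could be sketched roughly as below for a GADM-style record. This is only an assumption-laden illustration: `NAME_0`, `NAME_1`, and `NAME_2` are the level-name fields as they appear in GADM downloads as far as I know, while the target keys (`country`, `state_prov`, `county`, `island`) and the function `gadm_to_geography` are hypothetical stand-ins for whatever mapping the community actually agrees on.

```python
def gadm_to_geography(gadm_row):
    """Map one GADM-style attribute row to geography columns (hypothetical mapping)."""
    return {
        "country": gadm_row.get("NAME_0"),     # GADM level 0
        "state_prov": gadm_row.get("NAME_1"),  # GADM level 1
        "county": gadm_row.get("NAME_2"),      # GADM level 2, if present
        "island": None,                        # never set for a GADM-based record
    }


# Illustrative row only; not taken from a real GADM file.
row = {"NAME_0": "Dominican Republic", "NAME_1": "Samaná"}
geog = gadm_to_geography(row)
```

The point of the sketch is that the mapping is fixed per source, so every record created from that source comes out consistent, with island always empty for anything GADM-based.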

Most office hours are lonely and I can't think of anything more important for the geog committee to tackle, happy to discuss at either or both.

sharpphyl commented 2 years ago

If the next Office Hours (September 27) has a lonely schedule, let's discuss more about how this would work. Lots of questions but we're game to try most anything.

dustymc commented 2 years ago

There's nothing on the agenda.

I think the only potentially controversial (for lack of a better word) part would be selecting and mapping the sources. Things like GADM are (from here, anyway) obvious - everybody uses it, it closely aligns with the "columns" that we and everyone use, I just don't see a problem (but see below). Things like IHO also seem straightforward enough, but mapping may not be. (And maybe that doesn't matter either - whatever we might choose to do would be consistent, and that's something.)

http://geonode.iwlearn.org/layers/Marine_Regions_web_services:eez_land (I have the data, it's used for "megaguam") might be difficult to map to our text data/columns, but as above it would be predictable/consistent and maybe that's all that's really necessary to be usable.

"Customized" sources - someone showing up with townships or county+little slice of EEZ or WHATEVER - would need to be evaluated as a Community, which both scares me and seems unlikely enough to be safely ignored, and then would need to be mapped to the text/columns, which we'll hopefully have worked out with other stuff if that ever becomes real.

Using those things is where I think we'd have the most problems. They're not what we're used to. Someone's going to want a little bit of this, a tiny slice of that, and just a dash of something else all mixed up into one THING, and we as a Community would have to have some sort of answer for them. That will come up going forward, and it will come up as we try to clean existing data. I can't go that alone, it's got to be part of some actionable plan that we're all facing together or it's going to result in a bunch of angry users. (Locality attributes have provided a functional solution so far, and maybe "legacy geography that we insist on keeping around for whatever reason" as another locality attribute satisfies whatever need might come up. I can definitely help with that sort of thing, I just can't face the potential mobs alone!)

sharpphyl commented 2 years ago

Thanks for adding this to the agenda. Definitely the simplest and most consistent approach makes sense. We'll try to come prepared. I did stumble over Natural Earth in looking at GADM. It includes coastlines, oceanic areas, lakes, etc. Would it contribute anything to the discussion?

dustymc commented 2 years ago

done?