ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Default specific locality in specimen search results #862

Closed AJLinn closed 1 year ago

AJLinn commented 8 years ago

The UAM:EH specimen records typically have between one and three specimen events (e.g., place of manufacture, place of use, place of collection), sometimes with three different localities. It seems that the specific locality displayed in the search results is randomly selected from those events. I request that the default specific locality displayed be the locality associated with the "place of manufacture". Likewise, the georeferenced place of manufacture should be what shows up on the map following a search. Finally, this same information should be the locality information displayed at the top of the specimen record.

jldunnum commented 8 years ago

We are trying to work through this same issue with serial sampling of the same individuals through time and across space (e.g., serial blood sampling of Mexican wolves at the various reintroduction program sites). Not only do you get just a single event in search results, but you cannot download or map the other events either.

campmlc commented 8 years ago

So in our case at MSB, we need to be able to search on, map and download ALL specimen events. Is this possible?


dustymc commented 8 years ago

Picking a "favored" event is possible - it's (computationally) expensive but happens asynchronously, so whatever. (It's not random - events with coordinates should float to the top all else being equal, etc. - but it probably looks that way to most users for most specimens!)

The "see all locality" issue is #755. The short version is that "locality data" is a bunch (~100) of columns for every specimen-event, and a specimen can have any number of events. That doesn't fit in anything tabular (results table, download), and having data in maps/queries (things that can deal with variable cardinality) which can't be seen in the table would be extremely confusing.

jldunnum commented 8 years ago

Maybe we could have a way to mark records that contain multiple events so at least people will know when they see it in the search results and can go deeper if they wish.

dustymc commented 8 years ago

> Maybe we could have a way to mark records that contain multiple events so at least people will know when they see it in the search results and can go deeper if they wish.

Yes, that's the core intent of #755 - and if the "marker" contains the data (eg, as JSON - and I have no idea if that's practical until I play with it) then having that available should make it somewhat simpler to go deeper - just unwind into the variable-cardinality format of your choice, or flatten it out into DWC Occurrences (which we already create and could make available), or use the clicky-viewer (if we can figure out how to build one), or whatever.

Or maybe nobody (or nobody without access to the writeSQL tool) would make use of the JSON and a simple "this thing has 48 localities see specimen detail" flag is enough??
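To make the two options concrete, here is a minimal Python sketch of a result row that carries both a plain flag and a JSON "marker", plus the unwinding step into one row per event. Everything in it is hypothetical: the field names (`event_count`, `events_json`, `specific_locality`), the GUID, and the locality values are made up for illustration and are not Arctos columns or data.

```python
import json

# Hypothetical, heavily trimmed result row. Real locality data has
# roughly 100 columns per specimen-event; only two are shown here.
row = {
    "guid": "EXAMPLE:1",
    "event_count": 2,
    "events_json": json.dumps([
        {"event_type": "place of manufacture", "specific_locality": "Locality A"},
        {"event_type": "place of use", "specific_locality": "Locality B"},
    ]),
}


def event_flag(row):
    """Return a short marker for the results table, or '' for single-event rows."""
    n = row["event_count"]
    if n > 1:
        return f"this record has {n} events; see specimen detail"
    return ""


def unwind(row):
    """Expand one specimen row into one flat dict per event."""
    return [{"guid": row["guid"], **event} for event in json.loads(row["events_json"])]


print(event_flag(row))
for flat in unwind(row):
    print(flat)
```

The flag alone keeps the table strictly tabular; the JSON column is what lets a downstream tool rebuild the variable-cardinality view without another query.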

> Picking a "favored" event is possible - it's (computationally) expensive but happens asynchronously, so whatever.

It turns out the "simple" way is REALLY expensive - a small batch update (500 records) went from ~2 seconds to ~7 minutes, which will be disruptive even as an asynchronous process. I'll keep looking....

dustymc commented 8 years ago

I may have a workable solution to selectively picking the one specimen event that appears in specimenresults + downloads. Priority currently is:

1) event_type=place of manufacture
2) an event linked to a locality with coordinates
3) just grab one of whatever's left

in all cases excluding "unaccepted place of collection."

Other requests?
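As a rough illustration of that priority order (a sketch only, not the production logic), the selection could look like this in Python. The dictionary keys `id`, `event_type`, and `has_coordinates` are assumptions made for the example, as is treating "unaccepted place of collection" as a value of `event_type`.

```python
def category_rank(event):
    """Lower rank wins; None means the event is never eligible."""
    if event["event_type"] == "unaccepted place of collection":
        return None  # excluded in all cases
    if event["event_type"] == "place of manufacture":
        return 1
    if event["has_coordinates"]:
        return 2
    return 3  # whatever is left


def pick_event(events):
    """Return the highest-priority eligible event, or None if there is none."""
    ranked = [(category_rank(e), e) for e in events]
    eligible = [(rank, e) for rank, e in ranked if rank is not None]
    if not eligible:
        return None
    # min() keeps the first of several equal ranks, so ties fall back
    # to input order ("just grab one").
    return min(eligible, key=lambda pair: pair[0])[1]


# Hypothetical events for a single specimen.
events = [
    {"id": 1, "event_type": "unaccepted place of collection", "has_coordinates": True},
    {"id": 2, "event_type": "place of use", "has_coordinates": True},
    {"id": 3, "event_type": "place of manufacture", "has_coordinates": False},
]

print(pick_event(events)["id"])      # place of manufacture wins
print(pick_event(events[:2])["id"])  # otherwise the event with coordinates
```

Note that within a single rank nothing but input order decides the winner, so several events with coordinates and no place of manufacture are effectively picked arbitrarily.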

campmlc commented 8 years ago

By date - earliest and most recent. Can we choose by geographic element, e.g. state?

dustymc commented 8 years ago

I was referring to machine behavior - given http://arctos.database.museum/guid/MSB:Mamm:193683, which one of the 5 events is "prioritized" to fit into http://arctos.database.museum/SpecimenResults.cfm?guid=MSB:Mamm:193683? (Current answer: The one with the coordinates, http://arctos.database.museum/guid/MSB:Mamm:193683?seid=593167.)

I don't understand the above comments.

AJLinn commented 8 years ago

Those priorities work for me and seem logical. How does priority #2 determine its selection if there is no #1 and there are multiple events linked to localities with coordinates? Does it just go on to #3?

Thank you for working on this. It will make a huge difference for our users. Angie



jldunnum commented 8 years ago

Could use date as the next level of hierarchy within those categories. Earliest event gets priority.

dustymc commented 8 years ago

https://github.com/ArctosDB/DDL/blob/master/functions/getPrioritySpecimenEvent.sql is now experimentally running at prod - it's a bit slower than the previous revision, but the ~15K specimens with a place of manufacture updated in ~10 minutes or so, which seems workable. Adding more logic to the ordering, as long as it doesn't use data outside of specimen_event, collecting_event, and locality, should (!) have a minimal impact on performance, and adjusting the function is simple as long as the input and output parameters don't change.

The function is now finding the earliest event (based on began_date) within the winning category.

http://arctos.database.museum/guid/MSB:Mamm:224771 has a bunch of equivalent events (accepted place of collection, no coordinates) and so....

UAM@ARCTOS> select specimen_event.specimen_event_id, collecting_event.began_date, locality.dec_lat
  from specimen_event, collecting_event, locality
  where specimen_event.collecting_event_id = collecting_event.collecting_event_id
    and collecting_event.locality_id = locality.locality_id
    and collection_object_id = 21760431;

SPECIMEN_EVENT_ID BEGAN_DATE                                                            DEC_LAT
----------------- ------------------------------------------------------------------ ----------
          2585775 2010-01-01
          2585778 2011-03-29
          2585777 2010-08-30
          2585779 2012-01-04
          2585782 2014-12-18

5 rows selected.

Elapsed: 00:00:00.01
UAM@ARCTOS> select getPrioritySpecimenEvent(21760431) from dual;

GETPRIORITYSPECIMENEVENT(21760431)
----------------------------------
                           2585775

1 row selected.

... the earliest is returned, which hopefully won't offend anyone.
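For readers who want to see the tie-break spelled out, here is a hedged Python sketch of "earliest began_date within the winning category", run against the five rows from the query above. It is not the code in getPrioritySpecimenEvent.sql: the ranking rules are restated from the priority list earlier in this thread, the dictionary keys are assumptions, the exclusion of unaccepted events is left out for brevity, and began_date is assumed to be a full ISO 8601 date string.

```python
def sort_key(event):
    """Order by category first, then by began_date within the category."""
    if event["event_type"] == "place of manufacture":
        rank = 1
    elif event["dec_lat"] is not None:
        rank = 2
    else:
        rank = 3
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when compared as text.
    return (rank, event["began_date"])


# specimen_event_id and began_date values copied from the query output above;
# every event is an accepted place of collection without coordinates.
rows = [
    (2585775, "2010-01-01"),
    (2585778, "2011-03-29"),
    (2585777, "2010-08-30"),
    (2585779, "2012-01-04"),
    (2585782, "2014-12-18"),
]
events = [
    {
        "specimen_event_id": event_id,
        "began_date": began,
        "dec_lat": None,
        "event_type": "accepted place of collection",
    }
    for event_id, began in rows
]

winner = min(events, key=sort_key)
print(winner["specimen_event_id"])  # 2585775, matching the function's result
```

Because the date is the second element of the sort key, it only matters among events that already share the best category, which is the behavior described above.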

Including State would require one more join (to geography), and if there's no Arctos-wide agreement on which state is most important (seems unlikely) then an additional 3 jumps the other way to get at Collection. There are 1488 unique States in Arctos at the moment, which might be enough to have a noticeable impact on the post-query processing as well (especially if collection is a multiplier). So possible, yes, but likely fairly expensive. ("Expense" can be measured in how long it takes an update to appear in the interfaces and is difficult to quantify, but my wild guess is that adding state would be noticeable/disruptive.)

Jegelewicz commented 1 year ago

I am re-opening this because the solution isn't working for me. See the issue referenced above. We need to be able to tell people that more than one event exists in the search results/download.

dustymc commented 1 year ago

> more than one event exists

[Screenshot 2023-01-06 at 7:44:12 AM; image not preserved]

Or to see data,

[Screenshot 2023-01-06 at 7:47:35 AM; image not preserved]

> isn't working for me

I'm closing because I don't think anything else can become actionable from here. Please reopen if you have a solution in mind, or open a discussion if you want to look for a solution.