ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Bug Report - taxonomy searches timing out #8103

Open Nicole-Ridgwell-NMMNHS opened 1 month ago

Nicole-Ridgwell-NMMNHS commented 1 month ago

Describe the bug: I've noticed more and more often that searches involving taxonomy time out. Here is a particularly annoying one from today:

Screenshots: [image: Capture]

I was able to run the search fine without the taxonomy term, but adding the taxonomy made it time out. Since I'm looking for a whole phylum, I knew that using "Any taxon, ID, common name" would probably gum it up (and it did), but I thought limiting it to the one source would work (although that's not ideal, since many of our taxa still need classifications added).

dustymc commented 1 month ago

I think this might be some combination of https://github.com/ArctosDB/internal/issues/330 (there's a huge amount of data involved, hardware might help), documentation (not sure the expensive options are necessary), and https://github.com/orgs/ArctosDB/discussions/6524 (might be better for answering what looks like a very general question).

(And a link would be useful, but I think I found what I need in the error logs.)

Much of the expense seems to be in searching Chronostratigraphy, which digs around in metadata and runs locality_attributes_ms.attribute_value in (select getLocalityAttrsByMeta('Albian')). It should perform better if you can search a direct assertion (and maybe you can't, those data are there for a reason).

Your initial query is definitely way off in timeout land.

I think the taxonomy stuff is all properly indexed, and it seems that less information (term=chordata) is generally faster than more (... and term_type is phylum). But I'm almost certainly running into some sort of infrastructure bottleneck: sometimes my queries run in a reasonable timeframe (~10 seconds; not ideal, but I think reasonable) and sometimes they don't. (And I'm not so sure about the locality attribute metadata cost anymore.)

If possible, using the 'collection assertion' taxonomy (https://arctos.database.museum/search.cfm?guid_prefix=NMMNH%3APaleo&phylum=%3DChordata) should be many times faster than any other option, but it is also dangerous because it relies on assertions (e.g., maybe the collection that cataloged most of what you want doesn't believe in ranks, so it isn't writing taxonomy to the cache or something). It seems there's possibly some room for documentation adjustment in there, but I can't quite articulate anything.

Mixing that with your age term - https://arctos.database.museum/search.cfm?guid_prefix=NMMNH%3APaleo&phylum=%3DChordata&attribute_meta_term=Albian - also performs well.
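For anyone who wants to script these searches, here is a minimal sketch of how the two links above are composed. It only rebuilds the URLs already shown in this thread; the helper name build_search_url is made up for illustration, and the parameter names (guid_prefix, phylum, attribute_meta_term) are copied from those links rather than from any documented API.

```python
from urllib.parse import urlencode

# Search endpoint as it appears in the links in this thread.
SEARCH_URL = "https://arctos.database.museum/search.cfm"


def build_search_url(**params):
    """Hypothetical helper: percent-encode params and append them to the endpoint."""
    return f"{SEARCH_URL}?{urlencode(params)}"


# Collection-assertion taxonomy only (first link above).
taxonomy_url = build_search_url(
    guid_prefix="NMMNH:Paleo",
    phylum="=Chordata",  # leading "=" copied from the link; encoded as %3D
)

# Same search combined with the age term (second link above).
combined_url = build_search_url(
    guid_prefix="NMMNH:Paleo",
    phylum="=Chordata",
    attribute_meta_term="Albian",
)

print(taxonomy_url)
print(combined_url)
```

Building the query string with urlencode keeps the ":" and "=" characters escaped the same way they are in the links, which avoids hand-editing percent-encoded values.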

Possibly your timing is just bad: TACC has been having some hardware issues the last few days, and that could be involved. I'll pass this up to them.

I don't see anywhere the indexing or operators could be made better; this does feel like we're running out of memory-or-something.

Sorry, that isn't very satisfying....

Nicole-Ridgwell-NMMNHS commented 1 month ago

Thanks for taking a look at this. The 'collection assertion' taxonomy will, I think, mostly work for what I need for this search. Is it possible we could enable some kind of search within a search: search on criterion 1, then limit the second search to just what is in those results?
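To make the request concrete, here is a rough sketch of the "search within a search" idea on a handful of made-up records. It is not how Arctos works internally; the search function, the field names, and the records are all hypothetical and exist only to show the second criterion being applied to the first result set instead of the whole dataset.

```python
def search(records, **criteria):
    """Hypothetical filter: keep records whose fields equal every criterion."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


# Made-up records for illustration only.
records = [
    {"guid": "A:1", "phylum": "Chordata", "age": "Albian"},
    {"guid": "A:2", "phylum": "Chordata", "age": "Cenomanian"},
    {"guid": "A:3", "phylum": "Mollusca", "age": "Albian"},
]

# Criterion 1 runs against everything.
first_pass = search(records, phylum="Chordata")

# Criterion 2 runs only against the results of criterion 1.
second_pass = search(first_pass, age="Albian")

print([r["guid"] for r in second_pass])  # ['A:1']
```

The point of the two-stage shape is that the second, possibly expensive, criterion only has to examine the rows that survived the first one.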