ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

[CONTACT] limit on search results #5530

Closed jldunnum closed 1 year ago

jldunnum commented 1 year ago

How do we increase the number of returnable records in the search results? Record limit of 5000 does not allow me to manage my collection. Simply doesn't allow me to do the things I need to do. It also doesn't show how many records would have been returned in a search but weren't due to the limit. The download also only recovers the 5000.

dustymc commented 1 year ago

If this is a few flattenable fields we might be able to cache via https://github.com/ArctosDB/arctos/issues/5460, and that's much cheaper (which allows more rows).

If you have something you don't need you can turn it off to increase rowcount.

I could up the limits for "us" (we who can deal with timeouts) but I suspect that would make things less stable, so I'd like to exhaust the alternatives first.

jldunnum commented 1 year ago

This issue seems like a massive one that needs immediate attention. My understanding was that the conversion was going to alleviate the timing-out issues. How does one know what the magic balance of number of records to number of fields is? If I have to go in and add or remove fields every time I want to do a different search, it adds tons of time and complexity to basic management. Often I need to see a lot of fields at once for a lot of records. Bottom line is that the minimum requirement for a collection database is that the manager can fully manage their collection. If I want to bring up my entire collection with every field, I should be able to do that. What needs to happen to get to that point?

DerekSikes commented 1 year ago

Jonathan,

The steps below will allow you to download all the Arctos data for your collection from the FLAT table in Arctos (which holds most of the key data). Perhaps useful to have the data handy when away from the internet, or as a backup for disaster preparedness, or to import into another database for faster queries, etc.

Go to Reports/Services -> Write SQL

enter:

SELECT * FROM flat WHERE guid_prefix = 'UAM:Ento' LIMIT 1000

but change guid_prefix to your own, and set the limit to just over the number of records in your collection (e.g., 350000)

Note that 350,000 records will be a 1.3 GB file

Then select download as CSV!

-Derek
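
As a rough planning aid for the export above, the sketch below scales the one figure given in this comment (350,000 records is about 1.3 GB) to other collection sizes. The linear scaling, the decimal definition of GB, and the function names are assumptions made for illustration. Real file sizes will depend on how fully populated each record is.

```python
# Reference point taken from the comment above: 350,000 records ~ 1.3 GB.
REFERENCE_RECORDS = 350_000
REFERENCE_BYTES = 1.3e9  # assumption: "GB" means 10**9 bytes


def estimate_export_bytes(record_count):
    """Estimate CSV size by scaling the reference figure linearly (an assumption)."""
    return record_count * REFERENCE_BYTES / REFERENCE_RECORDS


def human_size(num_bytes):
    """Format a byte count using decimal units up to GB."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1000 or unit == "GB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000


if __name__ == "__main__":
    for n in (50_000, 100_000, 350_000):
        print(n, "records ->", human_size(estimate_export_bytes(n)))
```

The estimate is only useful for deciding whether a download is likely to be tens of megabytes or more than a gigabyte before running the query.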

jldunnum commented 1 year ago

Thanks Derek!

dustymc commented 1 year ago

> If I want to bring up my entire collection with every field I should be able to do that. What needs to happen to get to that point.

First we'd need to define that - Arctos doesn't really have "fields" but we can pretend that near-infinity things are fields.

Then we'd need to find a suitable way of presenting that, because nothing can treat Arctos like a spreadsheet.

Then we'd need to have a talk with TACC concerning resources.

Yes, grabbing FLAT from time to time is probably worthwhile. I can help with that, but if you like the self-serve option, that UI has been shored up to hopefully accommodate it without choking Arctos.

I can also export "your" data as tables - that's not so trivial and will result in a bunch of files (to avoid the 'maybe infinity' thing) but it's a better backup. (@jebrad have you tried to use this for anything?)

jldunnum commented 1 year ago

Yeah, certainly having periodic backup flat files of everything is what we will do. If a talk with TACC about resources is needed, then that should be priority one for the Steering Committee as far as I'm concerned. I just feel we need to do what is necessary so that we don't have to compromise functionality and ease of use at any level. All the amazing power of Arctos is really hampered if we have to compromise and do workarounds for the basics.

jebrad commented 1 year ago

No, I haven't tried to use those tables for anything, other than saying I have a full backup (with some effort required to get it functional) in case something wipes out TACC.

dustymc commented 1 year ago

> priority one for the Steering committee

I don't disagree, but I think that probably also needs to be tempered by the AWG. We have, for example, nearly 600 kinds of identifiers; each of them can be a column in a table (but even that involves smooshing and is not a good format for many tasks). Adding all of those to 100K records would result in sixty million function calls (or some different something, but it's hard to see past the FLAT cache from here); I'm not sure what sort of DB resources would be required to support that in real time, but it's something different than what we have. Around half of those identifiers don't DO anything very obvious, many are only used a very few times, and I've been arguing (unsuccessfully) that they shouldn't exist since the ABQ Arctos meeting. I could be wrong about that, there are arguably valid reasons to have even more identifiers, but how we structure data does influence what can realistically be done with it and what kind of hardware is required to do those things. I don't think we are good at having those "full-spectrum" conversations.
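
The sixty-million figure above is simply the number of identifier kinds multiplied by the number of records. A minimal sketch of that arithmetic follows. It assumes one lookup per identifier kind per record, which is the simplification used in the comment, and the function name is made up for illustration.

```python
def lookups_needed(identifier_kinds, record_count):
    """One lookup per identifier kind per record (a simplifying assumption)."""
    return identifier_kinds * record_count


IDENTIFIER_KINDS = 600   # "nearly 600 kinds of identifiers"
RECORDS = 100_000        # "100K records"

if __name__ == "__main__":
    print(f"{lookups_needed(IDENTIFIER_KINDS, RECORDS):,}")  # prints 60,000,000
```

Because the cost is a product, halving either the number of identifier kinds or the number of rows returned halves the work, which is the tradeoff between fields and row count discussed in this thread.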

Attributes are similar. We have ~150 unique kinds, a bunch of them all look about the same to me, a bunch aren't much used. We're probably going to end up with a lot more (pathology), and that would be a bit more digestible if we could find a way to successfully clean up what we have now. (And I want to say that a request for a bunch more attributes should be accompanied by resources, but I don't actually know what we need - I don't even know where that conversation starts, maybe working backwards from the desired end result??)

https://github.com/ArctosDB/code-table-work/issues/62 is an example - over two years in, I think the mess is BIGGER than when we started.

> do work arounds for the basics

I also spend a fair bit of my time trying to figure out how to make things work within limited resources. That's nice because it keeps my code minimal and efficient, but it also results in things like the huge cache table which has some limitations, consumes lots of CPU (but asynchronously), plugs up backups, etc. I really don't know what could be different, but it sounds like a fun conversation to have.

campmlc commented 1 year ago

We need to be able to customize our downloads in the same way we customize search results, and download all our collection data into a flat file, at a minimum. I agree with Jon that limiting downloads to 5,000 records, with no warning that additional records are available, is untenable and does not serve our users.

dustymc commented 1 year ago

> customize our downloads

That's always been available.

> download all our collection data into a flat file

That is simply not possible, please see above. Arctos cannot be squished into a giant spreadsheet. Countless conversations have died here; we have to find a way to get past this idea if we're to move forward.

mkoo commented 1 year ago

Jon has a solution, so closing. Refile as a new issue request if needed.

bryansmclean commented 1 year ago

I'm going to chime in late, and strongly echo the need to further tailor the new search results format. It seems like there is a solution in this thread for CMs to gain access to more records in a single search. But what about outside researchers and educators? Do we expect them to write SQL as well?

The new search form itself is an excellent improvement, but the inability to know if all resulting records have been returned in the results table, or what (quantitatively) is involved in the tradeoff of increased data fields vs. decreased results, seems like a step backwards in terms of functionality.

My example (although I'm not sure it's needed here) is a targeted search I just performed (all MSB:Mamm records for Mongolia from 2009-2012) for which there are many records (>3500) but for which I also need much host data (measurement attributes, localities, info on ecto- and endoparasite exams). With the desired data fields, I cap out at 250 records.

bryansmclean commented 1 year ago

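Some actionable items could include:

  • red box around the number of search records when the number of returned records has been truncated
  • implementing an option to download all records (and being able to specify the desired data fields), even if you can't view them in the table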

campmlc commented 1 year ago

Yes, agree we need both these features. If the number of records displayed or downloaded is less than the total number of records available for any reason, there needs to be a very prominent warning explicitly stating that fact, and providing options to view and download all records available. That could be reducing the number of search parameters, or requesting that a SQL search be sent via email. The current approach is absolutely failing our user communities right now.
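
One common way to detect the truncation described above is to fetch one row more than the display limit and, when a total is wanted, run a separate count. The sketch below illustrates that pattern only. It uses Python's built-in sqlite3 as a stand-in database, and the table layout, column names, and function names are hypothetical. It is not how Arctos implements search.

```python
import sqlite3

ROW_LIMIT = 5000  # the search limit mentioned in this thread


def search_with_truncation_flag(conn, guid_prefix, limit=ROW_LIMIT):
    """Return (rows, truncated); asks for limit + 1 rows to detect truncation."""
    cur = conn.execute(
        "SELECT guid FROM flat WHERE guid_prefix = ? ORDER BY guid LIMIT ?",
        (guid_prefix, limit + 1),
    )
    rows = cur.fetchall()
    truncated = len(rows) > limit
    return rows[:limit], truncated


def count_matches(conn, guid_prefix):
    """Total matching rows, for a message such as 'showing 5000 of N'."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM flat WHERE guid_prefix = ?", (guid_prefix,)
    )
    return cur.fetchone()[0]


def build_demo_db(record_count):
    """Build a tiny in-memory table with a hypothetical two-column layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flat (guid TEXT, guid_prefix TEXT)")
    conn.executemany(
        "INSERT INTO flat VALUES (?, ?)",
        [(f"MSB:Mamm:{i}", "MSB:Mamm") for i in range(record_count)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db(12)
    rows, truncated = search_with_truncation_flag(conn, "MSB:Mamm", limit=10)
    if truncated:
        total = count_matches(conn, "MSB:Mamm")
        print(f"WARNING: showing {len(rows)} of {total} matching records")
```

Fetching limit + 1 rows keeps the truncation check cheap. The separate count is only needed when the interface wants to report the exact total.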

On Mon, Feb 27, 2023, 7:18 AM bryansmclean @.***> wrote:

Some actionable items could include:

  • red box around the number of search records when the number of returned records has been truncated
  • implementing an option to download all records (and being able to specify the desired data fields), even if you can't view them in the table

dustymc commented 1 year ago

Addressed by https://github.com/ArctosDB/arctos/issues/6018 and https://github.com/ArctosDB/arctos/issues/6019 (and see also https://github.com/ArctosDB/arctos/issues/6111).