ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Please rewrite SQL - ROOM020 contents #8141

Open camwebb opened 2 hours ago

camwebb commented 2 hours ago

@dustymc, please help rewrite this query to be small enough to run. Purpose: to find all contents of ROOM020 and their freezer. Expected results: 156,000 lines. Need for SQL: the CSV 'contents' download for ROOM020 via the Find Container app is too big and will not complete. For @megulbranson. (Cf. #338)

-- Pull every UAM record's part entries whose container_path mentions the room,
-- then parse the freezer and the innermost container out of that path.
SELECT guid, accession,
  JSONB_PATH_QUERY(part::jsonb, '$.part_name')::text AS name,
  JSONB_PATH_QUERY(part::jsonb, '$.part_barcode')::text AS bc,
  -- freezer: the bracketed element that follows "rm 20 (room)" in the path
  REGEXP_REPLACE(JSONB_PATH_QUERY(part::jsonb, '$.container_path')::text,
    '.*rm 20 \(room\):\[ ([^ ]+) \].*','\1') AS freezer,
  -- container: the parenthesized value at the very end of the path
  REGEXP_REPLACE(JSONB_PATH_QUERY(part::jsonb, '$.container_path')::text,
    '.*\(([^)]+)\)"$', '\1') AS container
FROM (
  SELECT guid, accession,
    JSONB_PATH_QUERY(partdetail::jsonb,
      '$[*] ? (@.container_path like_regex " ROOM20 ")') AS part
  FROM flat
  WHERE guid_prefix ~ '^UAM'
) AS P;
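
For reference, a rough equivalent of the same extraction using jsonb_array_elements and the ->> operator is sketched below. It assumes, as the query above does, that flat.partdetail is a JSON array of objects with part_name, part_barcode and container_path keys; because ->> returns unquoted text, the trailing pattern drops the closing quote that the original regex matches.

SELECT
  f.guid,
  f.accession,
  p.elem ->> 'part_name'    AS name,
  p.elem ->> 'part_barcode' AS bc,
  REGEXP_REPLACE(p.elem ->> 'container_path',
    '.*rm 20 \(room\):\[ ([^ ]+) \].*', '\1') AS freezer,
  REGEXP_REPLACE(p.elem ->> 'container_path',
    '.*\(([^)]+)\)$', '\1') AS container
FROM flat f
CROSS JOIN LATERAL jsonb_array_elements(f.partdetail::jsonb) AS p(elem)
WHERE f.guid_prefix ~ '^UAM'
  AND p.elem ->> 'container_path' ~ ' ROOM20 ';
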
dustymc commented 2 hours ago

156,000 lines.

There are 181,723 containers in room20. I'm not sure that query is doing what you want, and I can't quite guess what you want from it.

Here's a small sample of the "normal" container contents pull:

temp_room20_sample.csv

That starts with the room and walks down the hierarchy, and it will include containers that aren't linked to catalog records. Let me know if you want the full dump in that format, or what you do want if it's not that.
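
A minimal sketch of that walk, assuming a hypothetical container table with container_id, parent_container_id, label and container_type columns (stand-ins, not the actual Arctos schema): seed a recursive CTE on the room, then repeatedly join children to whatever is already in the result.

WITH RECURSIVE contents AS (
  -- seed: the room itself
  SELECT c.container_id, c.parent_container_id, c.label, c.container_type
  FROM container c
  WHERE c.label = 'ROOM020'
  UNION ALL
  -- step: everything whose parent is already in the result set
  SELECT ch.container_id, ch.parent_container_id, ch.label, ch.container_type
  FROM container ch
  JOIN contents p ON ch.parent_container_id = p.container_id
)
SELECT * FROM contents;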

small enough to run

My default query for the room created a table of 181,723 rows in about half a second; the problem is moving results of that size around within the VM. I don't have the resources for that, but I can easily move things around outside of the UI.
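
For illustration only, one way to move a large result set around outside the UI is a psql client-side copy straight to CSV; the table name and filename below are placeholders, not the actual workflow.

\copy (SELECT * FROM temp_room20_contents) TO 'room20_contents.csv' CSV HEADER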

camwebb commented 1 hour ago

small sample of the "normal" container contents pull

Yes, this is what we would like, but we can only get 20,000 lines via the CSV-only download. Thanks for offering to side-supply the table, but we really need a way for Mallory to get this herself.

The SQL as given is what we need:

guid             accession      name      bc        freezer   container
UAM:Mamm:134600  2015.052.Mamm  "liver"   "365239"  LN2FRZR1  cryovial
UAM:Mamm:134600  2015.052.Mamm  "kidney"  "365238"  LN2FRZR1  cryovial
UAM:Mamm:134600  2015.052.Mamm  "heart"   "365237"  LN2FRZR1  cryovial

Is there any way to get this info other than via a request to you? Could the SQL be made fast enough? What prevents an easy container lookup is the variable number of edges between the freezer and the smallest container.
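
One way to sidestep that variable depth, complementary to the walk-down sketch above and using the same hypothetical container columns, is to walk up from a part's container until a freezer-type container is reached; the barcode and container_type values are placeholders.

WITH RECURSIVE up AS (
  -- seed: the container holding one part (placeholder barcode)
  SELECT c.container_id, c.parent_container_id, c.container_type, c.label
  FROM container c
  WHERE c.barcode = '365239'
  UNION ALL
  -- step: keep climbing toward the room
  SELECT par.container_id, par.parent_container_id, par.container_type, par.label
  FROM container par
  JOIN up ON par.container_id = up.parent_container_id
)
SELECT label AS freezer
FROM up
WHERE container_type = 'freezer';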

campmlc commented 1 hour ago

I support this request in general, as the limit of 20,000 lines for the CSV of container contents also impacts my collection - I can't even get the contents of a single freezer using the CSV download in the main interface. I'd also like the opportunity to get this information myself without having to file a request each time, and apparently other collections would too. This is limiting functionality. See #8107 and #6680. An async request option?

dustymc commented 1 hour ago

I support this

Awesome, thanks! I think I've scoped the infrastructure needs in https://github.com/ArctosDB/BackBurner/issues/3 sufficiently to get that started with TACC.

Mallory to get this herself.

If that's the only possibility, then this will have to wait for the above (or a functional alternative).

Could the SQL be made fast enough?

[screenshot: 2024-09-30 14:01]

???

The SQL as given is what we need:

I can't work from that; I need to know what data you need.