ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
60 stars 13 forks source link

Need help getting stuff from search for loan requests #6760

Closed Jegelewicz closed 11 months ago

Jegelewicz commented 1 year ago

Request

From @jebrad

Is there a way for me to customize my search results, or maybe request a custom "tool" so I can see the Part ID we recently gave to all our tissue samples? (part attribute = part identifier)

I can View / Download parts under the Tools pull-down, and I can get the Part ID that way, but its spread among 3 columns depending on whether someone put it as Attribute 1, 2 or 3.

Is there an easy way for me to get this Part ID in one column? This is the way that our GRC wants to receive the tissue requests, so it's a tool that I will use a lot. I wouldn't mind doing the excel thing for my own purposes, but I am hoping to find a way to give instructions to loan requestors, and to Sharon, how to generate their own download that will work perfectly for GRC.

Context

UWBM:Mamm has just (mostly) completed the task of getting their tissue parts into Arctos (YAY!). The tissues are managed by a separate department (Sharon) that is not in Arctos (bummer...). When researchers request tissues from the mammal division, they have to generate a list of the tissue identifiers (which are part identifiers in Arctos) for the genomic resources division to be able to fill the request.

Potential solutions

  1. Could we add the ability to filter the part view/download to also filter for part attributes? If so, we could narrow down to the part identifier and make this simple (ish - until someone add a second part identifier, but we could deal with that...)
  2. Create some sort of report (that non-operators could access) that would do this.
  3. Download the whole part table view and use Excel to figure stuff out (OK for internal work, not so much for people looking to borrow stuff)

Any other ideas?

dustymc commented 1 year ago

depending on whether someone put it as Attribute 1, 2 or 3.

Nope...

(2) is pretty simple if we can ignore the parenthetical bits (reports aren't public and I don't think there's any safe way to make them public).

Filtering on part attributes would not be terribly difficult, but that's not the direction I'd go (it's flattened and weird and only so much can be done about that...)

If I were going to do something awesome with (3) I'd probably use "JSON Parts" (or request a normalized parts + attributes zip-or-something), and maybe figure out openrefine (but I'm allergic to Excel...).

people looking to borrow stuff

Especially in light of today's AWG discussions: Does that actually happen? I flipflop around between hearing that collections get actionable, accurate, and itemized lists (and we should probably have a shopping cart to save ya'll some clicks), and that picking specific parts is a curatorial thing that the public can't be expected to tackle. Understanding this a bit more would be very useful for lots of reasons - help?

Jegelewicz commented 1 year ago

I don't think there's any safe way to make them public).

A button that runs a report from search results?

I think everyone would love the shopping cart - but I don't think that we have the appropriate bandwidth to plan and create that?

Nobody that I work with uses JSON and thinking that a majority of people searching Arctos will use it is just wishful thinking (much less the daily Arctos users!).

I can make something that does what Jeff wants, but at some point it will break because something will change and nobody will know to "fix" the external tool (or nobody will know how because I am gone). The API is supposed to be our answer to everything - can we not use that to help solve things like this? If I knew the way to take a set of cat nums from search results, then use the API to make a report, I would do that - but of course, that has the same issues as the Excel solution with sustainability.

This is where opposition to putting anything in any attribute comes from - if we can't ever get at that stuff, what good is it?

dustymc commented 1 year ago

A button that runs a report from search results?

I can't expose that to public users - or I can't allow users to access their own unfiltered data in reports, which can't work.

love the shopping cart

Well they didn't the last time we had one...

The API is supposed to be our answer to everything - can we not use that to help solve things like this?

Absolutely, most anything that can talk to our API would have no trouble at all with something like this - that's (part of) why "JSON Parts" is cached.

can't ever get at that stuff

Maybe I'm not understanding something - please clarify (or set up a zoom or whatever) if it sounds like I'm missing the point.

jebrad commented 1 year ago

Understanding this a bit more would be very useful for lots of reasons - help?

Need is because our tissues in our ultracold freezers are stored alphabetically by the Tissue Number (which we recently standardized to be the Part ID, an attribute of the Part), so it would be nice to generate a report that includes the Tissue Part ID in a single column, and then we send that to our GRC and say "please fill this loan". Currently we use the Parts report to do that, but that gives the Part ID spread among 3 columns so it takes a formula in Excel to get it into one column.

This is just a small need, and we can do it in Excel easily, so if it's a lot of effort or discussion to do it now, perhaps I should just continue with the Excel workaround? You folks certainly have more important problems to solve! (but I do appreciate your help with mine)

PS I clicked the JSON button, out of curiosity, and was confused enough that I didn't even look up with JSON means.

dustymc commented 1 year ago

we send that to our GRC

So easy report (maybe one that produces CSV), no need to be public?

jebrad commented 1 year ago

Yes, having it public was an ideal goal because I didn't realize the dangers/difficulties of that. But a report that I can use, and my operators can use, that produces a csv - that would be a great step forward and a good solution for now.

(I also thought it would be easy to do another filter box where we can filter by the attribute type instead of just tissue type, but if that is not easy then don't try)

Jegelewicz commented 1 year ago

another filter box where we can filter by the attribute type instead of just part name

is probably going to be requested more than once....

dustymc commented 1 year ago

another filter box

I'll have to look at the code but definitely possible and probably easy - file an Issue.

csv

Let me know what you want to see in it.

jebrad commented 1 year ago

csv

Let me know what you want to see in it.

Sorry for the delay, I was checking with our GRC to see what we might need in this report. We have decided on these fields being the important ones we would like to be able to generate for this tissue report:

Genus species, UWBM# aka GUID country, state/province, sex, parts, for the part that is Tissue: Condition Disposition Remarks all attributes: Part identifier (most important) Lot Count Preservation Storage Temp "Accession"

If this starts to look difficult, let me know and instead I'll file an issue to do the other solution we discussed above (another filter box). Thank you!

dustymc commented 1 year ago

Give uwbm_mamm_part_download a whirl.

issue to do the other solution

Teresa did (linked above).

jebrad commented 1 year ago

Give uwbm_mamm_part_download a whirl.

Love it!

jebrad commented 1 year ago

Whoops. I see that several the records were duplicated 2, 4 or 8 times in the report. Happened with other records in a second report.

uwbm_part.csv

dustymc commented 1 year ago

e records were duplicated

I'll look closer in a bit, but I think eg https://arctos.database.museum/guid/UWBM:Mamm:81081 has two part IDs (and two temperatures) on one part. If that had 2 parts with that setup it would have 4 rows as I'm handling things now, one for each part-at-identifier. What should I be doing with that?

jebrad commented 1 year ago

Ugh. OK I will look into a bit more and see if I can find a pattern. Probably related to what you said. I'll let you know what I find, before you should do anything.

Jegelewicz commented 1 year ago

@dustymc it looks like you somehow added part attributes to the records?

From https://arctos.database.museum/guid/UWBM:Mamm:81694

Now has some duplicated attributes

image

that look like they were added by DBA?

image

dustymc commented 1 year ago

That's (probably) the component loader (because I still don't have a way to address https://github.com/ArctosDB/arctos/issues/3861). Looks like someone fed it duplicates, or fed it data and then fed it the same data when that disappeared (as it loaded).

Jegelewicz commented 1 year ago

Ah Haaaaa - so it looks like stuff got duplicated - any easy way for us to find them all?

Jegelewicz commented 1 year ago

Maybe some SQL to find parts that have a certain attribute with duplicate value?

dustymc commented 1 year ago
select
    guid,
    part_name,
    a.attribute_type,
    a.attribute_value
from
    flat
    inner join specimen_part on flat.collection_object_id=specimen_part.derived_from_cat_item
    inner join specimen_part_attribute a on specimen_part.collection_object_id=a.collection_object_id
    inner join specimen_part_attribute b on specimen_part.collection_object_id=b.collection_object_id
where
    guid_prefix='UWBM:Mamm' and
    a.attribute_type=b.attribute_type and
    a.attribute_value=b.attribute_value and
    a.part_attribute_id!=b.part_attribute_id
Jegelewicz commented 1 year ago

@dustymc that's OK - and there are about 4000 rows (so about 2000 of these need to be removed). That's too much to do by hand.

Suggestions for getting rid of one of each pair?

temp_cache.user_sql_tbl_jegelewicz(22).csv

dustymc commented 1 year ago

Please carefully check the attached.

I don't have a great way to provide more access to those IDs (inspect the code behind edit parts/manage attributes??) but I can SQL whatever might help you determine if I'm lost or not.

temp_uwbm_dup_attrs.csv.zip

Jegelewicz commented 1 year ago

OK - I think that will do it. Assuming that everything in the file is considered a "duplicate" and will be removed.

dustymc commented 1 year ago

done

great-blue-heron commented 1 year ago

Hmm. I've been very meticulous about avoiding this exact mistake, but I did bulkload 1,880 part attributes into EEM records on the afternoon of 9/28, so it's sure looking like I loaded the same data twice. I have a saved copy of it, if that's of any help. EEM_LEPA.csv

Are there any other preparators that have this issue, or is it all EEM?

Jegelewicz commented 1 year ago

@great-blue-heron I think this was a case of two people loading stuff, not you loading stuff twice. In any case, we just found all of them, regardless of preparator and removed the duplicate part attributes. Also, Dusty has now proposed a trigger to prevent it from happening in general (because it probably would!). I don't think it was you - it was you combined with me and Jeff.

Jegelewicz commented 11 months ago

I think we are done here? Closing