part vocabulary (or model?)

ArctosDB / arctos

Arctos is a museum collections management system

https://arctos.database.museum

part vocabulary (or model?) #1020

Closed dustymc closed 4 years ago

dustymc commented 7 years ago

We recently added "shell (fossil)" as an Inv part, and now usage is expanding. I believe we now have multiple ways of saying "fossilized shell" - by being explicit, or, some of the time in some collections, by where the part is cataloged.

We also have no indication of what we mean by "fossil" (and there are many accepted-by-someone-for-some-purpose definitions).

I think the situation is actively preventing discovery, not facilitating it.

I don't think adding a parenthetical "fossil" to some parts is an acceptable solution; I don't think we could possibly agree on a workable definition of "fossil," and I don't see how we could add that determination to existing specimens if we could.

This is a time-sensitive issue; if we wish to recover to less ambiguous ground, we need to move soon while the data can (hopefully) still be separated.

(I don't really have any great ideas. Requiring definitions for parts might help me grasp the situation. Perhaps some new part attribute could be used to assert fossilness; at least those are easy for users to avoid!)

campmlc commented 7 years ago

In my opinion, this is part of the greater problem of concatenating part with preservation method. No matter what we decide regarding the definition of "fossilness", every possible part that can be fossilized will now have to have an additional value of "fossil", e.g. shell (dry), shell (ethanol), shell (fossil) etc. This is already out of control for mammals and parasites, and it appears to be heading that way for earth sciences, especially if we start categorizing the different types of fossils (e.g. trace, cast, etc.)

dustymc commented 7 years ago

different types of fossils

If we use the Wikipedia definition ("preserved remains or traces of animals, plants, and other organisms from the remote past") then eg http://arctos.database.museum/guid/UAM:Mamm:53942 would use something like "muscle (frozen) (fossil)"....

A few other random issues that should be considered if we're doing anything serious:

Some "preservation method" information is (or would be if we all had the resources to fully embrace the container model) duplicated from container environment. That is, ideally I'd be able to get "frozen" because the part is in a tube which is in a ..... freezer which has a temperature history (from which I could ideally tell HOW frozen - eg, maybe it's in LN2 now but went through a freezer failure after 20 years at 0F).

At some point we'd defined "bare" parts to be "the normal thing" but I don't think that really works. Eg, we have....

UAM@ARCTOS> select part_name || ' @ ' || count(*) from specimen_part where part_name like '%muscle%' group by part_name order by part_name;

PART_NAME||'@'||COUNT(*)
------------------------------------------------------------------------------------------------------------------------
muscle @ 1260
muscle (95% ethanol) @ 2658
muscle (DMSO) @ 47
muscle (RNAlater) @ 7072
muscle (dry) @ 1354
muscle (ethanol) @ 80
muscle (ethanol-fixed) @ 76
muscle (frozen) @ 39549

The existence of "muscle (frozen)" (what I'd assume to be "normal") makes me wonder what plain ol' "muscle" is.

That query actually returns....

UAM@ARCTOS> select part_name || ' @ ' || count(*) from specimen_part where part_name like '%muscle%' group by part_name order by part_name;

PART_NAME||'@'||COUNT(*)
------------------------------------------------------------------------------------------------------------------------
blood serum, muscle (frozen) @ 8
heart, kidney, liver, muscle (frozen) @ 23
heart, liver, muscle (alcohol) @ 1
heart, liver, muscle (frozen) @ 10748
heart, muscle (frozen) @ 571
kidney, muscle (frozen) @ 2
liver, muscle (frozen) @ 577
muscle @ 1260
muscle (95% ethanol) @ 2658
muscle (DMSO) @ 47
muscle (RNAlater) @ 7072
muscle (dry) @ 1354
muscle (ethanol) @ 80
muscle (ethanol-fixed) @ 76
muscle (frozen) @ 39549
muscle, eye (frozen) @ 36
muscle, spleen @ 1
muscle, spleen (frozen) @ 15

.... a bunch of "compound parts" - ideally we'd have "muscle" and "eye" which just happen to be in the same container rather than existing as a mixed part, but again that would rely on a more complete usage of containers.
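
As a rough illustration of what separating those compound strings might involve, here is a minimal Python sketch that splits a part string into its component parts and the parenthetical. The pattern and the function name `split_part_name` are hypothetical, and the sketch assumes the parenthetical applies to every listed part, which the data may not actually guarantee.

```python
import re

# Hypothetical pattern: "<part>[, <part>...] [(<preservation>)]"
PART_PATTERN = re.compile(
    r"^(?P<parts>[^()]+?)\s*(?:\((?P<preservation>[^()]*)\))?$"
)


def split_part_name(part_name):
    """Split a compound part string into (list of parts, preservation or None)."""
    match = PART_PATTERN.match(part_name.strip())
    if match is None:
        raise ValueError(f"unrecognized part string: {part_name!r}")
    parts = [p.strip() for p in match.group("parts").split(",") if p.strip()]
    return parts, match.group("preservation")


print(split_part_name("heart, liver, muscle (frozen)"))
# (['heart', 'liver', 'muscle'], 'frozen')
print(split_part_name("muscle, spleen"))
# (['muscle', 'spleen'], None)
```

Strings that do not fit the assumed shape raise an error rather than being guessed at, so they could be reviewed by hand.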

And just to make sure it stays on the radar, if I search for "rib" I should also get things which contain ribs - "whole organism" and "skeleton" and ....

dustymc commented 7 years ago

see also #991

Jegelewicz commented 7 years ago

I also do not like the combination of part with preservation. I am one without funds or capability to make use of containers, so I probably don't see the whole picture. That being said, from my perspective it would be nice to have a field for each part to tell potential users how it is preserved and just leave the part name to describe the part.

With regard to "fossil": In my mollusk collection I have both recent and "fossil" shells. From the standpoint of someone doing research, they might want to know the relative age of the specimens and the collection date wouldn't do that for the "fossil" specimens. In that case, perhaps instead of "wild caught" we could use a different collecting source to demonstrate their "fossilness"? As I've cataloged my paleo collection it has always seemed strange to call the specimens "wild caught" when "extracted from matrix" would be more appropriate. Of course this is still ambiguous as a freshly dead shell picked up off the beach could be given either collecting source. The other option is to add Geology data to any fossil specimens, thus telling users they are "fossil".

Don't know if that was helpful or not, but it's what I have right now... :-)

dustymc commented 7 years ago

Yes, very helpful, thanks. The more we all know about what everybody else is thinking, the more likely finding a good solution seems.

I keep leaning towards part attributes for both use cases.

Preservation method (storage environment, etc.) changes, and part attributes can handle that:

part=somepart -- attribute=presmeth date=date1 value=whatever1 -- attribute=presmeth date=date2 value=whatever2 .... as many times as you need.

Basically I'm agreeing with you - confounding what a thing IS with what we've done to it can't be the "correct" approach from a data modeling standpoint. Part attributes are modeled as metadata of parts, and I think that lines up perfectly with those sorts of data.
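
To make the "as many times as you need" idea concrete, here is a small Python sketch of a part carrying dated attributes. The class and field names are hypothetical (only "presmeth" comes from the notation above), the dates are invented for illustration, and none of this reflects the actual Arctos tables.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PartAttribute:
    attribute_type: str    # e.g. "presmeth", as in the notation above
    value: str
    determined_date: date


@dataclass
class Part:
    part_name: str
    attributes: List[PartAttribute] = field(default_factory=list)

    def current_value(self, attribute_type: str) -> Optional[str]:
        """Return the most recently dated value for one attribute type."""
        matching = [a for a in self.attributes if a.attribute_type == attribute_type]
        if not matching:
            return None
        return max(matching, key=lambda a: a.determined_date).value


# Invented history: the part name never changes, only its attributes grow.
muscle = Part("muscle")
muscle.attributes.append(PartAttribute("presmeth", "formalin", date(2001, 6, 1)))
muscle.attributes.append(PartAttribute("presmeth", "ethanol", date(2001, 7, 15)))
print(muscle.current_value("presmeth"))  # ethanol
```

The point of the sketch is that earlier values are kept rather than overwritten, so a change of preservation method is recorded as a new row.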

(Parenthetical BUT: We used to have a lot more structure, and that caused a couple orders of magnitude more distinct values - much less discoverability - than we have now primarily because the distinction between things like preservation method and condition is very hard to define, so they get used interchangeably. And it still couldn't deal with eg, changing preservation methods. If we do add structure, it should be very targeted and unambiguous - nobody should have to guess which field might be most appropriate for some data.)

I don't like using metadata of the conceptual stuff (cataloged items - the things that get a specimen event type/collecting method, defined as "whatever some Curator felt like slapping a catalog number on") to attempt inferences regarding physical bits (parts). I can imagine lots of ways that trying to assert "fossilness" in collecting method or via geology would get complicated - I'm not sure how often float gets a geology determination, frozen critters that don't seem very fossil-ey to me sometimes do get that, stuff gets "collected" on ebay, etc., etc. (And FWIW collecting method is now NULLable - the ethnologists had strong opinions about wild-caught motorcycles and such - and we're always up for new/better vocabulary ideas, there or anywhere else.)

"Extracted from matrix" or "it's a fossil because I say so" or etc. (on date by person, optionally) fits in part attributes, and I think that's a much more direct assertion which would lead to much more predictable/discoverable/understandable data.

All that said, part attributes have usability issues - they're 2 big steps away from specimens, so eg, adding them to the bulkloader (6 extra fields number of parts [currently 127 columns] * number of needed part attributes) is not going to be much fun to deal with, and flattening them out for things like specimenresults (eg, 10 parts each with 10 part attributes [each with 6 "columns"]) could get messy and unreadable very quickly.
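
For a sense of scale, here is the flattening arithmetic as a tiny Python sketch. It reads the figures above as multiplicative (parts x attributes per part x fields per attribute), which is an interpretation of the comment rather than a statement about the real bulkloader layout; the function name is hypothetical.

```python
def part_attribute_columns(parts, attributes_per_part, fields_per_attribute=6):
    """Columns needed to flatten every part attribute into one wide row."""
    return parts * attributes_per_part * fields_per_attribute


# The specimenresults example above: 10 parts, 10 attributes each, 6 fields each.
print(part_attribute_columns(10, 10))  # 600
```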

And that's not so different than the situation which led us to denormalize container data into parts - even if you do have everything containerized and barcoded, it's a long and expensive trip from specimens to parts then up a container tree until you find something frozen and .... - "part like %frozen%" is just MUCH easier to interact with. I'd rather have weird and redundant data than a perfect model that nobody can use!

campmlc commented 7 years ago

I am concerned about using part attributes for the following reason:

"All that said, part attributes have usability issues - they're 2 big steps away from specimens, so eg, adding them to the bulkloader (6 extra fields number of parts [currently 127 columns] * number of needed part attributes) is not going to be much fun to deal with, and flattening them out for things like specimenresults (eg, 10 parts each with 10 part attributes [each with 6 "columns"]) could get messy and unreadable very quickly."

The bulkloader is already almost too difficult to use when there are many different parts, and adding attributes results in more difficult data entry as well as less discoverability. Who thinks to search on part attributes? I would prefer to search on "fossil" in the part name.

Honestly, I don't think "fossil" is that much different from "ethanol" or "frozen". What concentration of ethanol? Fixed or stored? How many changes of solution? By whom? When? Frozen how? At what temp? For how long? How quickly after demise? Freeze/thaw history? And yet we have "ethanol" and "frozen" as part of the part name. It seems so much clearer and easier for everyone if we do the same for "fossil", until such time as we de-concatenate part name and preservation.

dustymc commented 7 years ago

I don't think "fossil" is that much different from "ethanol" or "frozen".

Maybe....

What concentration of ethanol? Fixed or stored? How many changes of solution? By whom? When? Frozen how? At what temp? For how long? How quickly after demise? Freeze/thaw history?

That all fits in container environment, and so the 'frozen' (pickled, whatever) in part name can be seen as an indication that there is/might be more data elsewhere - it's a convenience. You don't have to use the "more data" bits, but it is available and so it's not necessary to embed that information into part strings. To treat fossils the same way, I'd like to see something roughly equivalent to container environment - some way of fully expressing WHY someone thought it was a fossil. Maybe part attributes is sufficient.

(The details of frozen-ness could be handled in part attributes as well, but not elegantly. Containers let you update 50K parts by recording the temperature of a freezer, for example.)

every possible part ... additional value

Yup.

UAM@ARCTOS> select count(distinct(part_name)) from ctspecimen_part_name;

COUNT(DISTINCT(PART_NAME))
--------------------------
               939

One untested model revision is a dictionary: have a list of terms

left
right
shell
frozen

and let people put them together however they want as part names. That MIGHT let us be more precise (how many of us add to the code tables when we find a formalin-fixed left eyeball?) with fewer terms ("heart" appears once, rather than in 46 - really! - combinations), wouldn't allow scull (creative spelling) as a part, but it would set us up to find "frozen left right" parts cataloged. It lacks predictability - there's no finite set of part terms associated with that model. A thousand part strings probably don't look terribly finite to most users either....
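
A minimal Python sketch of how such a dictionary check might behave, assuming a made-up term list (the four terms above plus a few others from this thread) and hypothetical function names. It shows both sides of the trade-off: misspellings are rejected, but nothing stops a nonsensical combination.

```python
# Hypothetical term dictionary, for illustration only.
PART_TERMS = {"left", "right", "shell", "frozen", "heart", "eye", "skull"}


def unknown_terms(part_name, terms=PART_TERMS):
    """Return the words in part_name that are not in the dictionary."""
    return [word for word in part_name.lower().split() if word not in terms]


def is_valid_part_name(part_name, terms=PART_TERMS):
    """A part name is accepted if it has at least one word and all are known."""
    return bool(part_name.split()) and not unknown_terms(part_name, terms)


print(is_valid_part_name("frozen left eye"))    # True
print(is_valid_part_name("scull"))              # False
print(is_valid_part_name("frozen left right"))  # True: no grammar control
```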

campmlc commented 7 years ago

I think separating the parts into strings may be worth trying. Could we have the ability to enter multiple part terms in the first box, e.g., type "heart" and then "kidney", "lung", "spleen", and have each of these checked against controlled vocab and accepted, like an agent name? It would be easier to do that than scroll and select each one in a dropdown. Or, we could keep our current "heart, kidney, lung, spleen" part names, although this is still problematic for reduction of terms. Then we could choose "frozen" or "95% ethanol" in a separate dropdown, and "formalin-fixed" in a third?

At the very least, splitting part name and part preservation into two fields should reduce that 939 number of parts substantially. If we don't try something different, that number is only going to grow. And we are going to keep running into the problem of people needing access to the code table.

dustymc commented 7 years ago

I will strongly resist any efforts to resuscitate preservation method. I'm certainly open to different models, but chucking the thing that made giant messes (~10K unique "part strings" with presmeth-->~100 without) right back in there in an effort to fix a relatively small mess doesn't really make much sense to me. It's also wholly incapable of doing its one job, as you pointed out above - what is the presmeth for a formalin-->ethanol-->oh crud-->better ethanol-->freezer-->oops-->colder freezer pathway? (That all fits nicely in container environment, much of it could be automated, and it's normalized so updates affect many parts.)

That aside, I'm thinking one dictionary, as many parts as you want.

heart, kidney, lung, spleen would be a viable part. (So would/is heart + kidney + lung + spleen as four parts in one container - you can fix that mess now if you want to.)

I suppose we'd have to make "95%" a term, so "95% ethanol" could be constructed. (I'd rather just say "ethanol" and use container environment.)

"formalin-fixed, ethanol-preserved, currently-frozen, heart, liver, eyeball, lung, spleen" could be constructed, if someone insists.

"95% frozen" would also be a valid (and perhaps occasionally accurate...) part name; I don't see how to add "grammar" controls to this, it is a less-structured model which will demand a bit more care from operators (and that may be a fatal flaw).

I'm not sure how the UI would work - that would take some experimentation, there are lots of things that might be technically feasible, hopefully some of them are also usable.

I grabbed unique "part terms" (space-split current data).

-- Scratch table holding every space-delimited token from the part code table.
create table temp_pt (t VARCHAR2(255));

declare
  l_str     varchar2(4000);
  v_tab     parse_list.varchar2_table;  -- tokens from one part name
  v_nfields integer;                    -- number of tokens returned
begin
  for r in (select distinct part_name from ctspecimen_part_name) loop
    -- split the part name on spaces
    parse_list.delimstring_to_table (r.part_name, v_tab, v_nfields,' ');
    for i in 1..v_nfields loop
      insert into temp_pt(t) values(v_tab(i));
    end loop;
  end loop;
end;
/
-- Reduce to the distinct terms.
create table temp_ptu as select distinct(t) from temp_pt;

temp_ptu.csv.zip

There are 479 terms, including at least a few dozen very obvious duplicates (photo, photograph; section(s), sectioned, sections - should we clean some stuff up now?). 50% fewer choices! (And infinitely more combinations...)
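
For anyone who wants to poke at the term list outside the database, here is a rough Python equivalent of a plain space split. The sample part names come from the query output earlier in this thread; the function name is hypothetical, and whether the PL/SQL helper treats punctuation the same way is an assumption.

```python
def unique_part_terms(part_names):
    """Split each part name on single spaces and collect the distinct tokens."""
    terms = set()
    for name in part_names:
        terms.update(token for token in name.split(" ") if token)
    return sorted(terms)


sample = ["muscle (frozen)", "heart, muscle (frozen)", "muscle (95% ethanol)"]
print(unique_part_terms(sample))
# ['(95%', '(frozen)', 'ethanol)', 'heart,', 'muscle']
```

Note that a plain space split leaves commas and parentheses attached to the tokens, so "heart," and "heart" would count as different terms in this sketch; stripping punctuation first would be one possible cleanup step.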

campmlc commented 7 years ago

Yes, let's clean up obvious issues now, and play with the model. I'd like to try using container environment for 95% etc and see how that works in test. We'd need to alter data entry form?

dustymc commented 7 years ago

We'd need to alter data entry form

No, that's one of the powerful things about the container environment model. Just record the "environment" (ethanol concentration in a jar, freezer temp, room humidity, whatever) of a container and that data become available to the specimens in that container. If you have a probe which can talk to the Internet (eg, freezer temp log) I could set up an API for it to talk to. (And that could lead to things like "your freezer is melting" alerts.)
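
To sketch the "your freezer is melting" idea: the check itself could be as simple as filtering a temperature log against a threshold. Everything here is hypothetical (the function name, the -60 C default, and the sample readings); it does not describe any existing Arctos API.

```python
from datetime import datetime


def readings_above_threshold(readings, max_celsius=-60.0):
    """Return the (timestamp, temperature) readings warmer than max_celsius."""
    return [(when, temp) for when, temp in readings if temp > max_celsius]


# Invented log entries for one freezer.
log = [
    (datetime(2017, 3, 28, 9, 0), -80.0),
    (datetime(2017, 3, 28, 10, 0), -41.5),
]

for when, temp in readings_above_threshold(log):
    print(f"ALERT {when.isoformat()}: freezer at {temp} C")
```

Because the reading is attached to the container, one log entry would describe every part stored inside it.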

clean up

The part code table is http://arctos.database.museum/info/ctDocumentation.cfm?table=CTSPECIMEN_PART_NAME - I'm happy to SQL-merge stuff or whatever.

campmlc commented 7 years ago

This could create issues for us, as we don't assign a container type or preservation to a container until we actually use it in the field (barcode can be put on different container types, e.g. vial or cryovial, with different concentrations of ethanol, e.g. 70%, 95%, or frozen, etc.) Later the container type, but not the preservation method or container environment, is entered using Move Container or Batch Scan.

Then when data entry happens, info on the part and preservative/frozen/etc are recorded. Students are not using Edit Container etc during data entry. Ideas??

dustymc commented 7 years ago

@campmlc none of that need be explicitly recorded in relation to any specific part. When the part is scanned into a container with an environment (not necessarily directly - "part-->tube-->position-->.....-->freezer [{temp}@{time} recorded by {agent}]" works), that environmental data is accessible from the part. (Forms will surely need to be developed if this gets used, but you can get there now by clicking part history and browsing up the container tree.)
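
Browsing up the container tree can be pictured as a simple parent walk. The Python sketch below uses a hypothetical `Container` class with a free-text environment and made-up labels; the real container model is richer than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    label: str
    parent: Optional["Container"] = None
    environment: Optional[str] = None   # free text in this sketch, e.g. "-80 C"


def nearest_environment(container):
    """Walk from a container toward the root; return the first environment found."""
    current = container
    while current is not None:
        if current.environment is not None:
            return current.label, current.environment
        current = current.parent
    return None


# part --> tube --> rack --> freezer, with only the freezer recording a temperature
freezer = Container("freezer 3", environment="-80 C")
rack = Container("rack 12", parent=freezer)
tube = Container("tube A1", parent=rack)
print(nearest_environment(tube))  # ('freezer 3', '-80 C')
```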

Arctos maintains container history, so when you pull the tube out of the freezer and scan it into ethanol you still don't need to do anything extra - you can see that the tube was {there} which has {environmental history} and on {date} was moved {here} which has {more environment} etc. by following around existing linkages (which again could be summarized however we want).

Nothing (except $ perhaps!) prevents you from starting that process by slapping a battery-powered temp logger on the LN2 dewar you take to the field.

Arctos just pulls together the data about parts (from data entry), location (from scanning stuff), and environment - you're already using containers, the only thing you need to do to use this is to record container metadata in Arctos, and you probably already have that information - "none of the jars on {shelf} looked particularly funky on {date}" is useful information, even if it's not quite as precise as we'd all like.

dustymc commented 7 years ago

Doesn't look like we're going to find a quick solution, de-escalating priority a bit

campmlc commented 7 years ago

Discussed denormalizing parts to have fixation and preservation as part attributes, which can be added iteratively as parts are transferred to different environments.

dustymc commented 7 years ago

https://github.com/ArctosDB/arctos/issues/1119#issuecomment-298777350

MSB prefers (1), and the obvious place to "do something weird" is in part attributes. "Sorta stinky, but frozen again" would be recorded as multiple Attributes:

part_name=muscle

part_attribute "preservation method (or whatever)"=frozen (optionally by PERSON on DATE etc.)
part_attribute "preservation method (or whatever)"=thawed (optionally by PERSON on DATE etc.)
part_attribute "preservation method (or whatever)"=stinky (optionally by PERSON on DATE etc.)
part_attribute "preservation method (or whatever)"=frozen (optionally by PERSON on DATE etc.)

A "combined history display value" could be auto-generated - eg, the above example could display as "part_name=muscle (frozen, thawed, stinky, frozen)." (Details or uncombined data would be available from the partdetail specimen results column, edit forms, and probably the parts grid on specimendetail.)
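
A possible shape for that auto-generated value, as a Python sketch. The function name is hypothetical and the dates are invented; the only thing taken from the example above is the expected display string.

```python
def combined_display(part_name, history):
    """Build a display string from (iso_date, value) pairs for one attribute type."""
    ordered = [value for _, value in sorted(history, key=lambda item: item[0])]
    if not ordered:
        return part_name
    return f"{part_name} ({', '.join(ordered)})"


# Invented dates, deliberately out of order to show the sort.
history = [
    ("2017-03-01", "thawed"),
    ("2016-01-10", "frozen"),
    ("2017-03-02", "stinky"),
    ("2017-03-03", "frozen"),
]
print(combined_display("muscle", history))
# muscle (frozen, thawed, stinky, frozen)
```

Attributes recorded without a date would need some other ordering rule, which this sketch does not attempt.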

No model changes are necessary. New part attributes are likely necessary (code table addition), and we may want to control vocabulary for some attributes (would require app development).

This approach would probably also require a bulkloader/data entry update to include part attributes, and possibly a display adjustment.

A complete implementation would involve normalizing part name, so our current 18 parts containing "muscle" might become one ("muscle") and a bunch of Attributes.

Those 18 parts include things like 'heart, muscle (frozen)', which could (now, and it's always been possible) be two parts in the same container. You'd need to update both parts' attributes when something happens to the tube.

campmlc commented 7 years ago

How would this change current data entry screens? One concern would be increasing data entry complexity/time, which maybe could be fixed with interface changes. With multiple parts in the same container: could we enter "heart, kidney, lung, spleen" but have those parts autocreated separately? If they have to be created as individual parts in the same barcode that would add a lot of complexity and also we'd run out of parts and have to use the "add parts" tool. If so, we definitely need to automate the additional parts or attribute bulkloader.

For attributes, could we enter a part name, and then have a pop-up to select the attributes (e.g. frozen, 95% ethanol; or RNAlater, frozen)?

dustymc commented 7 years ago

change current data entry

This change would require duplicating even more data than we do now - saying the same thing multiple places - and I'd expect that to be apparent in the entry tools. We can certainly make the forms better than they are now, but creating more data is ultimately going to be a more complex process.

parts autocreated separately

Something like that might be possible, but I'd expect it to reduce initial data quality and add work to the approval process, which may or may not be a good trade-off. Error logs suggest we already struggle with one controlled vocabulary; I think you're suggesting multiple instances of multiple vocabularies in the same "field" (how many ways are there to say "formalin-fixed ethanol-preserved heart, kidney, lung, spleen that we keep in the freezer"?).

automate ... bulkloaders

That's technically trivial but has data quality implications - eg, you could "approve" data which you never see by approving the specimen record. This deserves its own issue.

pop-up

That's how part attributes work now?

dustymc commented 7 years ago

A possible solution to a few of these problems:

Add one new field to the specimen bulkloader, JSON_PARTS.
Adjust the parts grid on the data entry screens, or perhaps add an alternative form if there's some reason to keep "the old way" as an option.
Add a JSON parser to the server-side bulkloader to deal with the new "field."

The popup-form (or sub-form or whatever - there's lots of flexibility in presentation) could be infinitely expandable:

so 500 parts each with 500 attributes works, and it would be easy to add more part-stuff (container-stuff, for example).

Most users would not need to know about any of this - the parts grid would just have some new possibilities.

The form-data would be compressed into a string for transport, so negligible effect on the bulkloader.

The JSON would be available in the normal place as part of the (potential) specimen record, so no need to blindly "approve" things you may not bother looking at (eg, parts in the parts bulkloader linked to specimens by local unique IDs).

"View JSON in a form" links could be scattered around wherever they're useful (eg, specimenresults/partdetail).

JSON is a Standard, so converting your locally-produced data (eg, a spreadsheet with columns "part_name_17" and "part_name_17_attribute_value_23" (=17 parts, at least one of them having 23 attributes)) into standard JSON should be straightforward (and Arctos could provide a service).
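A sketch of that conversion, assuming one spreadsheet row per specimen has been read into a dict. The `part_name_N` and `part_name_N_attribute_value_M` column names come from the example above; the matching `part_name_N_attribute_type_M` column and the output JSON shape (a list of parts, each with an `attributes` list) are assumptions, not an agreed Arctos format.

```python
import json
import re

PART_COL = re.compile(r"^part_name_(\d+)$")
ATTR_COL = re.compile(r"^part_name_(\d+)_attribute_(type|value)_(\d+)$")


def row_to_json_parts(row):
    """Collapse wide part/attribute columns of one row into a JSON string."""
    parts = {}  # part number -> part name
    attrs = {}  # (part number, attribute number) -> {"type": ..., "value": ...}
    for column, value in row.items():
        if value in (None, ""):
            continue  # empty spreadsheet cells carry no data
        match = PART_COL.match(column)
        if match:
            parts[int(match.group(1))] = value
            continue
        match = ATTR_COL.match(column)
        if match:
            key = (int(match.group(1)), int(match.group(3)))
            attrs.setdefault(key, {})[match.group(2)] = value
    result = []
    for part_no in sorted(parts):
        part_attrs = [
            {"attribute_type": a.get("type"), "attribute_value": a.get("value")}
            for (p, _), a in sorted(attrs.items())
            if p == part_no
        ]
        result.append({"part_name": parts[part_no], "attributes": part_attrs})
    # Compact separators keep the transport string small.
    return json.dumps(result, separators=(",", ":"))


row = {
    "part_name_1": "muscle",
    "part_name_1_attribute_type_1": "preservation",
    "part_name_1_attribute_value_1": "frozen",
    "part_name_2": "heart",
}
print(row_to_json_parts(row))
```

Attributes whose part column is empty are silently dropped here; a real service would more likely report them as errors.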

The parser (takes the JSON string, creates parts+attributes+whatever) should be relatively straightforward, and isn't anything that users need to be concerned with.
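For a sense of scale, a parser along those lines might look like the sketch below. It assumes the JSON string is a list of objects with a `part_name` and an optional `attributes` list, and it returns flat rows rather than writing to any table; the key names and the `part_index` linking column are hypothetical.

```python
import json


def unroll_json_parts(json_parts):
    """Turn a JSON_PARTS string into flat part rows and attribute rows."""
    parts = json.loads(json_parts)
    if not isinstance(parts, list):
        raise ValueError("JSON_PARTS must be a JSON array of parts")
    part_rows = []
    attribute_rows = []
    for part_index, part in enumerate(parts, start=1):
        if not isinstance(part, dict) or not part.get("part_name"):
            raise ValueError(f"part {part_index} has no part_name")
        part_rows.append({"part_index": part_index, "part_name": part["part_name"]})
        for attr in part.get("attributes", []):
            # part_index ties each attribute row back to its part row.
            attribute_rows.append(
                {
                    "part_index": part_index,
                    "attribute_type": attr.get("attribute_type"),
                    "attribute_value": attr.get("attribute_value"),
                }
            )
    return part_rows, attribute_rows


example = (
    '[{"part_name":"muscle","attributes":'
    '[{"attribute_type":"preservation","attribute_value":"frozen"}]},'
    '{"part_name":"heart"}]'
)
part_rows, attribute_rows = unroll_json_parts(example)
print(part_rows)
print(attribute_rows)
```

Vocabulary checks against code tables would sit on top of this; the sketch only covers the structural unrolling.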

campmlc commented 6 years ago

We need to move this forward for the GGBN grant. Dusty, if this were implemented, what would the interface look like? The JSON string in part detail in specimen results is not very pretty, and not something we can ask students to come up with. How hard would it be to put something together in test for us to look at?

dustymc commented 6 years ago

I think there are two things here.

1) For GGBN we need a way of addressing tissue quality. I think the verbiage in the proposal contains my assumption that we're going to use container environment - things like freezer temperature - to do so. We seem to be heading in a different direction, so y'all need to develop protocols and vocabulary - I see no model or major interface changes in that approach. I'm happy to help with the vocab however I can, but y'all know what you have and what you can tolerate at data entry and what your users need, etc.

2) If we're denormalizing, then we'll need to move more stuff around with every part, and smooshing it all into a compact transport protocol like JSON is occasionally a convenient way of doing so.

JSON is beautiful, and anyone who says otherwise should be sentenced to another decade of XML!

what would the interface look like

https://github.com/ArctosDB/arctos/issues/1020#issuecomment-332256323 is one possibility.

People can't type JSON; JSON must be generated.

I don't think ANYTHING in Arctos has a "the interface." These are presumably data you'd want to capture in the field, so one interface might be Excel (at least until we can build an app). If we're committed to this approach I can develop WHATEVER as y'all need it, but I don't think there's going to be any sort of demo that's more informative than the screenshot above or the current edit parts form (it generates data which could easily be converted to JSON).

JSON is a transport mechanism. Instead of ~80 bulkloader columns for each part (that covers 10 attributes, which would almost certainly quickly become limiting anyway) there'd be one column into which you could stuff however many parts each with however many attributes you want (eg, by clicking "save" on some app that looks like the screenshot mockup). Arctos would just add a JSON unroller wherever those data might land, and once they're unrolled they work like all other normalized data.

You don't have to see JSON for any of that to happen.

JSON is also sometimes a convenient way to display complex data in a simple format, which is all that's going on with specimenresults/partdetail. I'm happy to do something else there, I just need to know what y'all want to see/how you want to see it.

Maybe we need an all-hands meeting dedicated to "preservation method"? As a replacement for container environment this is a major change in direction, and I'm not sure how effectively that's being communicated.

campmlc commented 6 years ago

If we convert to using part attributes for sample quality and history, do we have to get rid of container environment? Or can the latter still be used and accessed via object tracking?

I think a dedicated meeting would be a good idea. How soon can we schedule? It would be good to get the GGBN report submitted before Christmas. Mariel

On Mon, Dec 11, 2017 at 10:24 AM, dustymc notifications@github.com wrote:

dustymc commented 6 years ago

I don't think we need to get rid of container environment, but I don't think we currently have the resources to develop both, and I think having two ways of getting at the same thing would be a major usability issue. Eg, if attributes don't get us where we want to be, we'd probably need another proposal to further develop/integrate containers, document that mixed approach, etc.

I can initially provide GGBN with part condition, and it will be easy to add/replace/adjust that as we begin supplementing those data with part attributes. I don't think this will be a replacement, which would require "translating" existing part condition (and remarks and wherever else these types of data have been recorded). The data we have is just what we have for "legacy" tissues (eg, those collected before today); GGBN has provided a framework for going forward.

dustymc commented 4 years ago

Merged into https://github.com/ArctosDB/arctos/issues/1460