ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
61 stars 13 forks source link

Linkage of Parts to Collecting Events #1545

Closed campmlc closed 5 years ago

campmlc commented 6 years ago

Per conversation with John Wieczorek, we need to seek funding to implement changes to the event model so that parts = material samples are linked to events = occurrences. This would resolve a deficiency in our current data model and would alleviate the problems Arctos has with serving data to GGBN. Perhaps this can be accomplished with GGBN funding.

dustymc commented 6 years ago

Discussion resolution:

Add a table linking OUT from specimen_events to "event stuff."

Optional - does not require any additional actions.

When available, link a specimen event to a part (for now - ability to add more stuff later should not be a problem).

Need:

Long-term: look at elevating catalog number (as "individual ID") to encompass multiple "occurrences" (current cataloged_item model, more or less).

tucotuco commented 6 years ago

Here is a summary of the basic issues that arose in trying to publish records of samples from Arctos to GGBN.

The GGBN data publishing model uses Darwin Core Archives with a core file consisting of Darwin Core Occurrences (dwc:Occurrence instances), with additional files (extensions) that connect records about preparations, permits, etc. to those core dwc:Occurrences through dwc:occurrenceIDs. Following the rules of Darwin Core Archives, each dwc:occurrenceID in the core file must correspond to exactly one record in that file - they must be unique in that file. Records in extension files must also have dwc:occurrenceIDs that match one record in the core file, though at times there may be many records in an extension that correspond to a core record (a one-to many relationship between the core and the extension resulting in a star schema). The Darwin Core Archive model with an Occurrence Core is the same as used by GBIF to publish dwc:Occurrences for specimens and observations of all kinds (PreservedSpecimens, FossilSpecimens, HumanObservations, MachineObservations, etc.).

The definition of a dwc:Occurrence is "An existence of an Organism (sensu http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular time." Another way to say this is "The presence of a dwc:Organism during a dwc:Event". Aside from a problem of semantics, the GGBN model presents no readily apparent problem in the simple case in which one dwc:Organism yields one dwc:MaterialSample (defined as "A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed." Examples: A whole organism preserved in a collection. A part of an organism isolated for some purpose. A soil sample. A marine microbial sample.).

The problem is that GGBN expects to have one MaterialSample per record in its Darwin Core Occurrence archives. The semantic conundrum is that a dwc:MaterialSample can serve as physical evidence of a dwc:Occurrence, but it is not a narrower case of a dwc:Occurrence, because an Occurrence refers to a dwc:Organism at a place and time, not a dwc:MaterialSample at a place and time. The real world isn't always as simple as one material sample per organism per event. Arctos, in particular, can and frequently does capture multiple instances of dwc:MaterialSamples taken from the same dwc:Organism, sometimes in a single dwc:Event (multiple samples taken at once), and sometimes in distinct dwc:Events (samples from the same organism taken at different times). The first case can not be captured in a GGBN Darwin Core Occurrence Core without conflating the meaning of a dwc:Occurrence. Since a dwc:Occurrence is the dwc:Organism during the dwc:Event, more than one material sample taken from that organism would share the same dwc:Occurrence, but the rules of the structure prohibit the duplication of the identifier for the dwc:Occurrence.

The only solutions to this problem are to artificially mint "occurrenceIDs" that represent a dwc:MaterialSample taken during a dwc:Event or to artificially use existing or minted dwc:materialSampleIDs as a proxy for the dwc:occurrenceIDs required in the Occurrence Core. Regardless of which solution is used, the GGBN model has two undesirable consequences. Foremost, it compels collections that support multiple samples per event to publish two distinct Darwin Core archives, one for GGBN with a dwc:MaterialSamples as Occurrences and another one with dwc:Occurrences as Occurrences for GBIF. Second, the meaning of an Occurrence in the GGBN archive can not be guaranteed to be semantically correct for every record.

A change to the GGBN model to overcome these deficiencies is highly recommended. A better model would be to be able to publish a single resource that satisfies both GBIF and GGBN, with all data for both in the same archive. Such a model is readily possible, but requires a trick to get around the restriction of a star schema for Darwin Core Archives.

On the Arctos side, there is two deficiencies that must be overcome to rigorously represent dwc:MaterialSamples in the way GGBN expects. The first is to be able to state unambiguously which dwc:Event a dwc:MaterialSample was generated from. This is currently known for the majority of material samples, but not for all. The second deficiency is that the Arctos material samples do not currently have global unique identifiers. We believe this can be fairly easily overcome by creating URLs as GUIDs that include references to both the cataloged item, the event and the material sample (e.g., http://arctos.database.museum/guid/MSB:Mamm:292063?seid=3175067&sampleID=3).

dustymc commented 6 years ago

ALL of our OccurrenceIDs are artificial - that's just not what museums catalog, we make them up just for DWC, I don't much mind making even more.

The first is to be able to state unambiguously which dwc:Event a dwc:MaterialSample was generated from

In the one case where we know and don't properly record those data, we're working on it.

In the other two cases I don't see this as a defect, but rather critical functionality. We do provide event type and verification status, and the portals who don't want those data are welcome to reject them.

But....

Looking more inward, we do provide event type and verification status, but the data are not very good.

select 
specimen_event.specimen_event_type,
specimen_event.verificationstatus
from
specimen_event,
flat
where
  8  specimen_event.collection_object_id=flat.collection_object_id and guid='MSB:Mamm:292063';

SPECIMEN_EVENT_TYPE
------------------------------------------------------------------------------------------------------------------------
VERIFICATIONSTATUS
------------------------------------------------------------------------------------------------------------------------
collection
unverified

collection
unverified

collection
unverified

collection
unverified

collection
unverified

5 rows selected.

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTSPECIMEN_EVENT_TYPE

collection | Specimen was collected. The specimen was killed, found dead, or removed from functional cultural, biological, ecological, or archeological context.

Those data are indistinguishable from eg http://arctos.database.museum/guid/UAMb:Herb:10002...


select 
specimen_event.specimen_event_type,
specimen_event.verificationstatus
from
specimen_event,
flat
where
  8  specimen_event.collection_object_id=flat.collection_object_id and guid='UAMb:Herb:10002';

SPECIMEN_EVENT_TYPE
------------------------------------------------------------------------------------------------------------------------
VERIFICATIONSTATUS
------------------------------------------------------------------------------------------------------------------------
collection
unverified

collection
unverified

...for which the assertion is "we got some data back from a TCN and it's not very trustworthy; we don't have the resources to accept or reject anything, so here's everything we have."

It is really critical to use http://arctos.database.museum/info/ctDocumentation.cfm?table=CTSPECIMEN_EVENT_TYPE and http://arctos.database.museum/info/ctDocumentation.cfm?table=CTVERIFICATIONSTATUS properly; I don't feel like we can push the idea of filtering to "DWC portals" until we're actually doing what we say we're doing.

dustymc commented 6 years ago

There is some stuff to look at in test. I was able to recover events for 1660 parts under 171 specimens from remarks. I may be able to make more links, but that should be sufficient for testing. The GUIDs are below.

There's a new "Event" column in the parts grid at SpecimenDetail (the only place any of this is visible at the moment).

screen shot 2018-08-16 at 11 56 28 am

when there's no event-link it contains 'pickEvent' which opens a popup from which you can select from events associated with the specimen.

screen shot 2018-08-16 at 11 57 33 am

I could use some feedback regarding what you want to see in that popup (assuming we're on a useful trajectory here) - it's very wide for what it does.

For parts with linked events, there are a couple more options in the new column.

screen shot 2018-08-16 at 11 58 21 am

removeLink does just what it sounds like it should do.

linkedEvent:{specimen_event_id} highlights the event and the part you clicked on (and attempts to scroll some stuff).

There's also a new widget on the specimen event side

screen shot 2018-08-16 at 12 00 14 pm

which highlights the event and all linked parts.

The new table is...

UAM@ARCTOSTE> desc specimen_event_links
 Name                                  Null?    Type
 ----------------------------------------------------------------- -------- --------------------------------------------
 SPECIMEN_EVENT_LINK_ID                        NOT NULL NUMBER
 COLLECTION_OBJECT_ID                          NOT NULL NUMBER
 SPECIMEN_EVENT_ID                         NOT NULL NUMBER
 PART_ID                                    NUMBER

and the code is written in anticipation of adding more links.

There are keys involved, so this is a hard link. You won't be able to delete linked specimen events or items which are linked to specimen events. There are no additional controls - you can change everything except the ID of the event when it's linked to parts.

The UI has very little in the way of error handling and such - let me know if you manage to break it, but I think we're looking for "functional" rather than polished at this point.

I'll need some time to adjust some maintenance scripts before this can go to production, so please review and let me know if this is a viable approach ASAP.


UAM@ARCTOSTE>   select guid from flat where collection_object_id in (select collection_object_id from specimen_event_links) group by guid order by guid;

GUID
------------------------------------------------------------------------------------------------------------------------
MSB:Mamm:135366
MSB:Mamm:142639
MSB:Mamm:142640
MSB:Mamm:142641
MSB:Mamm:142838
MSB:Mamm:142853
MSB:Mamm:142854
MSB:Mamm:157019
MSB:Mamm:157024
MSB:Mamm:157069
MSB:Mamm:157079
MSB:Mamm:157088
MSB:Mamm:157769
MSB:Mamm:157770
MSB:Mamm:193587
MSB:Mamm:193629
MSB:Mamm:193640
MSB:Mamm:193643
MSB:Mamm:193644
MSB:Mamm:193645
MSB:Mamm:193648
MSB:Mamm:193649
MSB:Mamm:193680
MSB:Mamm:193682
MSB:Mamm:193693
MSB:Mamm:193695
MSB:Mamm:193703
MSB:Mamm:193706
MSB:Mamm:193707
MSB:Mamm:193708
MSB:Mamm:195195
MSB:Mamm:195228
MSB:Mamm:195271
MSB:Mamm:195332
MSB:Mamm:195348
MSB:Mamm:195565
MSB:Mamm:195571
MSB:Mamm:195591
MSB:Mamm:196205
MSB:Mamm:196209
MSB:Mamm:196517
MSB:Mamm:196518
MSB:Mamm:196519
MSB:Mamm:196521
MSB:Mamm:196523
MSB:Mamm:196524
MSB:Mamm:196547
MSB:Mamm:196574
MSB:Mamm:196962
MSB:Mamm:196963
MSB:Mamm:199861
MSB:Mamm:212071
MSB:Mamm:214500
MSB:Mamm:216148
MSB:Mamm:222076
MSB:Mamm:222960
MSB:Mamm:222974
MSB:Mamm:222980
MSB:Mamm:223038
MSB:Mamm:224370
MSB:Mamm:224371
MSB:Mamm:224537
MSB:Mamm:224544
MSB:Mamm:224545
MSB:Mamm:224595
MSB:Mamm:224769
MSB:Mamm:224885
MSB:Mamm:224891
MSB:Mamm:224941
MSB:Mamm:224943
MSB:Mamm:224965
MSB:Mamm:224982
MSB:Mamm:225053
MSB:Mamm:225054
MSB:Mamm:225056
MSB:Mamm:225081
MSB:Mamm:226038
MSB:Mamm:230725
MSB:Mamm:231360
MSB:Mamm:231361
MSB:Mamm:231362
MSB:Mamm:231382
MSB:Mamm:231385
MSB:Mamm:231404
MSB:Mamm:231697
MSB:Mamm:231700
MSB:Mamm:231706
MSB:Mamm:231708
MSB:Mamm:231709
MSB:Mamm:231715
MSB:Mamm:231719
MSB:Mamm:231720
MSB:Mamm:231722
MSB:Mamm:231727
MSB:Mamm:231730
MSB:Mamm:231732
MSB:Mamm:231734
MSB:Mamm:233252
MSB:Mamm:233264
MSB:Mamm:233270
MSB:Mamm:233272
MSB:Mamm:233279
MSB:Mamm:233283
MSB:Mamm:233286
MSB:Mamm:233347
MSB:Mamm:240934
MSB:Mamm:248110
MSB:Mamm:248111
MSB:Mamm:248113
MSB:Mamm:248750
MSB:Mamm:248770
MSB:Mamm:248771
MSB:Mamm:248774
MSB:Mamm:248777
MSB:Mamm:248790
MSB:Mamm:248798
MSB:Mamm:248799
MSB:Mamm:248800
MSB:Mamm:248801
MSB:Mamm:248803
MSB:Mamm:248989
MSB:Mamm:248996
MSB:Mamm:248998
MSB:Mamm:249005
MSB:Mamm:263731
MSB:Mamm:267294
MSB:Mamm:267296
MSB:Mamm:267298
MSB:Mamm:267300
MSB:Mamm:267302
MSB:Mamm:267935
MSB:Mamm:267998
MSB:Mamm:267999
MSB:Mamm:268026
MSB:Mamm:268085
MSB:Mamm:268094
MSB:Mamm:268986
MSB:Mamm:270035
MSB:Mamm:270036
MSB:Mamm:270038
MSB:Mamm:270066
MSB:Mamm:270069
MSB:Mamm:280869
MSB:Mamm:284148
MSB:Mamm:284929
MSB:Mamm:284932
MSB:Mamm:284944
MSB:Mamm:284964
MSB:Mamm:291997
MSB:Mamm:292052
MSB:Mamm:292053
MSB:Mamm:292056
MSB:Mamm:292057
MSB:Mamm:292061
MSB:Mamm:292063
MSB:Mamm:292064
MSB:Mamm:292066
MSB:Mamm:292067
MSB:Mamm:292068
MSB:Mamm:292069
MSB:Mamm:292070
MSB:Mamm:292078
MSB:Mamm:292082
MSB:Mamm:292083
MSB:Mamm:292084
MSB:Mamm:292085
MSB:Mamm:292090
MSB:Mamm:292091
MSB:Mamm:292121
MSB:Mamm:292122
MSB:Mamm:292123

171 rows selected.
campmlc commented 6 years ago

Some comments on the test interface: 1) Specimens that only have one event should automatically be linked to that event, correct? This one is not: http://arctos-test.tacc.utexas.edu/guid/MSB:Para:20609 2) Specimens with multiple events that have the same info in part remarks and in event remarks should be linkable with a script at some point, so we don't have to do this manually? e.g. http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068

3) For the multi-event record above, I have the following recommendations: a. please make the font smaller or adjust the popup that appears when clicking "pick event' so that all of the event info is visible in one screen without having to scroll over far to the right to access the remarks and link button. b. It is going to be really painful to do this one at a time - again, we need a good script for when it is possible to use one. c. Please change the highlight color on highlight linked components - the current highlight is not obvious. d. Any way we can choose to change how our parts are displayed so that we have all parts from a single event together, rather than by part type?

dustymc commented 6 years ago

Specimens that only have one event should automatically be linked to that event, correct? This one is not: http://arctos-test.tacc.utexas.edu/guid/MSB:Para:20609

We can discuss, but I don't think so - it adds some unnecessary complexity and doesn't DO anything. This is a new EXTRA link, not a new pathway. We may make the link implicitly for various reasons (exporting DWC).

Specimens with multiple events that have the same info in part remarks and in event remarks should be linkable with a script at some point, so we don't have to do this manually? e.g. http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068

https://github.com/ArctosDB/arctos/issues/1545#issuecomment-413700989

I may be able to make more links, but that should be sufficient for testing.

following recommendations

https://github.com/ArctosDB/arctos/issues/1545#issuecomment-413700989

I think we're looking for "functional" rather than polished at this point.

I think the order of priorities should be about...

1) Does this DO what you need; is it big-picture useful? (There should be enough to answer that now.) 2) Can we use it (eg, for GGBN)? (I think so, but I'd like to play with that a bit more before we get too deep.) 3) refinements - recover existing links from remarks-and-such, shuffle things around on the screen, etc. 4) new data - how's this work from anywhere except specimendetail?

campmlc commented 6 years ago

To answer #1) and 2) does this do what we need and can we use it for GGBN - can we see a demo of what one of these linked occurrences would look like in the GGBN export, with http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068 as the example? This one has some linked parts to event 1, some linked parts to event 2, and some unlinked parts in test.

dustymc commented 6 years ago

I still think this needs to be more sequential - does it do whatever you're trying to do with the part attributes and remarks and accn and whatever else?

Assuming that answer is yes....

If that all works, linked parts will (eventually, somehow) go out with their "occurrence." Unlinked parts will, lacking better ideas, go out with the "priority" specimen event - https://github.com/ArctosDB/DDL/blob/master/functions/getPrioritySpecimenEvent.sql. http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068 would go out (to GBIF-and-such) as two Occurrences:

These are the asserted links:


UAM@ARCTOSTE> select specimen_event_id, part_id  from specimen_event_links where collection_object_id=2791376;

SPECIMEN_EVENT_ID    PART_ID
----------------- ----------
      3232421   26094176
      3232421   26094250
      3232421   26094124
      3232421   26094334
        54243   21967548
        54243    2791380

and unlinked parts...


UAM@ARCTOSTE> select specimen_part.COLLECTION_OBJECT_ID from specimen_part where derived_from_cat_item=2791376 and COLLECTION_OBJECT_ID not in (select part_id from specimen_event_links where COLLECTION_OBJECT_ID=2791376) ;

COLLECTION_OBJECT_ID
--------------------
        26094514
        21967547
        26094000
        26093729
        26093858
        26094559
        26094387
        26094459
         2791381
        25901180
         2791379
        26093938
         2791378
         2791377
        26093786
        26094750
        26094693
        26094598
        26094641

19 rows selected.

would get lumped in with the "priority" event, which is


UAM@ARCTOSTE> select getPrioritySpecimenEvent(2791376) from dual;

GETPRIORITYSPECIMENEVENT(2791376)
---------------------------------
              3232421

I think that'll work the same way it does now, where parts are smooshed into a string.


UAM@ARCTOSTE> select parts from flat where collection_object_id=2791376;

PARTS
------------------------------------------------------------------------------------------------------------------------
postcranial skeleton; skull; skin; blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); kidney (frozen)
; blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); blood (frozen); blood (frozen); blood (frozen); blood serum (f
rozen); blood serum (frozen); muscle (frozen); heart (frozen); blood (EDTA); blood (EDTA); liver (frozen); blood (EDTA)

In this case one "parts" value will be short (2 parts) and the other long (23 parts).

For GGBN, which...

GGBN expects to have one MaterialSample per record in its Darwin Core Occurrence archives

each part will define an "Occurrence" (which won't be an Occurrence at all, but we're stuck with the vocabulary).

select 
    specimen_part.COLLECTION_OBJECT_ID,
    nvl(
        specimen_event_links.specimen_event_id,
        getPrioritySpecimenEvent(specimen_part.derived_from_cat_item)
    ) pieceOfTheOccurrenceID
from 
    specimen_part,
    specimen_event_links 
where 
    specimen_part.COLLECTION_OBJECT_ID=specimen_event_links.part_ID (+) and
     specimen_part.derived_from_cat_item=2791376
;
COLLECTION_OBJECT_ID PIECEOFTHEOCCURRENCEID
-------------------- ----------------------
        26094176            3232421
        26094250            3232421
        26094124            3232421
        26094334            3232421
        21967548              54243
         2791380              54243
        26094514            3232421
        21967547            3232421
        26094000            3232421
        26093729            3232421
        26093858            3232421
        26094559            3232421
        26094387            3232421
        26094459            3232421
         2791381            3232421
        25901180            3232421
         2791379            3232421
        26093938            3232421
         2791378            3232421
         2791377            3232421
        26093786            3232421
        26094750            3232421
        26094693            3232421
        26094598            3232421
        26094641            3232421

25 rows selected.

so that specimen would have 25 "Occurrences" each with one part until GGBN can support one Occurrence having multiple MaterialSample "children." (Or I suppose we could send out a single part - "kidney (frozen) + liver (frozen) + ..." - or something, but that would be contrary to GGBN's part-centric outlook. In any case SOMETHING has to be denormalized to support their model.)

campmlc commented 6 years ago

This sounds like it will do what we are looking for, e.g. assigning parts to events as "occurrences" for GGBN, but it is hard to know until we see it go into effect and see what GGBN (and iDigBio and GBIF) do with the information.

Once this goes into effect, can you send a list of all multi-event cataloged items that have one or more unlinked part, by collection? Then we can start either designing scripts to clean those up or tackling them one by one.

On Mon, Aug 27, 2018 at 12:36 PM, dustymc notifications@github.com wrote:

I still think this needs to be more sequential - does it do whatever you're trying to do with the part attributes and remarks and accn and whatever else?

Assuming that answer is yes....

If that all works, linked parts will (eventually, somehow) go out with their "occurrence." Unlinked parts will, lacking better ideas, go out with the "priority" specimen event - https://github.com/ArctosDB/ DDL/blob/master/functions/getPrioritySpecimenEvent.sql. http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068 would go out (to GBIF-and-such) as two Occurrences:

These are the asserted links:

UAM@ARCTOSTE> select specimen_event_id, part_id from specimen_event_links where collection_object_id=2791376;

SPECIMEN_EVENT_ID PART_ID


3232421   26094176
3232421   26094250
3232421   26094124
3232421   26094334
  54243   21967548
  54243    2791380

and unlinked parts...

UAM@ARCTOSTE> select specimen_part.COLLECTION_OBJECT_ID from specimen_part where derived_from_cat_item=2791376 and COLLECTION_OBJECT_ID not in (select part_id from specimen_event_links where COLLECTION_OBJECT_ID=2791376) ;

COLLECTION_OBJECT_ID

  26094514
  21967547
  26094000
  26093729
  26093858
  26094559
  26094387
  26094459
   2791381
  25901180
   2791379
  26093938
   2791378
   2791377
  26093786
  26094750
  26094693
  26094598
  26094641

19 rows selected.

would get lumped in with the "priority" event, which is

UAM@ARCTOSTE> select getPrioritySpecimenEvent(2791376) from dual;

GETPRIORITYSPECIMENEVENT(2791376)

        3232421

I think that'll work the same way it does now, where parts are smooshed into a string.

UAM@ARCTOSTE> select parts from flat where collection_object_id=2791376;

PARTS

postcranial skeleton; skull; skin; blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); kidney (frozen) ; blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); blood (frozen); blood (frozen); blood (frozen); blood serum (f rozen); blood serum (frozen); muscle (frozen); heart (frozen); blood (EDTA); blood (EDTA); liver (frozen); blood (EDTA)

In this case one "parts" value will be short (2 parts) and the other long (23 parts).

For GGBN, which...

GGBN expects to have one MaterialSample per record in its Darwin Core Occurrence archives

each part will define an "Occurrence" (which won't be an Occurrence at all, but we're stuck with the vocabulary).

select specimen_part.COLLECTION_OBJECT_ID, nvl( specimen_event_links.specimen_event_id, getPrioritySpecimenEvent(specimen_part.derived_from_cat_item) ) pieceOfTheOccurrenceID from specimen_part, specimen_event_links where specimen_part.COLLECTION_OBJECT_ID=specimen_event_links.part_ID (+) and specimen_part.derived_from_cat_item=2791376 ; COLLECTION_OBJECT_ID PIECEOFTHEOCCURRENCEID


  26094176            3232421
  26094250            3232421
  26094124            3232421
  26094334            3232421
  21967548              54243
   2791380              54243
  26094514            3232421
  21967547            3232421
  26094000            3232421
  26093729            3232421
  26093858            3232421
  26094559            3232421
  26094387            3232421
  26094459            3232421
   2791381            3232421
  25901180            3232421
   2791379            3232421
  26093938            3232421
   2791378            3232421
   2791377            3232421
  26093786            3232421
  26094750            3232421
  26094693            3232421
  26094598            3232421
  26094641            3232421

25 rows selected.

so that specimen would have 25 "Occurrences" each with one part until GGBN can support one Occurrence having multiple MaterialSample "children." (Or I suppose we could send out a single part - "kidney (frozen) + liver (frozen)

  • ..." - or something, but that would be contrary to GGBN's part-centric outlook. In any case SOMETHING has to be denormalized to support their model.)

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-416324688, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hK5B0vQ8w_JTwaBDxR2JFagODZb9ks5uVDw1gaJpZM4UT5xn .

dustymc commented 6 years ago

send a list

create table temp_multi_evt as select collection_object_id from specimen_event where verificationstatus!='unaccepted' having count(*) > 1 group by collection_object_id;
alter table temp_multi_evt add num_parts number;
update temp_multi_evt set num_parts=(select count(*) from specimen_part where specimen_part.derived_from_cat_item=temp_multi_evt.collection_object_id);
alter table temp_multi_evt add num_linked_parts number;
update temp_multi_evt set num_linked_parts=(select count(*) from specimen_event_links where specimen_event_links.collection_object_id=temp_multi_evt.collection_object_id);
alter table temp_multi_evt add guid varchar2(255);
update temp_multi_evt set guid=(select guid from flat where flat.collection_object_id=temp_multi_evt.collection_object_id);
-- get rid of some stuff we know 
delete from temp_multi_evt where guid like 'UAM:EH%';
delete from temp_multi_evt where guid like 'UAMb:Herb:%';
select substr(guid,1,instr(guid,':',1,2)) || ' @ ' || count(*) from temp_multi_evt group by  substr(guid,1,instr(guid,':',1,2));
SUBSTR(GUID,1,INSTR(GUID,':',1,2))||'@'||COUNT(*)
------------------------------------------------------------------------------------------------------------------------
MSB:Mamm: @ 6244
UTEP:Herb: @ 123
CHAS:Bird: @ 11024
DMNS:Inv: @ 7
UTEP:Herp: @ 1318
UWBM:Herp: @ 21
MVZ:Mamm: @ 5
UAMObs:Ento: @ 1
KWP:Ento: @ 3
UAM:Ento: @ 2
UCM:Fish: @ 1279
UAM:Mamm: @ 1
MSB:Bird: @ 2
MVZ:Bird: @ 5
CHAS:Egg: @ 474
MVZ:Herp: @ 1
UTEP:HerpOS: @ 139
DMNS:Bird: @ 8
CHAS:Mamm: @ 1
DMNS:Mamm: @ 4
UTEP:Inv: @ 1121

temp_multi_evt.csv.zip

scripts

I think I've recovered what I can from date in part remarks. Let me know (examples are useful) what else might be used as a link and I'll script what I can.

iDigBio and GBIF

MAYBE we'll eventually send them slightly more accurate part concatenations, but nothing else will change.

Nothing about the format of what we're sending to GGBN will change, the event data will just be a bit better-targeted for those parts which have explicit links.

campmlc commented 6 years ago

NK number should also be a link between part remarks or part label and the specimen event remarks for many of these.

On Tue, Aug 28, 2018 at 8:41 AM, dustymc notifications@github.com wrote:

send a list

create table temp_multi_evt as select collection_object_id from specimen_event where verificationstatus!='unaccepted' having count() > 1 group by collection_object_id; alter table temp_multi_evt add num_parts number; update temp_multi_evt set num_parts=(select count() from specimen_part where specimen_part.derived_from_cat_item=temp_multi_evt.collection_object_id); alter table temp_multi_evt add num_linked_parts number; update temp_multi_evt set num_linked_parts=(select count() from specimen_event_links where specimen_event_links.collection_object_id=temp_multi_evt.collection_object_id); alter table temp_multi_evt add guid varchar2(255); update temp_multi_evt set guid=(select guid from flat where flat.collection_object_id=temp_multi_evt.collection_object_id); -- get rid of some stuff we know delete from temp_multi_evt where guid like 'UAM:EH%'; delete from temp_multi_evt where guid like 'UAMb:Herb:%'; select substr(guid,1,instr(guid,':',1,2)) || ' @ ' || count() from temp_multi_evt group by substr(guid,1,instr(guid,':',1,2)); SUBSTR(GUID,1,INSTR(GUID,':',1,2))||'@'||COUNT(*)

MSB:Mamm: @ 6244 UTEP:Herb: @ 123 CHAS:Bird: @ 11024 DMNS:Inv: @ 7 UTEP:Herp: @ 1318 UWBM:Herp: @ 21 MVZ:Mamm: @ 5 UAMObs:Ento: @ 1 KWP:Ento: @ 3 UAM:Ento: @ 2 UCM:Fish: @ 1279 UAM:Mamm: @ 1 MSB:Bird: @ 2 MVZ:Bird: @ 5 CHAS:Egg: @ 474 MVZ:Herp: @ 1 UTEP:HerpOS: @ 139 DMNS:Bird: @ 8 CHAS:Mamm: @ 1 DMNS:Mamm: @ 4 UTEP:Inv: @ 1121

temp_multi_evt.csv.zip https://github.com/ArctosDB/arctos/files/2328479/temp_multi_evt.csv.zip

scripts

I think I've recovered what I can from date in part remarks. Let me know (examples are useful) what else might be used as a link and I'll script what I can.

iDigBio and GBIF

MAYBE we'll eventually send them slightly more accurate part concatenations, but nothing else will change.

Nothing about the format of what we're sending to GGBN will change, the event data will just be a bit better-targeted for those parts which have explicit links.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-416611124, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hOEO7vDfuHCosDvAGHYhcDr9v_jbks5uVVasgaJpZM4UT5xn .

dustymc commented 6 years ago

NK

Example?

dustymc commented 6 years ago

There is a new table digir_query.msb_mamm_ggbn_tissue_tbl containing MSB:Mamm tissue not-really-Occurrences for GGBN.

DDL is https://github.com/ArctosDB/DDL/blob/master/flat/ggbn_flat_tissue.sql


UAM@ARCTOS> select count(*) from digir_query.msb_mamm_ggbn_tissue_tbl;

  COUNT(*)
----------
    482101

1 row selected.

Elapsed: 00:00:00.79
UAM@ARCTOS> select occurrenceID from digir_query.msb_mamm_ggbn_tissue_tbl having count(*) > 1 group by occurrenceID;

no rows selected

Elapsed: 00:00:01.07
UAM@ARCTOS> select occurrenceID from digir_query.msb_mamm_ggbn_tissue_tbl where rownum<5;

OCCURRENCEID
------------------------------------------------------------------------------------------------------------------------
http://arctos.database.museum/guid/MSB:Mamm:69674?pid=27256939
http://arctos.database.museum/guid/MSB:Mamm:196496?pid=21375444
http://arctos.database.museum/guid/MSB:Mamm:85721?pid=2185666
http://arctos.database.museum/guid/MSB:Mamm:53231?pid=2140369

4 rows selected.

UAM@ARCTOS> select occurrenceID,preparationType,eventDate from digir_query.msb_mamm_ggbn_tissue_tbl where references='http://arctos.database.museum/guid/MSB:Mamm:292063' order by eventDate
  2  ;

OCCURRENCEID
------------------------------------------------------------------------------------------------------------------------
PREPARATIONTYPE
------------------------------------------------------------------------------------------------------------------------
EVENTDATE
------------------------------------------------------------------------------------------------------------------------
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994537
blood (EDTA)
2014-06-15

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994538
blood serum (frozen)
2014-06-15

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994545
blood (EDTA)
2014-06-30

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994541
blood (EDTA)
2014-06-30

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994542
blood (EDTA)
2014-06-30

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=27923980
blood (EDTA)
2014-06-30

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994546
blood serum (frozen)
2014-06-30

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994544
blood (EDTA)
2014-06-30

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994543
blood (EDTA)
2014-06-30

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994547
blood (EDTA)
2014-07-21

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994550
blood (EDTA)
2014-07-21

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994549
blood (EDTA)
2014-07-21

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994551
blood (EDTA)
2014-07-21

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994548
blood (EDTA)
2014-07-21

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994552
blood serum (frozen)
2014-07-21

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25988964
blood serum (frozen)
2014-10-19

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=26966404
blood (EDTA)
2014-10-19

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25988963
blood (EDTA)
2014-10-19

http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25988965
blood serum (frozen)
2014-10-19

19 rows selected.
dustymc commented 6 years ago

I found a bunch more by NK, but I also found a bunch with conflicting data, attached.

PID: partID BARCODE: what it sounds like PART_REMARK: what it sounds like EID1: linked event (from dates) EID2: event that NK suggests should be linked BDn: began_date from event SE_REMARKn: event remarks

temp_multi_link.csv.zip

campmlc commented 6 years ago

Thanks, I'll take a look at those.

On Wed, Aug 29, 2018 at 10:53 AM, dustymc notifications@github.com wrote:

I found a bunch more by NK, but I also found a bunch with conflicting data, attached.

PID: partID BARCODE: what it sounds like PART_REMARK: what it sounds like EID1: linked event (from dates) EID2: event that NK suggests should be linked BDn: began_date from event SE_REMARKn: event remarks

temp_multi_link.csv.zip https://github.com/ArctosDB/arctos/files/2332996/temp_multi_link.csv.zip

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-417025508, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hE4xY77UiKIfYl2RR45Guj1ntQ6Gks5uVscQgaJpZM4UT5xn .

dustymc commented 6 years ago

Can this be closed?

tucotuco commented 6 years ago

Maybe let me make a GGBN resource using that view first before closing?

campmlc commented 6 years ago

Yes, agree. We also need to see what iDigBio and GBIF do with it.

On Thu, Sep 6, 2018 at 2:12 PM, John Wieczorek notifications@github.com wrote:

Maybe let me make a GGBN resource using that view first before closing?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225486, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hHilnqbHU2U5ps51vsnkT3qWPaibks5uYYGYgaJpZM4UT5xn .

tucotuco commented 6 years ago

They shouldn't be doing anything with it, should they?

On Fri, Sep 7, 2018 at 8:13 AM Mariel Campbell notifications@github.com wrote:

Yes, agree. We also need to see what iDigBio and GBIF do with it.

On Thu, Sep 6, 2018 at 2:12 PM, John Wieczorek notifications@github.com wrote:

Maybe let me make a GGBN resource using that view first before closing?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225486, or mute the thread < https://github.com/notifications/unsubscribe-auth/AOH0hHilnqbHU2U5ps51vsnkT3qWPaibks5uYYGYgaJpZM4UT5xn

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225750, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP63kZiaYvm5p4WbMKB7dsBqd7rgB4ks5uYYHRgaJpZM4UT5xn .

campmlc commented 6 years ago

It shouldn't make a difference, right? Just want to confirm that. But the main thing is to see how it works for GGBN.

On Thu, Sep 6, 2018 at 2:14 PM, John Wieczorek notifications@github.com wrote:

They shouldn't be doing anything with it, should they?

On Fri, Sep 7, 2018 at 8:13 AM Mariel Campbell notifications@github.com wrote:

Yes, agree. We also need to see what iDigBio and GBIF do with it.

On Thu, Sep 6, 2018 at 2:12 PM, John Wieczorek <notifications@github.com

wrote:

Maybe let me make a GGBN resource using that view first before closing?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225486 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ AOH0hHilnqbHU2U5ps51vsnkT3qWPaibks5uYYGYgaJpZM4UT5xn

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225750, or mute the thread https://github.com/notifications/unsubscribe-auth/ AAcP63kZiaYvm5p4WbMKB7dsBqd7rgB4ks5uYYHRgaJpZM4UT5xn .

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419226142, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hN8TVsaAGLCkBKiCV2tNHGjgV9tYks5uYYIhgaJpZM4UT5xn .

dustymc commented 6 years ago

Wilco.

This is for GGBN and only GGBN. It's vastly different than what we're sending to everyone else.

campmlc commented 6 years ago

The process of linking parts to specimen events is working, but it is fairly complicated and needs some help to be more functional and user friendly. Specifically, in the specimen record, we need the specimen events to display in order by date, and we need the parts to display in the same order, at least within part type. In the current system, events seem to be in order of when they were created (?), which for this type of legacy data where multiple records are being consolidated, results in random order by date. Then, linking to parts that also appear to end up, within part type, in random order, becomes very difficult. Can we have events and parts present in some consistent ordering scheme? I would prefer most recent event first, with the associated parts in the same order. This would make the process of finding and linking associated parts and events much easier and more intelligible. As a corollary to this, identifiers also appear to get added in some random order. For these multiple events, parts from each event have an shared unique identifier (NK). Currently, these display in somewhat random order - it would be helpful to have them display in numeric order, smallest to largest, in the identifiers box of the specimen record. This also applies to all Arctos pages and displays in general, including object tracking. How exactly does Oracle decide to order things, and can we choose to have a consistent order throughout?

http://arctos.database.museum/guid/MSB:Mamm:306166

campmlc commented 6 years ago

To make things event more interesting, in turns out that specimens display in a different order when you click "edit" than they do on the main display page. This is problematic for entering any edits to the correct event.

dustymc commented 5 years ago

@campmlc it's 2 clicks please elaborate on "complicated."

There is no date associated with parts. I can't sort by things that don't exist.

Events are sorted by a complicated thing that considers verificationstatus and type and such. I'm not sure if there's enough information to replicate that in the pick, but I'll check.

Sorting identifiers could be an Issue, but likely needs linguistic indexes.

Oracle does not order things. We can request sorting anywhere, but it's not free. That is also a new Issue.

Can this be closed?

campmlc commented 5 years ago

We should wait to close this until the GGBN resource is published. John, what the status on that?

And yes, disregard the GBIF question. This is a separate resource.

On Wed, Feb 13, 2019, 8:41 AM dustymc <notifications@github.com wrote:

@campmlc https://github.com/campmlc it's 2 clicks please elaborate on "complicated."

There is no date associated with parts. I can't sort by things that don't exist.

Events are sorted by a complicated thing that considers verificationstatus and type and such. I'm not sure if there's enough information to replicate that in the pick, but I'll check.

Sorting identifiers could be an Issue, but likely needs linguistic indexes.

Oracle does not order things. We can request sorting anywhere, but it's not free. That is also a new Issue.

Can this be closed?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-463247516, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hByIkuUlhhnLhPH0mh3pFMds51tRks5vNDIhgaJpZM4UT5xn .

tucotuco commented 5 years ago

I have a test resource set up, but I am not getting the data I need out of Arctos with the query "select * from digir_query.msb_mamm_ggbn_tissue_tbl". I'll work with Dusty to get that straight. The first GGBN resource will be for MSB Mammals that have one tissue per specimen. Let's get that right and see what we can convince GGBN of after that. Hopefully we can have a single query over all Arctos for GGBN, and distinguish resources by their collection_id in the resource SELECT statement.

One big question is what we want to do about metadata for these GGBN-only resources. Should we copy the metadata from the regular collection's metadata and modify that in some way?

On Wed, Feb 13, 2019 at 12:59 PM Mariel Campbell notifications@github.com wrote:

We should wait to close this until the GGBN resource is published. John, what the status on that?

And yes, disregard the GBIF question. This is a separate resource.

On Wed, Feb 13, 2019, 8:41 AM dustymc <notifications@github.com wrote:

@campmlc https://github.com/campmlc it's 2 clicks please elaborate on "complicated."

There is no date associated with parts. I can't sort by things that don't exist.

Events are sorted by a complicated thing that considers verificationstatus and type and such. I'm not sure if there's enough information to replicate that in the pick, but I'll check.

Sorting identifiers could be an Issue, but likely needs linguistic indexes.

Oracle does not order things. We can request sorting anywhere, but it's not free. That is also a new Issue.

Can this be closed?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-463247516, or mute the thread < https://github.com/notifications/unsubscribe-auth/AOH0hByIkuUlhhnLhPH0mh3pFMds51tRks5vNDIhgaJpZM4UT5xn

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-463255137, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP6x69RgAgwSzu6XW7cd6HXIvnvCwZks5vNDZ3gaJpZM4UT5xn .

tucotuco commented 5 years ago

Test resource now ready for testing at http://ipt.vertnet.org:8080/ipt/resource.do?r=msbmammalggbntest

campmlc commented 5 years ago

Thank you, John! What do we need to do to test this?

On Wed, Mar 13, 2019 at 7:38 AM John Wieczorek notifications@github.com wrote:

Test resource now ready for testing at http://ipt.vertnet.org:8080/ipt/resource.do?r=msbmammalggbntest

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-472425287, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hG8izqBJ31yKsZ7Prq0uW1DTtJ7wks5vWP9egaJpZM4UT5xn .

tucotuco commented 5 years ago

Gabi is on this now.

On Wed, Mar 13, 2019 at 12:16 PM Mariel Campbell notifications@github.com wrote:

Thank you, John! What do we need to do to test this?

On Wed, Mar 13, 2019 at 7:38 AM John Wieczorek notifications@github.com wrote:

Test resource now ready for testing at http://ipt.vertnet.org:8080/ipt/resource.do?r=msbmammalggbntest

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-472425287, or mute the thread < https://github.com/notifications/unsubscribe-auth/AOH0hG8izqBJ31yKsZ7Prq0uW1DTtJ7wks5vWP9egaJpZM4UT5xn

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-472466292, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP61Jzx54xEzglMbrkex8k7pXkYCPJks5vWRY-gaJpZM4UT5xn .

dustymc commented 5 years ago

Can this be closed?

tucotuco commented 5 years ago

Yes, this can be closed.

On Tue, Apr 2, 2019 at 12:38 PM dustymc notifications@github.com wrote:

Can this be closed?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-479056769, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP6z8-chU-CZkXWkkttfXc8oYbHceAks5vc3l8gaJpZM4UT5xn .