Closed campmlc closed 5 years ago
Discussion resolution:
Add a table linking OUT from specimen_events to "event stuff."
Optional - does not require any additional actions.
When available, link a specimen event to a part (for now - ability to add more stuff later should not be a problem).
Need:
Long-term: look at elevating catalog number (as "individual ID") to encompass multiple "occurrences" (current cataloged_item model, more or less).
Here is a summary of the basic issues that arose in trying to publish records of samples from Arctos to GGBN.
The GGBN data publishing model uses Darwin Core Archives with a core file consisting of Darwin Core Occurrences (dwc:Occurrence instances), with additional files (extensions) that connect records about preparations, permits, etc. to those core dwc:Occurrences through dwc:occurrenceIDs. Following the rules of Darwin Core Archives, each dwc:occurrenceID in the core file must correspond to exactly one record in that file - they must be unique in that file. Records in extension files must also have dwc:occurrenceIDs that match one record in the core file, though at times there may be many records in an extension that correspond to a core record (a one-to many relationship between the core and the extension resulting in a star schema). The Darwin Core Archive model with an Occurrence Core is the same as used by GBIF to publish dwc:Occurrences for specimens and observations of all kinds (PreservedSpecimens, FossilSpecimens, HumanObservations, MachineObservations, etc.).
The definition of a dwc:Occurrence is "An existence of an Organism (sensu http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular time." Another way to say this is "The presence of a dwc:Organism during a dwc:Event". Aside from a problem of semantics, the GGBN model presents no readily apparent problem in the simple case in which one dwc:Organism yields one dwc:MaterialSample (defined as "A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed." Examples: A whole organism preserved in a collection. A part of an organism isolated for some purpose. A soil sample. A marine microbial sample.).
The problem is that GGBN expects to have one MaterialSample per record in its Darwin Core Occurrence archives. The semantic conundrum is that a dwc:MaterialSample can serve as physical evidence of a dwc:Occurrence, but it is not a narrower case of a dwc:Occurrence, because an Occurrence refers to a dwc:Organism at a place and time, not a dwc:MaterialSample at a place and time. The real world isn't always as simple as one material sample per organism per event. Arctos, in particular, can and frequently does capture multiple instances of dwc:MaterialSamples taken from the same dwc:Organism, sometimes in a single dwc:Event (multiple samples taken at once), and sometimes in distinct dwc:Events (samples from the same organism taken at different times). The first case can not be captured in a GGBN Darwin Core Occurrence Core without conflating the meaning of a dwc:Occurrence. Since a dwc:Occurrence is the dwc:Organism during the dwc:Event, more than one material sample taken from that organism would share the same dwc:Occurrence, but the rules of the structure prohibit the duplication of the identifier for the dwc:Occurrence.
The only solutions to this problem are to artificially mint "occurrenceIDs" that represent a dwc:MaterialSample taken during a dwc:Event or to artificially use existing or minted dwc:materialSampleIDs as a proxy for the dwc:occurrenceIDs required in the Occurrence Core. Regardless of which solution is used, the GGBN model has two undesirable consequences. Foremost, it compels collections that support multiple samples per event to publish two distinct Darwin Core archives, one for GGBN with a dwc:MaterialSamples as Occurrences and another one with dwc:Occurrences as Occurrences for GBIF. Second, the meaning of an Occurrence in the GGBN archive can not be guaranteed to be semantically correct for every record.
A change to the GGBN model to overcome these deficiencies is highly recommended. A better model would be to be able to publish a single resource that satisfies both GBIF and GGBN, with all data for both in the same archive. Such a model is readily possible, but requires a trick to get around the restriction of a star schema for Darwin Core Archives.
On the Arctos side, there is two deficiencies that must be overcome to rigorously represent dwc:MaterialSamples in the way GGBN expects. The first is to be able to state unambiguously which dwc:Event a dwc:MaterialSample was generated from. This is currently known for the majority of material samples, but not for all. The second deficiency is that the Arctos material samples do not currently have global unique identifiers. We believe this can be fairly easily overcome by creating URLs as GUIDs that include references to both the cataloged item, the event and the material sample (e.g., http://arctos.database.museum/guid/MSB:Mamm:292063?seid=3175067&sampleID=3).
ALL of our OccurrenceIDs are artificial - that's just not what museums catalog, we make them up just for DWC, I don't much mind making even more.
The first is to be able to state unambiguously which dwc:Event a dwc:MaterialSample was generated from
In the one case where we know and don't properly record those data, we're working on it.
In the other two cases I don't see this as a defect, but rather critical functionality. We do provide event type and verification status, and the portals who don't want those data are welcome to reject them.
But....
Looking more inward, we do provide event type and verification status, but the data are not very good.
select
specimen_event.specimen_event_type,
specimen_event.verificationstatus
from
specimen_event,
flat
where
8 specimen_event.collection_object_id=flat.collection_object_id and guid='MSB:Mamm:292063';
SPECIMEN_EVENT_TYPE
------------------------------------------------------------------------------------------------------------------------
VERIFICATIONSTATUS
------------------------------------------------------------------------------------------------------------------------
collection
unverified
collection
unverified
collection
unverified
collection
unverified
collection
unverified
5 rows selected.
http://arctos.database.museum/info/ctDocumentation.cfm?table=CTSPECIMEN_EVENT_TYPE
collection | Specimen was collected. The specimen was killed, found dead, or removed from functional cultural, biological, ecological, or archeological context.
Those data are indistinguishable from eg http://arctos.database.museum/guid/UAMb:Herb:10002...
select
specimen_event.specimen_event_type,
specimen_event.verificationstatus
from
specimen_event,
flat
where
8 specimen_event.collection_object_id=flat.collection_object_id and guid='UAMb:Herb:10002';
SPECIMEN_EVENT_TYPE
------------------------------------------------------------------------------------------------------------------------
VERIFICATIONSTATUS
------------------------------------------------------------------------------------------------------------------------
collection
unverified
collection
unverified
...for which the assertion is "we got some data back from a TCN and it's not very trustworthy; we don't have the resources to accept or reject anything, so here's everything we have."
It is really critical to use http://arctos.database.museum/info/ctDocumentation.cfm?table=CTSPECIMEN_EVENT_TYPE and http://arctos.database.museum/info/ctDocumentation.cfm?table=CTVERIFICATIONSTATUS properly; I don't feel like we can push the idea of filtering to "DWC portals" until we're actually doing what we say we're doing.
There is some stuff to look at in test. I was able to recover events for 1660 parts under 171 specimens from remarks. I may be able to make more links, but that should be sufficient for testing. The GUIDs are below.
There's a new "Event" column in the parts grid at SpecimenDetail (the only place any of this is visible at the moment).
when there's no event-link it contains 'pickEvent' which opens a popup from which you can select from events associated with the specimen.
I could use some feedback regarding what you want to see in that popup (assuming we're on a useful trajectory here) - it's very wide for what it does.
For parts with linked events, there are a couple more options in the new column.
removeLink does just what it sounds like it should do.
linkedEvent:{specimen_event_id} highlights the event and the part you clicked on (and attempts to scroll some stuff).
There's also a new widget on the specimen event side
which highlights the event and all linked parts.
The new table is...
UAM@ARCTOSTE> desc specimen_event_links
Name Null? Type
----------------------------------------------------------------- -------- --------------------------------------------
SPECIMEN_EVENT_LINK_ID NOT NULL NUMBER
COLLECTION_OBJECT_ID NOT NULL NUMBER
SPECIMEN_EVENT_ID NOT NULL NUMBER
PART_ID NUMBER
and the code is written in anticipation of adding more links.
There are keys involved, so this is a hard link. You won't be able to delete linked specimen events or items which are linked to specimen events. There are no additional controls - you can change everything except the ID of the event when it's linked to parts.
The UI has very little in the way of error handling and such - let me know if you manage to break it, but I think we're looking for "functional" rather than polished at this point.
I'll need some time to adjust some maintenance scripts before this can go to production, so please review and let me know if this is a viable approach ASAP.
UAM@ARCTOSTE> select guid from flat where collection_object_id in (select collection_object_id from specimen_event_links) group by guid order by guid;
GUID
------------------------------------------------------------------------------------------------------------------------
MSB:Mamm:135366
MSB:Mamm:142639
MSB:Mamm:142640
MSB:Mamm:142641
MSB:Mamm:142838
MSB:Mamm:142853
MSB:Mamm:142854
MSB:Mamm:157019
MSB:Mamm:157024
MSB:Mamm:157069
MSB:Mamm:157079
MSB:Mamm:157088
MSB:Mamm:157769
MSB:Mamm:157770
MSB:Mamm:193587
MSB:Mamm:193629
MSB:Mamm:193640
MSB:Mamm:193643
MSB:Mamm:193644
MSB:Mamm:193645
MSB:Mamm:193648
MSB:Mamm:193649
MSB:Mamm:193680
MSB:Mamm:193682
MSB:Mamm:193693
MSB:Mamm:193695
MSB:Mamm:193703
MSB:Mamm:193706
MSB:Mamm:193707
MSB:Mamm:193708
MSB:Mamm:195195
MSB:Mamm:195228
MSB:Mamm:195271
MSB:Mamm:195332
MSB:Mamm:195348
MSB:Mamm:195565
MSB:Mamm:195571
MSB:Mamm:195591
MSB:Mamm:196205
MSB:Mamm:196209
MSB:Mamm:196517
MSB:Mamm:196518
MSB:Mamm:196519
MSB:Mamm:196521
MSB:Mamm:196523
MSB:Mamm:196524
MSB:Mamm:196547
MSB:Mamm:196574
MSB:Mamm:196962
MSB:Mamm:196963
MSB:Mamm:199861
MSB:Mamm:212071
MSB:Mamm:214500
MSB:Mamm:216148
MSB:Mamm:222076
MSB:Mamm:222960
MSB:Mamm:222974
MSB:Mamm:222980
MSB:Mamm:223038
MSB:Mamm:224370
MSB:Mamm:224371
MSB:Mamm:224537
MSB:Mamm:224544
MSB:Mamm:224545
MSB:Mamm:224595
MSB:Mamm:224769
MSB:Mamm:224885
MSB:Mamm:224891
MSB:Mamm:224941
MSB:Mamm:224943
MSB:Mamm:224965
MSB:Mamm:224982
MSB:Mamm:225053
MSB:Mamm:225054
MSB:Mamm:225056
MSB:Mamm:225081
MSB:Mamm:226038
MSB:Mamm:230725
MSB:Mamm:231360
MSB:Mamm:231361
MSB:Mamm:231362
MSB:Mamm:231382
MSB:Mamm:231385
MSB:Mamm:231404
MSB:Mamm:231697
MSB:Mamm:231700
MSB:Mamm:231706
MSB:Mamm:231708
MSB:Mamm:231709
MSB:Mamm:231715
MSB:Mamm:231719
MSB:Mamm:231720
MSB:Mamm:231722
MSB:Mamm:231727
MSB:Mamm:231730
MSB:Mamm:231732
MSB:Mamm:231734
MSB:Mamm:233252
MSB:Mamm:233264
MSB:Mamm:233270
MSB:Mamm:233272
MSB:Mamm:233279
MSB:Mamm:233283
MSB:Mamm:233286
MSB:Mamm:233347
MSB:Mamm:240934
MSB:Mamm:248110
MSB:Mamm:248111
MSB:Mamm:248113
MSB:Mamm:248750
MSB:Mamm:248770
MSB:Mamm:248771
MSB:Mamm:248774
MSB:Mamm:248777
MSB:Mamm:248790
MSB:Mamm:248798
MSB:Mamm:248799
MSB:Mamm:248800
MSB:Mamm:248801
MSB:Mamm:248803
MSB:Mamm:248989
MSB:Mamm:248996
MSB:Mamm:248998
MSB:Mamm:249005
MSB:Mamm:263731
MSB:Mamm:267294
MSB:Mamm:267296
MSB:Mamm:267298
MSB:Mamm:267300
MSB:Mamm:267302
MSB:Mamm:267935
MSB:Mamm:267998
MSB:Mamm:267999
MSB:Mamm:268026
MSB:Mamm:268085
MSB:Mamm:268094
MSB:Mamm:268986
MSB:Mamm:270035
MSB:Mamm:270036
MSB:Mamm:270038
MSB:Mamm:270066
MSB:Mamm:270069
MSB:Mamm:280869
MSB:Mamm:284148
MSB:Mamm:284929
MSB:Mamm:284932
MSB:Mamm:284944
MSB:Mamm:284964
MSB:Mamm:291997
MSB:Mamm:292052
MSB:Mamm:292053
MSB:Mamm:292056
MSB:Mamm:292057
MSB:Mamm:292061
MSB:Mamm:292063
MSB:Mamm:292064
MSB:Mamm:292066
MSB:Mamm:292067
MSB:Mamm:292068
MSB:Mamm:292069
MSB:Mamm:292070
MSB:Mamm:292078
MSB:Mamm:292082
MSB:Mamm:292083
MSB:Mamm:292084
MSB:Mamm:292085
MSB:Mamm:292090
MSB:Mamm:292091
MSB:Mamm:292121
MSB:Mamm:292122
MSB:Mamm:292123
171 rows selected.
Some comments on the test interface: 1) Specimens that only have one event should automatically be linked to that event, correct? This one is not: http://arctos-test.tacc.utexas.edu/guid/MSB:Para:20609 2) Specimens with multiple events that have the same info in part remarks and in event remarks should be linkable with a script at some point, so we don't have to do this manually? e.g. http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068
3) For the multi-event record above, I have the following recommendations: a. please make the font smaller or adjust the popup that appears when clicking "pick event' so that all of the event info is visible in one screen without having to scroll over far to the right to access the remarks and link button. b. It is going to be really painful to do this one at a time - again, we need a good script for when it is possible to use one. c. Please change the highlight color on highlight linked components - the current highlight is not obvious. d. Any way we can choose to change how our parts are displayed so that we have all parts from a single event together, rather than by part type?
Specimens that only have one event should automatically be linked to that event, correct? This one is not: http://arctos-test.tacc.utexas.edu/guid/MSB:Para:20609
We can discuss, but I don't think so - it adds some unnecessary complexity and doesn't DO anything. This is a new EXTRA link, not a new pathway. We may make the link implicitly for various reasons (exporting DWC).
Specimens with multiple events that have the same info in part remarks and in event remarks should be linkable with a script at some point, so we don't have to do this manually? e.g. http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068
https://github.com/ArctosDB/arctos/issues/1545#issuecomment-413700989
I may be able to make more links, but that should be sufficient for testing.
following recommendations
https://github.com/ArctosDB/arctos/issues/1545#issuecomment-413700989
I think we're looking for "functional" rather than polished at this point.
I think the order of priorities should be about...
1) Does this DO what you need; is it big-picture useful? (There should be enough to answer that now.) 2) Can we use it (eg, for GGBN)? (I think so, but I'd like to play with that a bit more before we get too deep.) 3) refinements - recover existing links from remarks-and-such, shuffle things around on the screen, etc. 4) new data - how's this work from anywhere except specimendetail?
To answer #1) and 2) does this do what we need and can we use it for GGBN - can we see a demo of what one of these linked occurrences would look like in the GGBN export, with http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068 as the example? This one has some linked parts to event 1, some linked parts to event 2, and some unlinked parts in test.
I still think this needs to be more sequential - does it do whatever you're trying to do with the part attributes and remarks and accn and whatever else?
Assuming that answer is yes....
If that all works, linked parts will (eventually, somehow) go out with their "occurrence." Unlinked parts will, lacking better ideas, go out with the "priority" specimen event - https://github.com/ArctosDB/DDL/blob/master/functions/getPrioritySpecimenEvent.sql. http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068 would go out (to GBIF-and-such) as two Occurrences:
These are the asserted links:
UAM@ARCTOSTE> select specimen_event_id, part_id from specimen_event_links where collection_object_id=2791376;
SPECIMEN_EVENT_ID PART_ID
----------------- ----------
3232421 26094176
3232421 26094250
3232421 26094124
3232421 26094334
54243 21967548
54243 2791380
and unlinked parts...
UAM@ARCTOSTE> select specimen_part.COLLECTION_OBJECT_ID from specimen_part where derived_from_cat_item=2791376 and COLLECTION_OBJECT_ID not in (select part_id from specimen_event_links where COLLECTION_OBJECT_ID=2791376) ;
COLLECTION_OBJECT_ID
--------------------
26094514
21967547
26094000
26093729
26093858
26094559
26094387
26094459
2791381
25901180
2791379
26093938
2791378
2791377
26093786
26094750
26094693
26094598
26094641
19 rows selected.
would get lumped in with the "priority" event, which is
UAM@ARCTOSTE> select getPrioritySpecimenEvent(2791376) from dual;
GETPRIORITYSPECIMENEVENT(2791376)
---------------------------------
3232421
I think that'll work the same way it does now, where parts are smooshed into a string.
UAM@ARCTOSTE> select parts from flat where collection_object_id=2791376;
PARTS
------------------------------------------------------------------------------------------------------------------------
postcranial skeleton; skull; skin; blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); kidney (frozen)
; blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); blood (frozen); blood (frozen); blood (frozen); blood serum (f
rozen); blood serum (frozen); muscle (frozen); heart (frozen); blood (EDTA); blood (EDTA); liver (frozen); blood (EDTA)
In this case one "parts" value will be short (2 parts) and the other long (23 parts).
For GGBN, which...
GGBN expects to have one MaterialSample per record in its Darwin Core Occurrence archives
each part will define an "Occurrence" (which won't be an Occurrence at all, but we're stuck with the vocabulary).
select
specimen_part.COLLECTION_OBJECT_ID,
nvl(
specimen_event_links.specimen_event_id,
getPrioritySpecimenEvent(specimen_part.derived_from_cat_item)
) pieceOfTheOccurrenceID
from
specimen_part,
specimen_event_links
where
specimen_part.COLLECTION_OBJECT_ID=specimen_event_links.part_ID (+) and
specimen_part.derived_from_cat_item=2791376
;
COLLECTION_OBJECT_ID PIECEOFTHEOCCURRENCEID
-------------------- ----------------------
26094176 3232421
26094250 3232421
26094124 3232421
26094334 3232421
21967548 54243
2791380 54243
26094514 3232421
21967547 3232421
26094000 3232421
26093729 3232421
26093858 3232421
26094559 3232421
26094387 3232421
26094459 3232421
2791381 3232421
25901180 3232421
2791379 3232421
26093938 3232421
2791378 3232421
2791377 3232421
26093786 3232421
26094750 3232421
26094693 3232421
26094598 3232421
26094641 3232421
25 rows selected.
so that specimen would have 25 "Occurrences" each with one part until GGBN can support one Occurrence having multiple MaterialSample "children." (Or I suppose we could send out a single part - "kidney (frozen) + liver (frozen) + ..." - or something, but that would be contrary to GGBN's part-centric outlook. In any case SOMETHING has to be denormalized to support their model.)
This sounds like it will do what we are looking for, e.g. assigning parts to events as "occurrences" for GGBN, but it is hard to know until we see it go into effect and see what GGBN (and iDigBio and GBIF) do with the information.
Once this goes into effect, can you send a list of all multi-event cataloged items that have one or more unlinked part, by collection? Then we can start either designing scripts to clean those up or tackling them one by one.
On Mon, Aug 27, 2018 at 12:36 PM, dustymc notifications@github.com wrote:
I still think this needs to be more sequential - does it do whatever you're trying to do with the part attributes and remarks and accn and whatever else?
Assuming that answer is yes....
If that all works, linked parts will (eventually, somehow) go out with their "occurrence." Unlinked parts will, lacking better ideas, go out with the "priority" specimen event - https://github.com/ArctosDB/ DDL/blob/master/functions/getPrioritySpecimenEvent.sql. http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:157068 would go out (to GBIF-and-such) as two Occurrences:
These are the asserted links:
UAM@ARCTOSTE> select specimen_event_id, part_id from specimen_event_links where collection_object_id=2791376;
SPECIMEN_EVENT_ID PART_ID
3232421 26094176 3232421 26094250 3232421 26094124 3232421 26094334 54243 21967548 54243 2791380
and unlinked parts...
UAM@ARCTOSTE> select specimen_part.COLLECTION_OBJECT_ID from specimen_part where derived_from_cat_item=2791376 and COLLECTION_OBJECT_ID not in (select part_id from specimen_event_links where COLLECTION_OBJECT_ID=2791376) ;
COLLECTION_OBJECT_ID
26094514 21967547 26094000 26093729 26093858 26094559 26094387 26094459 2791381 25901180 2791379 26093938 2791378 2791377 26093786 26094750 26094693 26094598 26094641
19 rows selected.
would get lumped in with the "priority" event, which is
UAM@ARCTOSTE> select getPrioritySpecimenEvent(2791376) from dual;
GETPRIORITYSPECIMENEVENT(2791376)
3232421
I think that'll work the same way it does now, where parts are smooshed into a string.
UAM@ARCTOSTE> select parts from flat where collection_object_id=2791376;
PARTS
postcranial skeleton; skull; skin; blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); kidney (frozen) ; blood (EDTA); blood (EDTA); blood (EDTA); blood (EDTA); blood (frozen); blood (frozen); blood (frozen); blood serum (f rozen); blood serum (frozen); muscle (frozen); heart (frozen); blood (EDTA); blood (EDTA); liver (frozen); blood (EDTA)
In this case one "parts" value will be short (2 parts) and the other long (23 parts).
For GGBN, which...
GGBN expects to have one MaterialSample per record in its Darwin Core Occurrence archives
each part will define an "Occurrence" (which won't be an Occurrence at all, but we're stuck with the vocabulary).
select specimen_part.COLLECTION_OBJECT_ID, nvl( specimen_event_links.specimen_event_id, getPrioritySpecimenEvent(specimen_part.derived_from_cat_item) ) pieceOfTheOccurrenceID from specimen_part, specimen_event_links where specimen_part.COLLECTION_OBJECT_ID=specimen_event_links.part_ID (+) and specimen_part.derived_from_cat_item=2791376 ; COLLECTION_OBJECT_ID PIECEOFTHEOCCURRENCEID
26094176 3232421 26094250 3232421 26094124 3232421 26094334 3232421 21967548 54243 2791380 54243 26094514 3232421 21967547 3232421 26094000 3232421 26093729 3232421 26093858 3232421 26094559 3232421 26094387 3232421 26094459 3232421 2791381 3232421 25901180 3232421 2791379 3232421 26093938 3232421 2791378 3232421 2791377 3232421 26093786 3232421 26094750 3232421 26094693 3232421 26094598 3232421 26094641 3232421
25 rows selected.
so that specimen would have 25 "Occurrences" each with one part until GGBN can support one Occurrence having multiple MaterialSample "children." (Or I suppose we could send out a single part - "kidney (frozen) + liver (frozen)
- ..." - or something, but that would be contrary to GGBN's part-centric outlook. In any case SOMETHING has to be denormalized to support their model.)
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-416324688, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hK5B0vQ8w_JTwaBDxR2JFagODZb9ks5uVDw1gaJpZM4UT5xn .
send a list
create table temp_multi_evt as select collection_object_id from specimen_event where verificationstatus!='unaccepted' having count(*) > 1 group by collection_object_id;
alter table temp_multi_evt add num_parts number;
update temp_multi_evt set num_parts=(select count(*) from specimen_part where specimen_part.derived_from_cat_item=temp_multi_evt.collection_object_id);
alter table temp_multi_evt add num_linked_parts number;
update temp_multi_evt set num_linked_parts=(select count(*) from specimen_event_links where specimen_event_links.collection_object_id=temp_multi_evt.collection_object_id);
alter table temp_multi_evt add guid varchar2(255);
update temp_multi_evt set guid=(select guid from flat where flat.collection_object_id=temp_multi_evt.collection_object_id);
-- get rid of some stuff we know
delete from temp_multi_evt where guid like 'UAM:EH%';
delete from temp_multi_evt where guid like 'UAMb:Herb:%';
select substr(guid,1,instr(guid,':',1,2)) || ' @ ' || count(*) from temp_multi_evt group by substr(guid,1,instr(guid,':',1,2));
SUBSTR(GUID,1,INSTR(GUID,':',1,2))||'@'||COUNT(*)
------------------------------------------------------------------------------------------------------------------------
MSB:Mamm: @ 6244
UTEP:Herb: @ 123
CHAS:Bird: @ 11024
DMNS:Inv: @ 7
UTEP:Herp: @ 1318
UWBM:Herp: @ 21
MVZ:Mamm: @ 5
UAMObs:Ento: @ 1
KWP:Ento: @ 3
UAM:Ento: @ 2
UCM:Fish: @ 1279
UAM:Mamm: @ 1
MSB:Bird: @ 2
MVZ:Bird: @ 5
CHAS:Egg: @ 474
MVZ:Herp: @ 1
UTEP:HerpOS: @ 139
DMNS:Bird: @ 8
CHAS:Mamm: @ 1
DMNS:Mamm: @ 4
UTEP:Inv: @ 1121
scripts
I think I've recovered what I can from date in part remarks. Let me know (examples are useful) what else might be used as a link and I'll script what I can.
iDigBio and GBIF
MAYBE we'll eventually send them slightly more accurate part concatenations, but nothing else will change.
Nothing about the format of what we're sending to GGBN will change, the event data will just be a bit better-targeted for those parts which have explicit links.
NK number should also be a link between part remarks or part label and the specimen event remarks for many of these.
On Tue, Aug 28, 2018 at 8:41 AM, dustymc notifications@github.com wrote:
send a list
create table temp_multi_evt as select collection_object_id from specimen_event where verificationstatus!='unaccepted' having count() > 1 group by collection_object_id; alter table temp_multi_evt add num_parts number; update temp_multi_evt set num_parts=(select count() from specimen_part where specimen_part.derived_from_cat_item=temp_multi_evt.collection_object_id); alter table temp_multi_evt add num_linked_parts number; update temp_multi_evt set num_linked_parts=(select count() from specimen_event_links where specimen_event_links.collection_object_id=temp_multi_evt.collection_object_id); alter table temp_multi_evt add guid varchar2(255); update temp_multi_evt set guid=(select guid from flat where flat.collection_object_id=temp_multi_evt.collection_object_id); -- get rid of some stuff we know delete from temp_multi_evt where guid like 'UAM:EH%'; delete from temp_multi_evt where guid like 'UAMb:Herb:%'; select substr(guid,1,instr(guid,':',1,2)) || ' @ ' || count() from temp_multi_evt group by substr(guid,1,instr(guid,':',1,2)); SUBSTR(GUID,1,INSTR(GUID,':',1,2))||'@'||COUNT(*)
MSB:Mamm: @ 6244 UTEP:Herb: @ 123 CHAS:Bird: @ 11024 DMNS:Inv: @ 7 UTEP:Herp: @ 1318 UWBM:Herp: @ 21 MVZ:Mamm: @ 5 UAMObs:Ento: @ 1 KWP:Ento: @ 3 UAM:Ento: @ 2 UCM:Fish: @ 1279 UAM:Mamm: @ 1 MSB:Bird: @ 2 MVZ:Bird: @ 5 CHAS:Egg: @ 474 MVZ:Herp: @ 1 UTEP:HerpOS: @ 139 DMNS:Bird: @ 8 CHAS:Mamm: @ 1 DMNS:Mamm: @ 4 UTEP:Inv: @ 1121
temp_multi_evt.csv.zip https://github.com/ArctosDB/arctos/files/2328479/temp_multi_evt.csv.zip
scripts
I think I've recovered what I can from date in part remarks. Let me know (examples are useful) what else might be used as a link and I'll script what I can.
iDigBio and GBIF
MAYBE we'll eventually send them slightly more accurate part concatenations, but nothing else will change.
Nothing about the format of what we're sending to GGBN will change, the event data will just be a bit better-targeted for those parts which have explicit links.
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-416611124, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hOEO7vDfuHCosDvAGHYhcDr9v_jbks5uVVasgaJpZM4UT5xn .
NK
Example?
There is a new table digir_query.msb_mamm_ggbn_tissue_tbl containing MSB:Mamm tissue not-really-Occurrences for GGBN.
DDL is https://github.com/ArctosDB/DDL/blob/master/flat/ggbn_flat_tissue.sql
UAM@ARCTOS> select count(*) from digir_query.msb_mamm_ggbn_tissue_tbl;
COUNT(*)
----------
482101
1 row selected.
Elapsed: 00:00:00.79
UAM@ARCTOS> select occurrenceID from digir_query.msb_mamm_ggbn_tissue_tbl having count(*) > 1 group by occurrenceID;
no rows selected
Elapsed: 00:00:01.07
UAM@ARCTOS> select occurrenceID from digir_query.msb_mamm_ggbn_tissue_tbl where rownum<5;
OCCURRENCEID
------------------------------------------------------------------------------------------------------------------------
http://arctos.database.museum/guid/MSB:Mamm:69674?pid=27256939
http://arctos.database.museum/guid/MSB:Mamm:196496?pid=21375444
http://arctos.database.museum/guid/MSB:Mamm:85721?pid=2185666
http://arctos.database.museum/guid/MSB:Mamm:53231?pid=2140369
4 rows selected.
UAM@ARCTOS> select occurrenceID,preparationType,eventDate from digir_query.msb_mamm_ggbn_tissue_tbl where references='http://arctos.database.museum/guid/MSB:Mamm:292063' order by eventDate
2 ;
OCCURRENCEID
------------------------------------------------------------------------------------------------------------------------
PREPARATIONTYPE
------------------------------------------------------------------------------------------------------------------------
EVENTDATE
------------------------------------------------------------------------------------------------------------------------
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994537
blood (EDTA)
2014-06-15
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994538
blood serum (frozen)
2014-06-15
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994545
blood (EDTA)
2014-06-30
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994541
blood (EDTA)
2014-06-30
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994542
blood (EDTA)
2014-06-30
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=27923980
blood (EDTA)
2014-06-30
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994546
blood serum (frozen)
2014-06-30
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994544
blood (EDTA)
2014-06-30
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994543
blood (EDTA)
2014-06-30
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994547
blood (EDTA)
2014-07-21
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994550
blood (EDTA)
2014-07-21
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994549
blood (EDTA)
2014-07-21
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994551
blood (EDTA)
2014-07-21
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994548
blood (EDTA)
2014-07-21
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25994552
blood serum (frozen)
2014-07-21
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25988964
blood serum (frozen)
2014-10-19
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=26966404
blood (EDTA)
2014-10-19
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25988963
blood (EDTA)
2014-10-19
http://arctos.database.museum/guid/MSB:Mamm:292063?pid=25988965
blood serum (frozen)
2014-10-19
19 rows selected.
I found a bunch more by NK, but I also found a bunch with conflicting data, attached.
PID: partID BARCODE: what it sounds like PART_REMARK: what it sounds like EID1: linked event (from dates) EID2: event that NK suggests should be linked BDn: began_date from event SE_REMARKn: event remarks
Thanks, I'll take a look at those.
On Wed, Aug 29, 2018 at 10:53 AM, dustymc notifications@github.com wrote:
I found a bunch more by NK, but I also found a bunch with conflicting data, attached.
PID: partID BARCODE: what it sounds like PART_REMARK: what it sounds like EID1: linked event (from dates) EID2: event that NK suggests should be linked BDn: began_date from event SE_REMARKn: event remarks
temp_multi_link.csv.zip https://github.com/ArctosDB/arctos/files/2332996/temp_multi_link.csv.zip
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-417025508, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hE4xY77UiKIfYl2RR45Guj1ntQ6Gks5uVscQgaJpZM4UT5xn .
Can this be closed?
Maybe let me make a GGBN resource using that view first before closing?
Yes, agree. We also need to see what iDigBio and GBIF do with it.
On Thu, Sep 6, 2018 at 2:12 PM, John Wieczorek notifications@github.com wrote:
Maybe let me make a GGBN resource using that view first before closing?
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225486, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hHilnqbHU2U5ps51vsnkT3qWPaibks5uYYGYgaJpZM4UT5xn .
They shouldn't be doing anything with it, should they?
On Fri, Sep 7, 2018 at 8:13 AM Mariel Campbell notifications@github.com wrote:
Yes, agree. We also need to see what iDigBio and GBIF do with it.
On Thu, Sep 6, 2018 at 2:12 PM, John Wieczorek notifications@github.com wrote:
Maybe let me make a GGBN resource using that view first before closing?
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225486, or mute the thread < https://github.com/notifications/unsubscribe-auth/AOH0hHilnqbHU2U5ps51vsnkT3qWPaibks5uYYGYgaJpZM4UT5xn
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225750, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP63kZiaYvm5p4WbMKB7dsBqd7rgB4ks5uYYHRgaJpZM4UT5xn .
It shouldn't make a difference, right? Just want to confirm that. But the main thing is to see how it works for GGBN.
On Thu, Sep 6, 2018 at 2:14 PM, John Wieczorek notifications@github.com wrote:
They shouldn't be doing anything with it, should they?
On Fri, Sep 7, 2018 at 8:13 AM Mariel Campbell notifications@github.com wrote:
Yes, agree. We also need to see what iDigBio and GBIF do with it.
On Thu, Sep 6, 2018 at 2:12 PM, John Wieczorek <notifications@github.com
wrote:
Maybe let me make a GGBN resource using that view first before closing?
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225486 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ AOH0hHilnqbHU2U5ps51vsnkT3qWPaibks5uYYGYgaJpZM4UT5xn
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419225750, or mute the thread https://github.com/notifications/unsubscribe-auth/ AAcP63kZiaYvm5p4WbMKB7dsBqd7rgB4ks5uYYHRgaJpZM4UT5xn .
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-419226142, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hN8TVsaAGLCkBKiCV2tNHGjgV9tYks5uYYIhgaJpZM4UT5xn .
Wilco.
This is for GGBN and only GGBN. It's vastly different than what we're sending to everyone else.
The process of linking parts to specimen events is working, but it is fairly complicated and needs some help to be more functional and user friendly. Specifically, in the specimen record, we need the specimen events to display in order by date, and we need the parts to display in the same order, at least within part type. In the current system, events seem to be in order of when they were created (?), which for this type of legacy data where multiple records are being consolidated, results in random order by date. Then, linking to parts that also appear to end up, within part type, in random order, becomes very difficult. Can we have events and parts present in some consistent ordering scheme? I would prefer most recent event first, with the associated parts in the same order. This would make the process of finding and linking associated parts and events much easier and more intelligible. As a corollary to this, identifiers also appear to get added in some random order. For these multiple events, parts from each event have an shared unique identifier (NK). Currently, these display in somewhat random order - it would be helpful to have them display in numeric order, smallest to largest, in the identifiers box of the specimen record. This also applies to all Arctos pages and displays in general, including object tracking. How exactly does Oracle decide to order things, and can we choose to have a consistent order throughout?
To make things event more interesting, in turns out that specimens display in a different order when you click "edit" than they do on the main display page. This is problematic for entering any edits to the correct event.
@campmlc it's 2 clicks please elaborate on "complicated."
There is no date associated with parts. I can't sort by things that don't exist.
Events are sorted by a complicated thing that considers verificationstatus and type and such. I'm not sure if there's enough information to replicate that in the pick, but I'll check.
Sorting identifiers could be an Issue, but likely needs linguistic indexes.
Oracle does not order things. We can request sorting anywhere, but it's not free. That is also a new Issue.
Can this be closed?
We should wait to close this until the GGBN resource is published. John, what the status on that?
And yes, disregard the GBIF question. This is a separate resource.
On Wed, Feb 13, 2019, 8:41 AM dustymc <notifications@github.com wrote:
@campmlc https://github.com/campmlc it's 2 clicks please elaborate on "complicated."
There is no date associated with parts. I can't sort by things that don't exist.
Events are sorted by a complicated thing that considers verificationstatus and type and such. I'm not sure if there's enough information to replicate that in the pick, but I'll check.
Sorting identifiers could be an Issue, but likely needs linguistic indexes.
Oracle does not order things. We can request sorting anywhere, but it's not free. That is also a new Issue.
Can this be closed?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-463247516, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hByIkuUlhhnLhPH0mh3pFMds51tRks5vNDIhgaJpZM4UT5xn .
I have a test resource set up, but I am not getting the data I need out of Arctos with the query "select * from digir_query.msb_mamm_ggbn_tissue_tbl". I'll work with Dusty to get that straight. The first GGBN resource will be for MSB Mammals that have one tissue per specimen. Let's get that right and see what we can convince GGBN of after that. Hopefully we can have a single query over all Arctos for GGBN, and distinguish resources by their collection_id in the resource SELECT statement.
One big question is what we want to do about metadata for these GGBN-only resources. Should we copy the metadata from the regular collection's metadata and modify that in some way?
On Wed, Feb 13, 2019 at 12:59 PM Mariel Campbell notifications@github.com wrote:
We should wait to close this until the GGBN resource is published. John, what the status on that?
And yes, disregard the GBIF question. This is a separate resource.
On Wed, Feb 13, 2019, 8:41 AM dustymc <notifications@github.com wrote:
@campmlc https://github.com/campmlc it's 2 clicks please elaborate on "complicated."
There is no date associated with parts. I can't sort by things that don't exist.
Events are sorted by a complicated thing that considers verificationstatus and type and such. I'm not sure if there's enough information to replicate that in the pick, but I'll check.
Sorting identifiers could be an Issue, but likely needs linguistic indexes.
Oracle does not order things. We can request sorting anywhere, but it's not free. That is also a new Issue.
Can this be closed?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-463247516, or mute the thread < https://github.com/notifications/unsubscribe-auth/AOH0hByIkuUlhhnLhPH0mh3pFMds51tRks5vNDIhgaJpZM4UT5xn
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-463255137, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP6x69RgAgwSzu6XW7cd6HXIvnvCwZks5vNDZ3gaJpZM4UT5xn .
Test resource now ready for testing at http://ipt.vertnet.org:8080/ipt/resource.do?r=msbmammalggbntest
Thank you, John! What do we need to do to test this?
On Wed, Mar 13, 2019 at 7:38 AM John Wieczorek notifications@github.com wrote:
Test resource now ready for testing at http://ipt.vertnet.org:8080/ipt/resource.do?r=msbmammalggbntest
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-472425287, or mute the thread https://github.com/notifications/unsubscribe-auth/AOH0hG8izqBJ31yKsZ7Prq0uW1DTtJ7wks5vWP9egaJpZM4UT5xn .
Gabi is on this now.
On Wed, Mar 13, 2019 at 12:16 PM Mariel Campbell notifications@github.com wrote:
Thank you, John! What do we need to do to test this?
On Wed, Mar 13, 2019 at 7:38 AM John Wieczorek notifications@github.com wrote:
Test resource now ready for testing at http://ipt.vertnet.org:8080/ipt/resource.do?r=msbmammalggbntest
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-472425287, or mute the thread < https://github.com/notifications/unsubscribe-auth/AOH0hG8izqBJ31yKsZ7Prq0uW1DTtJ7wks5vWP9egaJpZM4UT5xn
.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-472466292, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP61Jzx54xEzglMbrkex8k7pXkYCPJks5vWRY-gaJpZM4UT5xn .
Can this be closed?
Yes, this can be closed.
On Tue, Apr 2, 2019 at 12:38 PM dustymc notifications@github.com wrote:
Can this be closed?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/1545#issuecomment-479056769, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcP6z8-chU-CZkXWkkttfXc8oYbHceAks5vc3l8gaJpZM4UT5xn .
Per conversation with John Wieczorek, we need to seek funding to implement changes to the event model so that parts = material samples are linked to events = occurrences. This would resolve a deficiency in our current data model and would alleviate the problems Arctos has with serving data to GGBN. Perhaps this can be accomplished with GGBN funding.