Closed timrobertson100 closed 1 year ago
Hmm, not sure what those rows at the bottom were. I have removed them.
File has been replaced
Thanks @acbentley but the IDs are in the occurrence file as well so without them the occurrence file becomes invalid.
If those are to be removed, they need to be removed from occurrence and anything linking to those occurrences would need to be removed.
The IDs appear to be duplicated in the data above though so should still link correctly.
I'm afraid not - it removes IDs that are used in the occurrence table. The error provides an example key:
psql:load-specify.sql:27: ERROR: insert or update on table "occurrence" violates foreign key constraint "occurrence_occurrence_id_fkey"
DETAIL: Key (occurrence_id)=(bb2deddf-98cb-a120-1451-66e1e4a20878) is not present in table "event".
OK, I have a note indicating that events needed to be added for all occurrences from the occurrence file. I have added these back in and transformed them. Does that work now?
I'm afraid not. Now there are duplicate records in Event.csv
psql:load-specify.sql:21: ERROR: duplicate key value violates unique constraint "event_pkey"
DETAIL: Key (event_id)=(b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3) already exists.
Looking at the file, that key is in many times (with different location keys):
tsj442@YN13987 gbif % cat ../specify/Event/EVENT.csv|grep b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,a17a1150-49ac-4d8c-9466-fa61aa3cffe4,,OCCURRENCE,,001-RFM-04,5/11/2004,2004,5,11,,,,,,39:31.18 N,96:19.58 W,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,bb2deddf-98cb-a120-1451-66e1e4a20878,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,0774f440-2bbe-c638-dfe0-cb843297e8ab,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,e1876adb-12bf-a52a-9e92-661b9423ebae,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,5b19f908-d5f6-8324-a6ae-19de53aed9e2,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,ca688bea-d19c-62e1-59e3-8afb97426473,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,e5b8e92c-f2a9-69c7-3d9c-15e71173b156,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,dc0e3767-f15d-61e6-15b1-0c02401a7353,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,0e455b50-bdd7-a756-4fd3-7f3a00ef0057,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,ccd81ea8-19a3-c6a2-2cd2-7ffb92154353,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,f92b5c1c-2d7e-cd38-533d-732f0be010bd,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,3b54566a-0a3a-712a-8cd8-a77174ab1496,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
b3cf407a-29af-44f2-af3c-f9a9d3fbe6c3,,https://www.gbif.org/dataset/8f79c802-a58c-447f-99aa-1d6a0790825a,ad4b891a-2b92-2ff8-5c67-a4c2cedeed03,,OCCURRENCE,,,,,,,,,,,,,,,,,,,,,,
Yes, because the event is connected to multiple localities in a 1:M relationship in Specify.
I am seriously confused now. Take a look at the occurrence file and let me know how to add occurrences to the event file, please. What does the occurrence_id equate to and am I supposed to create event_ids?
So an Occurrence in this model is a subtype of Event (i.e. an inheritance model, similar to the Entity and MaterialEntity). What this means is that for every Occurrence record, there must be an equivalent Event record having the same ID.
because the event is connected to multiple localities in a 1:M relationship in Specify
I don't think this model will accommodate that. An Event is something that happens within a place and time. I don't think an event can span multiple locations.
Sorry, the other way around. The same locality is connected to multiple events - the same place is collected multiple times potentially by multiple people. So, the occurrence_id becomes the event_id in this scenario? Do I then need to connect these to the locality_ids? Is this still an event_type = OCCURRENCE?
Thanks @acbentley - that makes more sense.
So if I understand correctly, you'd have:
Location
records for where things were observed or collectedEvent
records each with their own ID, time information (when the event occurred) and by whom - there can be many events at the same Location
(same locationID
)Organism
(s) at a place and time so they will have eventType
of Occurrence
Occurrence
entry with the same ID as the Event
(required for the inheritance model)Occurrence
events where physical material was collected - not just observed - there would be an entry in the OccurrenceEvidence
that joins the Occurrence
to the Entity
that was preserved.I hope this makes some sense, but please do ask if not.
Somewhat:
Yes, I have a location file that has my locality table IDs
Yes, I have an event file that links Collection event (event_id) to location (locality_id)
Correct. The occurrence file links an event_id to an organism_id
This is where I am still a little lost. How do I map the IDs in the occurrence file to those in the event file in order to add them? Does the occurrence_id (which I had to create) become the event_id? What then is the location_id and what event_type is it?
Sorry for so many dumb questions but I think I am having issues resolving our specimen-based model to the UM event-based model.
I want to leave out the OccurrenceEvidence from this response. I'll get to that in another comment.
I think the issue with the Events is actually simple:
Sorry for so many dumb questions but I think I am having issues resolving our specimen-based model to the UM event-based model.
No worries. I think you had it mostly right in a previous export.
- For the Occurrence events where physical material was collected - not just observed - there would be an entry in the OccurrenceEvidence that joins the Occurrence to the Entity that was preserved.
This is where I am still a little lost. How do I map the IDs in the occurrence file to those in the event file in order to add them? Does the occurrence_id (which I had to create) become the event_id? What then is the location_id and what event_type is it?
The OccurrenceEvidence table is meant to allow you to say what evidence there is to assert the Occurrence. Since it links Occurrence to Entity, the evidence can be based on MaterialEntities and/or on DigitalEntities.
At least in this dataset, there are no parts of collection objects that are currently shared as MaterialEntities/Entities derived from the Organisms. This could happen, by a) integrating the tissue samples, and b) parsing the preparations. Of these tow, a) would be the most interesting to actually capture and demonstrate. Since there are (currently) no MaterialEntities other than the Organisms in this data set, the OccurrenceEvidence consists only of Organisms and the entries in the OccurrenceEvidence would consist of same occurrenceID, organismID combinations as are in the Occurrence table. Though this is redundant in terms of information, it makes it easier to populate an "Evidence" section of a UI by using the OccurrenceEvidence table directly instead of having to traverse multiple tables looking for things that may or may not be there.
I want to leave out the OccurrenceEvidence from this response. I'll get to that in another comment.
I think the issue with the Events is actually simple:
- Each Specify collecting event is a GUM Event. These look present and correct in EVENT.csv.
- Each Specify collection object is a GUM Organism/MaterialEntity/Entity. These look correct in the three corresponding csv files.
- For every Specify collection object there is a GUM Occurrence/Event with manufactured occurrenceIDs, because those don't exist in Specify. These look correct in the OCCURRENCE.csv table.
- The Event records for the Occurrences in the OCCURRENCE.csv file in the EVENT.csv file, but they are defective. They were actually more correct in the previous version of the EVENT.csv file. Every Occurrence record requires an Event record with the eventID=occurrenceID. I believe that right now you have all of the records, but you put the Specify collecting event ID in the eventID field for them instead of the occurrenceID. The values you put in the eventID for the Events that are Occurrences should go into the parentEventID instead of the eventID. That will make your collecting events the parents of the Occurrences that happened there. The locationIDs, verbatim fields, etc. for the Event records that are Occurrences would just be the same those of their parents Events. The eventType for the Event records that are Occurrences must be 'OCCURRENCE'. The eventType for the Event records that are collecting events in Specify should be "COLLECTING_EVENT" rather than "OCCURRENCE", which is what they have right now.
So, I need to include location_id's for all of these? Seems a little redundant? That will involve a separate query out of Specify as I don't currently have a query that links all these things together.
It's redundant in much the same way that OccurrenceEvidence is in your case, but it makes things easier on the visualization and data assembly sides of things.
My problem is that it is harder to link those since the occurence_ids have been manufactured. I will have to re-manufacture them if I need to create a new query that links collecting events to localities.
OK, have regenerated and uploaded the OCCURRENCE file and EVENT file hopefully with the necessary connections.
Thanks @acbentley - I'm afraid this is still problematic.
psql:load-specify.sql:27: ERROR: insert or update on table "event" violates foreign key constraint "event_parent_event_id_fkey"
DETAIL: Key (parent_event_id)=(db191bf1-1ed3-11e3-bfac-90b11c41863e) is not present in table "event".
There are rows in the event table with parentEventID
which don't exist.
OK, I think I found the problem. Have uploaded a new EVENT file
Thanks @acbentley
This is very close now. There has been a change today where a missing foreign key was spotted.
In your material_entity
table you have collection_id in the form of https://www.gbif.org/grscicoll/collection/e879eb7f-dda0-4c93-9b35-42e39010e8be
. That needs to be the key in the collection file.
Please can you change material_entity
collection_id to be 3f55b3fa-292d-4170-bd46-66dca41d7f05
and not ´https://www.gbif.org/grscicoll/collection/e879eb7f-dda0-4c93-9b35-42e39010e8be´?
Please can you change the grscicoll_id
column in collection
to be e879eb7f-dda0-4c93-9b35-42e39010e8be
and not https://www.gbif.org/grscicoll/collection/e879eb7f-dda0-4c93-9b35-42e39010e8be
With those 2 changes, your data loads into the shared database
New COLLECTION and MATERIAL_ENTITY files uploaded with those changes
Thanks @acbentley
The data loads now into the shared database. There may be additional things, but this is a really important milestone - thank you for all the edits.
The user interface is a work in progress, but you can see an example record here.
The latest EVENT.csv does not load due to a missing
eventType
From row 11,241 there seems to be missing data
Could you please look at this @acbentley?