BU-Spark / se-Symbiota-portal

The Symbiota Virtual Flora/Fauna project is an open source software project, with central goal of developing on-line tools that aid in the generation, exploration and management of biodiversity data (collection specimens, observations, images, checklist, keys, etc.). See also: http://bdj.pensoft.net/articles.php?id=1114 and http://symbiota.org/
GNU General Public License v2.0
1 stars 3 forks source link

Fix bugs in database caused by the merge to Symbiota version 3.09 #40

Closed Tian-Tan closed 1 year ago

Tian-Tan commented 1 year ago

Multiple bugs remain after running our 1.3 sql patch on top of the 3.0 schema, so a patch has to be made to be applied after the 1.3 patch to fix these bugs. For now the new patch is a completely new file called config\schema\3.0\patches\db_schema_patch-3-09-update-patch.sql but in the future it is best to combine this file with the 1.3 patch. For development purposes it is just much easier to separate these files for now.

Tian-Tan commented 1 year ago

The omoccurrences table is now synced with the sample data Harvard has provided, the import of the data into the database is also successful, with the following minor concerns:

  1. The symbiota model is slightly altered to match Harvard's sample data, should this be the case?
  2. The occid auto increment is messed up by the tests and requires a database drop to be solved.
  3. Not all 5000 rows of data might be imported.
Tian-Tan commented 1 year ago

These are the mappings from the Harvard sample data to Symbiota: id -> skipped (duplicate of occurrenceID column) type -> typeStatus (updated to varchar(50)) (similar meaning to DarwinCore 'typeStatus' but not the same format) license -> NEW varchar(50) (data is a harvard website link) rightsHolder -> NEW varchar(50) (The name of the respective owner, might be related to institutionID or institutionCode but this is text instead of ID) accessRights -> NEW varchar(128) (data is a harvard website link) references -> NEW varchar(64) (data is a harvard website link) datasetName -> NEW varchar(50) (data is the name of the herbaria, might be related to datasetID) occurrenceStatus -> processingStatus (updated to varchar(50) (similar meaning to DarwinCore processingStatus, uncertain) associatedMedia -> NEW varchar(50) (might have something to do with the associated images) higherGeography -> highGeo (updated to varchar(64)) verbatimLocality -> skipped (duplicate of locality column)

Tian-Tan commented 1 year ago

TODO: Update patch to ensure Symbiota's omoccurrences table reflect Harvard's sample data

Tian-Tan commented 1 year ago

At the same time, investigation needs to be done on the newly created columns to ensure that their creation is absolutely necessary and no existing columns (by DarwinCore standards or existing Symbiota columns)'s function conflict with the new columns.

Tian-Tan commented 1 year ago

At the same time, investigation needs to be done on the newly created columns to ensure that their creation is absolutely necessary and no existing columns (by DarwinCore standards or existing Symbiota columns)'s function conflict with the new columns.

Looks to not be a problem

Tian-Tan commented 1 year ago

Mapping for multimedia dataset from the Harvard sample data to symbiota images table (for columns that weren't automatically mapped): id -> occurrenceID (NEW varchar(50)) (probably should be used as FK to omoccurrences) type -> imageType (remain as varchar(50)) identifier -> sourceIdentifier (updated to varchar(128)) references -> referenceUrl (remain as varchar(255)) description -> caption (remain as varchar(100)) created -> skipped (Believed to be the timestamp of when the record was first created, so it is a duplication of Symbiota's InitialTimeStamp) creator -> NEW varchar(50) (No data available, unsure if it meant symbiota user or other 3rd party creator, so it's using a new column for now) license -> copyright (remain as varchar(255)) (link to copyright statement) rightsHolder -> owner (remain as varchar(250)) (believed to mean the same thing) datasetID -> recordID (updated to varchar(50) (believed to mean the same thing)

ttaliacc commented 1 year ago

@Tian-Tan Close it if it's done

Tian-Tan commented 1 year ago

Issue solved.