Open PipBrewer opened 4 months ago
We've now decided to put together a Python script for this instead of creating a new GREL script for OpenRefine. We just got our first data dump from AU yesterday. The next step will be to go through the data and decide which fields need to be imported into Specify.
Steps for script will be:
The following data needs to be pulled from the species-web db:
From table Specimen:
From table Folder Versions:
I've written a SQL query to pull this data and join it (currently a left join where folder versions is left). I need to add barcodes to this when we update the db.
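For reference, a minimal sketch of that kind of left join, run against an in-memory SQLite database. The actual species-web schema is not shown in this issue, so the table names (`folder_versions`, `specimen`), every column name, and the sample rows are placeholders made up for illustration.

```python
import sqlite3

# Hypothetical schema and toy rows: the real species-web tables and columns
# are not listed in this issue, so everything below is a stand-in.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE folder_versions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE specimen (
        id INTEGER PRIMARY KEY,
        folder_version_id INTEGER,
        label TEXT
    );
    INSERT INTO folder_versions VALUES (1, 'Folder A'), (2, 'Folder B');
    INSERT INTO specimen VALUES (10, 1, 'sheet-1');
""")

# Folder versions is the left table, so a folder version without any
# specimen still appears, with NULLs in the specimen columns.
QUERY = """
    SELECT fv.id AS folder_version_id,
           fv.title,
           s.id AS specimen_id,
           s.label
    FROM folder_versions AS fv
    LEFT JOIN specimen AS s ON s.folder_version_id = fv.id
    ORDER BY fv.id
"""

rows = [dict(row) for row in conn.execute(QUERY)]
for row in rows:
    print(row)
```

With folder versions on the left, any folder version that has no specimen yet comes back with `None` in the specimen columns, which makes gaps easy to spot before import.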
Additional information that needs to be added:
- projectnumber: DaSSCo
- publish: True
- storedunder: True
- preptypename: Sheet
- count: 1
- collection: ???
- datafile_remark: [name of db export]? Possibly by date? Is this useful?
- datafile_source: DaSSCo data file
- datafile_date: [date of digitization]
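A rough sketch of how the fixed values above could be attached to each exported record in the Python script. The function name `add_constant_fields` and the shape of the input record are assumptions. `collection` and `datafile_remark` are left out on purpose because they are still undecided.

```python
# Values that are the same for every record, taken from the list above.
# "collection" and "datafile_remark" are omitted because they are still open.
CONSTANT_FIELDS = {
    "projectnumber": "DaSSCo",
    "publish": True,
    "storedunder": True,
    "preptypename": "Sheet",
    "count": 1,
    "datafile_source": "DaSSCo data file",
}


def add_constant_fields(record, digitization_date):
    """Return a copy of `record` with the fixed fields added.

    `record` is assumed to be a dict for one specimen row (hypothetical shape).
    `digitization_date` fills datafile_date and is passed in by the caller.
    """
    enriched = dict(record)  # copy so the input row is not modified
    enriched.update(CONSTANT_FIELDS)
    enriched["datafile_date"] = digitization_date
    return enriched


# Usage with a made-up record:
example = add_constant_fields({"specimen_id": 10}, "2024-01-31")
print(example)
```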
❓ Questions:
- At this point, there doesn't seem to be a need for fields like qualifier/addendum or remarks. Is that accurate?
- What is the collection?
- Will there be hybrids? (These would need to be handled slightly differently because GBIF does not keep hybrids in its backbone, so gbif_match_json will always be null for them.)
- Do we need flags for new taxonomy, or would that be redundant since Birgitte is already checking everything in species-web?
- The digitiser always shows as Birgitte right now. Should it always be Charlotte instead?
After some additional data exploration, it looks like the fields family, genus, species, etc. do not always reflect the correct taxonomy. It is therefore safer to pull the taxonomy from the gbif_match_json field instead. From this field, relevant info we can pull:
The highest classification field is also unreliable and therefore not usable, so I'll pull the data from the scientificName field instead. (If there are hybrids, they will need to be handled differently, as they may not have a scientificName field.)
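A minimal sketch of pulling taxonomy out of gbif_match_json, assuming the field holds a JSON object shaped like a GBIF species match response. Apart from `scientificName`, the key names (`kingdom`, `phylum`, `order`, `family`, `genus`, `species`) are assumptions about what the stored JSON contains, and `taxonomy_from_gbif` is a hypothetical helper name. Records with a null value, which is what hybrids are expected to have, return `None` so they can be routed to separate handling.

```python
import json

# Assumed keys in the stored JSON; only scientificName is confirmed above.
RANKS = ("kingdom", "phylum", "order", "family", "genus", "species")


def taxonomy_from_gbif(gbif_match_json):
    """Extract taxonomy from a gbif_match_json string.

    Returns None when the field is empty or null (expected for hybrids),
    so the caller can handle those records separately.
    """
    if not gbif_match_json:
        return None
    match = json.loads(gbif_match_json)
    if not isinstance(match, dict):
        # Covers a stored JSON literal "null" as well.
        return None
    taxonomy = {rank: match.get(rank) for rank in RANKS}
    taxonomy["scientificName"] = match.get("scientificName")
    return taxonomy


# Usage with a made-up value:
sample = '{"scientificName": "Quercus robur L.", "family": "Fagaceae", "genus": "Quercus"}'
print(taxonomy_from_gbif(sample))
print(taxonomy_from_gbif(None))
```

Missing keys come back as `None` instead of raising, so a partial match still yields a record that can be reviewed by hand.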
Aarhus University Herbarium currently has data that is equivalent to NHMD's digi app exports (with some differences). A GREL script needs to be written to transform it so that it can be imported into Specify.