AtlasOfLivingAustralia / data-management

Data management issue tracking

dr21629 - UC - Wildlife Genetics Database #848

Closed: rosemaryjoconnor closed this issue 2 months ago

rosemaryjoconnor commented 1 year ago

From Prof Arthur Georges:

We have a couple of databases (one for lizards in our breeding colony, which also includes a lot of wild-caught animals, and one for freshwater turtles with about 36,000 samples). They are hosted on AWS in MariaDB. Both are living databases: they are in continual use, with new data entered all the time. Almost all of the wild-caught specimens have latitude/longitude coordinates associated with them.

The idea would be to have some sort of automated system that mines our databases for specimen records with locality data etc. and uploads to ALA, via some sort of API, those that have not already been uploaded (provided their Status is 'public' and their locality data and taxonomic identity have been verified).
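The selection rule above (public, verified, georeferenced, not yet uploaded) could be expressed as a single query. This is a minimal sketch under stated assumptions: the column names (`status`, `locality_verified`, `taxon_verified`, `decimal_latitude`, `decimal_longitude`) and the `ala_uploads` bookkeeping table are hypothetical, not the actual UC schema. Python's built-in `sqlite3` stands in for MariaDB so the example runs without a server; the SQL itself is standard, but the MariaDB drivers use `%s` rather than `?` placeholders, and none of this has been tested against the real databases.

```python
import sqlite3

# Hypothetical schema standing in for the MariaDB specimen table; the real
# column names in the lizard and turtle databases will differ.
SCHEMA = """
CREATE TABLE specimens (
    specimen_id TEXT PRIMARY KEY,
    scientific_name TEXT,
    status TEXT,
    locality_verified INTEGER,
    taxon_verified INTEGER,
    decimal_latitude REAL,
    decimal_longitude REAL
);
CREATE TABLE ala_uploads (
    specimen_id TEXT PRIMARY KEY,
    uploaded_on TEXT
);
"""

# Public, verified, georeferenced records not yet sent to ALA.
# NOT EXISTS is used instead of NOT IN so that a NULL in ala_uploads
# cannot silently empty the result.
PENDING_SQL = """
SELECT s.specimen_id, s.scientific_name, s.decimal_latitude, s.decimal_longitude
FROM specimens AS s
WHERE s.status = 'public'
  AND s.locality_verified = 1
  AND s.taxon_verified = 1
  AND s.decimal_latitude IS NOT NULL
  AND s.decimal_longitude IS NOT NULL
  AND NOT EXISTS (
      SELECT 1 FROM ala_uploads AS u WHERE u.specimen_id = s.specimen_id
  )
ORDER BY s.specimen_id
"""


def pending_records(conn):
    """Return the rows eligible for the next upload."""
    return conn.execute(PENDING_SQL).fetchall()


def mark_uploaded(conn, specimen_ids, uploaded_on):
    """Record which specimens were included in an upload."""
    conn.executemany(
        "INSERT INTO ala_uploads (specimen_id, uploaded_on) VALUES (?, ?)",
        [(sid, uploaded_on) for sid in specimen_ids],
    )
    conn.commit()


if __name__ == "__main__":
    # Invented toy rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO specimens VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("T001", "Chelodina longicollis", "public", 1, 1, -35.3, 149.1),
            ("T002", "Chelodina longicollis", "private", 1, 1, -35.2, 149.0),
            ("T003", "Emydura macquarii", "public", 0, 1, -34.9, 148.6),
            ("L001", "Pogona vitticeps", "public", 1, 1, None, None),
        ],
    )
    batch = pending_records(conn)
    print(batch)
    mark_uploaded(conn, [row[0] for row in batch], "2024-07-01")
    print(pending_records(conn))
```

Keeping upload history in a separate table leaves the source tables untouched, which may matter given the limited IT support on the UC side; an "uploaded" flag column on the specimen table would be a simpler alternative if Arthur prefers to maintain it himself.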

I already have the database for Pogona linked to a suite of scripts in R, so it would be a matter of moving the data from R to ALA. I can handle the scripting at this end, if someone can handle it at the ALA end.

rosemaryjoconnor commented 1 year ago

Meeting today with Arthur. He will be sending sample exports of the two databases, together with sample R scripts that include the database connections.

There is no API on their end, but the databases are available for public access. Images are available and can be placed on a server for access. Image directories are named to match the sampleId, so there is some structure there for building image URLs.

There is no real IT support capacity for the project. Arthur was hoping for monthly updates eventually, so a fetcher will likely be the way to go. Looking at the database, the structure is not complex, and he is happy to add any additional information we need.

rosemaryjoconnor commented 1 year ago

Data has been reviewed; emailed Arthur with some questions about fields, null dates, etc. Further discussion is needed regarding images, location, etc.

rosemaryjoconnor commented 1 year ago

Arthur is adding a new identifiedBy field to the Turtle database. Happy with everything else in Test. Images may still be tricky: he has indicated that the catalogNumber (specimenID) identifies not an image filename but the directory in which multiple images may reside. He wants all of this automated, so we have to work out how. I'm concerned that the catalogNumber and the directory name may not match, so building URLs on the fly could be error-prone. Wondering whether we should ask him to add another table to the databases with catalogNumber and URL fields.
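Since the records will be published through ALA, a fetcher would typically map each database row to Darwin Core terms. The sketch below is hypothetical: the source column names, the `occurrenceID` prefix and the choice of `MaterialSample` as `basisOfRecord` are assumptions, not fields agreed in this issue. It illustrates two points raised above: the new `identifiedBy` field, and null dates being left empty rather than filled with a guessed value.

```python
import csv
import io
from datetime import date

# Hypothetical database column -> Darwin Core term mapping. The column names
# on the left are placeholders, not the actual UC database columns.
COLUMN_TO_DWC = {
    "specimen_id": "catalogNumber",
    "scientific_name": "scientificName",
    "decimal_latitude": "decimalLatitude",
    "decimal_longitude": "decimalLongitude",
    "collection_date": "eventDate",
    "identified_by": "identifiedBy",
}

DWC_FIELDS = ["occurrenceID", "basisOfRecord"] + list(COLUMN_TO_DWC.values())


def to_darwin_core(row, id_prefix):
    """Map one database row (a dict) to a Darwin Core record (a dict)."""
    record = {dwc: row.get(col) for col, dwc in COLUMN_TO_DWC.items()}
    # Dates become ISO 8601 strings; null dates stay None (an empty CSV cell)
    # rather than being replaced with a made-up value.
    if isinstance(record["eventDate"], date):
        record["eventDate"] = record["eventDate"].isoformat()
    # id_prefix is a hypothetical namespace for globally unique occurrenceIDs.
    record["occurrenceID"] = f"{id_prefix}:{row['specimen_id']}"
    # Assumption: genetic samples are described as MaterialSample.
    record["basisOfRecord"] = "MaterialSample"
    return record


def write_dwc_csv(rows, id_prefix):
    """Serialise rows as a Darwin Core CSV string (None is written as '')."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(to_darwin_core(row, id_prefix))
    return buffer.getvalue()
```

Keeping the mapping in one dictionary means that when a column such as identifiedBy is added on the UC side, only one line of the fetcher needs to change.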

rosemaryjoconnor commented 1 year ago

Arthur Georges has responded and is adding additional columns to the DB. We need to discuss the process for managing images. Arthur has indicated they will be uploaded to an AWS server. Each specimenID has a folder with multiple images, some of which have quite large file sizes.

rosemaryjoconnor commented 1 year ago

Waiting for updates from UC

rosemaryjoconnor commented 1 year ago

Arthur is away until April 24th. Images are planned to go on the AWS web server where his web page is managed (georges.biomatix.org). There will be a directory for every specimen ID, each containing multiple images.
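With one directory per specimen ID, image URLs could be built from a directory listing while also reporting mismatches in both directions, which addresses the concern that catalogNumber and directory names may not line up. This is a minimal sketch: the host is from this issue, but the `https` scheme, the `/images/` path and the accepted file extensions are hypothetical, and how the directory listing is obtained (from the server or from an export) is left open.

```python
from urllib.parse import quote

# Hypothetical base URL: the host comes from this issue, but the scheme and
# the /images/ path are assumptions.
BASE_URL = "https://georges.biomatix.org/images/"

# Assumed set of image extensions to publish; other files are skipped.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def build_image_urls(catalog_numbers, directory_listing, base_url=BASE_URL):
    """Build image URLs per catalogNumber from a directory listing.

    directory_listing maps each directory name (expected to equal the
    specimen ID) to the filenames inside it. Returns a tuple of:
      urls:    {catalogNumber: [url, ...]} for matched directories
      missing: catalogNumbers with no matching directory
      orphans: directories that match no catalogNumber
    """
    urls = {}
    missing = []
    for cat in catalog_numbers:
        files = directory_listing.get(cat)
        if files is None:
            missing.append(cat)
            continue
        images = sorted(f for f in files if f.lower().endswith(IMAGE_SUFFIXES))
        # quote() percent-encodes spaces and other unsafe characters.
        urls[cat] = [f"{base_url}{quote(cat)}/{quote(name)}" for name in images]
    orphans = sorted(set(directory_listing) - set(catalog_numbers))
    return urls, missing, orphans
```

Matching is deliberately exact: a case or whitespace difference between a catalogNumber and its directory shows up in `missing` and `orphans` rather than producing a broken URL. The catalogNumber/URL table suggested earlier would make this check unnecessary, at the cost of maintaining that table on the UC side.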

rosemaryjoconnor commented 11 months ago

Emailed 04/10/2023

rosemaryjoconnor commented 7 months ago

15/02/2024

rosemaryjoconnor commented 7 months ago

21/02/2024

To do:

rosemaryjoconnor commented 6 months ago

07/03/2024

rosemaryjoconnor commented 6 months ago

Not yet in production; images to be added before it goes live.

rosemaryjoconnor commented 5 months ago

27/02/2024 Prod dr25075: data in production, no images.

rosemaryjoconnor commented 5 months ago

16/04/2024

rosemaryjoconnor commented 5 months ago

17/04/2024

rosemaryjoconnor commented 5 months ago

03/05/2024

rosemaryjoconnor commented 3 months ago

01/07/2024