AtlasOfLivingAustralia / data-management

Data management issue tracking

Insect Investigators Data Load #839

Closed peggynewman closed 1 year ago

peggynewman commented 1 year ago

Create an ongoing data load for this project from BOLD. At the same time, look into other Australian datasets on BOLD. Check with Nick/ARGA how they are loading/processing.

webpage: https://insectinvestigators.com.au/

The BOLD project ID is ASMII. The data is already public, so it can be accessed from the Workbench section of BOLD by searching for the project code. Without logging in, you can find the records on the public-facing site: https://www.boldsystems.org/index.php/Public_SearchTerms?searchMenu=records&query=ASMII&taxon=

There are 14,060 specimens and 13,097 COI barcode sequences. About 500 of those sequences are flagged as problematic records/contamination, so they shouldn’t be used further.
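
If the flagged sequences need to be dropped before loading, the filter could look roughly like the sketch below. The sample rows are made up, and the `flagged` column name is a hypothetical placeholder: the real export may mark problematic records differently, so check the actual file before relying on this.

```python
import csv
import io

# Tiny made-up sample. "flagged" is a hypothetical column name used only
# for illustration; it is not taken from a real BOLD export.
SAMPLE_TSV = "processid\tflagged\nDEMO001\tno\nDEMO002\tyes\nDEMO003\tno\n"


def usable_records(tsv_text):
    """Return the rows whose (hypothetical) 'flagged' column is not 'yes'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["flagged"].strip().lower() != "yes"]


kept = usable_records(SAMPLE_TSV)
print([row["processid"] for row in kept])
```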

peggynewman commented 1 year ago

Sent email.

cha801p commented 1 year ago

Loaded the data on Databox. Testing in progress.

javier-molina commented 1 year ago

@cha801p let's talk about the format of this dataset and what we already have from BOLD: https://biocache.ala.org.au/occurrence/search?q=*%3A*&qualityProfile=ALA&fq=data_resource_uid%3A%22dr375%22

peggynewman commented 1 year ago

We could use the BOLD API: https://www.boldsystems.org/index.php/resources/api?type=webservices. We want the specimen and sequence data, plus images.
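
As a rough sketch of what an API-based pull for this project might look like, the snippet below only builds the request URL and makes no network call. It assumes the BOLD v4 public API layout (`API_Public/<endpoint>` with `container` and `format` query parameters); treat the base path, endpoint name, and parameter names as assumptions to confirm against the API page linked above. `bold_query_url` is a helper name invented here for illustration.

```python
from urllib.parse import urlencode

# Assumed base path of the BOLD v4 public API; confirm against the API docs.
BOLD_API_BASE = "https://www.boldsystems.org/index.php/API_Public"


def bold_query_url(endpoint, project_code, fmt="tsv"):
    """Build a public API URL for one project (container) code.

    'endpoint' is assumed to be one of the documented retrieval types,
    e.g. "specimen", "sequence" or "combined".
    """
    params = {"container": project_code, "format": fmt}
    return f"{BOLD_API_BASE}/{endpoint}?{urlencode(params)}"


# Specimen and sequence data together for the Insect Investigators project.
combined_url = bold_query_url("combined", "ASMII")
print(combined_url)
```

Whether image links come back in the same response should be checked against the API documentation before relying on this for the image part of the load.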

peggynewman commented 1 year ago

Notebook for ARGA: https://github.com/ARGA-Genomes/arga-data/blob/develop/jupyter/notebooks/iBold-file-import.ipynb. It could also be useful for updating dr375.

cha801p commented 1 year ago

Emailed data provider re data load: https://collections.ala.org.au/public/show/dr22303