AtlasOfLivingAustralia / appd-hub

Australian Plant Pest Database Hub
https://appd.ala.org.au

APPD Project - new data upload #1

Closed m-hope closed 9 years ago

m-hope commented 9 years ago

As part of the project to implement the APPD Hub, the existing data currently in the hub needs to be updated with a complete refresh in order to fix a number of issues which appear to be the result of incomplete field and taxa mapping.

Once the new full APPD data set is provided by PHA (contact: Steve Dibley), it is suggested that the data from each individual data provider is separated out and ingested separately.

This upload should be completed prior to the next project meeting at the end of January 2015.

m-hope commented 9 years ago

Steve Dibley (PHA) advised on 9/12/2014 that a request for a copy of the full APPD database has been submitted, and we will be notified as soon as the copy is received.

M-Nicholls commented 9 years ago

Data received as a full database backup. The backup has been restored and analysis is under way.

m-hope commented 9 years ago

Data has been extracted and cleaned ready for ingestion... will wait for hub to be updated to latest version first.

m-hope commented 9 years ago

New data has been uploaded after the old data was removed.

129954 duplicate catalogue numbers were found... this will be reported back to PHA.
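For reference, a minimal sketch of how duplicate catalogue numbers could be counted in an extracted data set. It assumes the records are available as dictionaries keyed by field name; the function name and the sample values are hypothetical and are not taken from the actual APPD extract or from the check used by the hub.

```python
from collections import Counter


def find_duplicate_catalogue_numbers(records, field="catalogNumber"):
    """Return {catalogue number: occurrences} for values seen more than once.

    Blank or missing catalogue numbers are ignored rather than treated
    as duplicates of each other (an assumption, not confirmed APPD policy).
    """
    counts = Counter((rec.get(field) or "").strip() for rec in records)
    counts.pop("", None)  # drop blank values
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample records, for illustration only.
sample = [
    {"catalogNumber": "A1"},
    {"catalogNumber": "A2"},
    {"catalogNumber": "A1"},
    {"catalogNumber": ""},
]
duplicates = find_duplicate_catalogue_numbers(sample)
print(duplicates)
```

A report for PHA could then list each duplicated value together with its count.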

m-hope commented 9 years ago

Change of plans... New data to be loaded as individual resources created as follows:

dr1988 - ANIC
dr1989 - BRIP
dr1990 - APDD
dr1991 - ASCU
dr1992 - BSES
dr1993 - CCDB
dr1994 - ICDB
dr1995 - NTEIC
dr1996 - DNAP
dr1997 - FCNI
dr1998 - TFIC
dr1999 - TPPD
dr2000 - WACALM
dr2001 - UQIC
dr2002 - VAIC
dr2003 - WINC
dr2004 - QDPC
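A rough sketch of how the full extract could be separated into one group per data resource. It assumes each record carries a sourceDatabase value that matches the provider codes listed above, which has not been confirmed against the real data; the function name and the "unmapped" bucket are hypothetical.

```python
from collections import defaultdict

# Provider code -> data resource UID, as listed in this comment.
RESOURCE_UIDS = {
    "ANIC": "dr1988", "BRIP": "dr1989", "APDD": "dr1990", "ASCU": "dr1991",
    "BSES": "dr1992", "CCDB": "dr1993", "ICDB": "dr1994", "NTEIC": "dr1995",
    "DNAP": "dr1996", "FCNI": "dr1997", "TFIC": "dr1998", "TPPD": "dr1999",
    "WACALM": "dr2000", "UQIC": "dr2001", "VAIC": "dr2002", "WINC": "dr2003",
    "QDPC": "dr2004",
}


def split_by_resource(records, source_field="sourceDatabase"):
    """Group records by data resource UID.

    Records whose provider code is not in RESOURCE_UIDS are collected
    under the key "unmapped" so they can be reviewed by hand.
    """
    groups = defaultdict(list)
    for rec in records:
        code = (rec.get(source_field) or "").strip()
        groups[RESOURCE_UIDS.get(code, "unmapped")].append(rec)
    return dict(groups)


# Hypothetical sample records, for illustration only.
sample = [
    {"sourceDatabase": "ANIC", "accessionNo": "1"},
    {"sourceDatabase": "BRIP", "accessionNo": "2"},
    {"sourceDatabase": "ANIC", "accessionNo": "3"},
    {"sourceDatabase": "XXXX", "accessionNo": "4"},
]
groups = split_by_resource(sample)
print({uid: len(recs) for uid, recs in groups.items()})
```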

m-hope commented 9 years ago

Old data in dr1124 removed.

m-hope commented 9 years ago

APPD-hub backend code updated to new version... includes fixes for record deletion and processing of state when layers not available, and includes indexing of datasetName field.

m-hope commented 9 years ago

Individual output files created for each dataset with field mappings as follows:

sourceDatabase - datasetName
accessionNo - catalogNumber
pestOrder - order
pestFamily - family
pestGenus - genus
pestSpecies - specificEpithet
pestInfraTaxa - infraSpecificEpithet
pestAuthority - scientificNameAuthorship
pestCommonName - vernacularName
hostFamily - hostFamily
hostGenus - hostGenus
hostSpecies - hostSpecificEpithet
hostInfraTaxa - hostInfraSpecificEpithet
hostCommonName - hostVernacularName
hostSubstrate - hostSubstrate
locationTown - locality
locationState - stateProvince
locationCountry - country
latitude - latitude
longitude - longitude
collectionMethod - samplingProtocol
collectionDate - eventDate
collectionDateYear - year
collectorName - recordedBy
specimenIdentifier - specimenIdentifier
identificationMethod - identificationMethod
symptom - symptom
stage - lifeStage
hostDamage - hostDamage
qualityIndicator - qualityIndicator
traits - traits
identifiersName - identifiedBy
identificationDate - dateIdentified
identificationNotes - identificationRemarks
interceptionClassification - interceptionClassification
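The mapping above could be applied to each record along these lines. This is only a sketch: it assumes the left-hand names are the APPD source fields and the right-hand names are the output fields, and the function name and the choice to drop unlisted fields are hypothetical.

```python
# Source field -> output field, copied from the mapping in this comment.
FIELD_MAP = {
    "sourceDatabase": "datasetName",
    "accessionNo": "catalogNumber",
    "pestOrder": "order",
    "pestFamily": "family",
    "pestGenus": "genus",
    "pestSpecies": "specificEpithet",
    "pestInfraTaxa": "infraSpecificEpithet",
    "pestAuthority": "scientificNameAuthorship",
    "pestCommonName": "vernacularName",
    "hostFamily": "hostFamily",
    "hostGenus": "hostGenus",
    "hostSpecies": "hostSpecificEpithet",
    "hostInfraTaxa": "hostInfraSpecificEpithet",
    "hostCommonName": "hostVernacularName",
    "hostSubstrate": "hostSubstrate",
    "locationTown": "locality",
    "locationState": "stateProvince",
    "locationCountry": "country",
    "latitude": "latitude",
    "longitude": "longitude",
    "collectionMethod": "samplingProtocol",
    "collectionDate": "eventDate",
    "collectionDateYear": "year",
    "collectorName": "recordedBy",
    "specimenIdentifier": "specimenIdentifier",
    "identificationMethod": "identificationMethod",
    "symptom": "symptom",
    "stage": "lifeStage",
    "hostDamage": "hostDamage",
    "qualityIndicator": "qualityIndicator",
    "traits": "traits",
    "identifiersName": "identifiedBy",
    "identificationDate": "dateIdentified",
    "identificationNotes": "identificationRemarks",
    "interceptionClassification": "interceptionClassification",
}


def map_record(record):
    """Rename a record's fields according to FIELD_MAP.

    Fields that are not listed in FIELD_MAP are dropped (an assumption).
    """
    return {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}


# Hypothetical sample record, for illustration only.
mapped = map_record({
    "accessionNo": "12345",
    "pestGenus": "Examplus",
    "locationState": "QLD",
    "internalId": "not mapped",
})
print(mapped)
```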

m-hope commented 9 years ago

An issue with the State and Country fields means that the data will need to be resampled and processed once the APPD hub has been configured to access the appropriate layers. See issue #13.

m-hope commented 9 years ago

State issue now resolved.