AtlasOfLivingAustralia / data-management

Data management issue tracking

update eBird (recurring) #523

Open M-Nicholls opened 4 years ago

ansell commented 4 years ago

The eBird API appears to be accessible using this Python package:

https://pypi.org/project/ebird-api/

Not sure if anyone at the ALA has registered with eBird in the past or received an API key. The notes there appear to say that large-scale downloaders, as we will be, need to let them know first to avoid being banned for overuse.

ansell commented 4 years ago

Downloading from GBIF rather than eBird to see whether that method works: https://www.gbif.org/occurrence/download/0020957-200613084148143

ansell commented 4 years ago

Downloading from the eBird API using Python seems to retrieve only the last 30 days of observations, which is not useful to us in this context (it could be useful in future for regular updates).
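
The 30-day limit is consistent with the `back` parameter of the eBird API 2.0 "recent observations" endpoint, which accepts 1 to 30 days. A minimal sketch of how such a request might be built with only the Python standard library follows. The endpoint path and the `X-eBirdApiToken` header are as I understand the public eBird API 2.0 documentation and should be checked against it; the function name, the region code and the key are placeholders, and the request is built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

EBIRD_BASE = "https://api.ebird.org/v2"  # assumed base URL for eBird API 2.0
MAX_BACK_DAYS = 30  # the endpoint only looks back this many days


def build_recent_obs_request(api_key, region_code, back=MAX_BACK_DAYS):
    """Build (but do not send) a request for recent observations in a region."""
    if not 1 <= back <= MAX_BACK_DAYS:
        raise ValueError(f"back must be between 1 and {MAX_BACK_DAYS} days")
    query = urlencode({"back": back})
    url = f"{EBIRD_BASE}/data/obs/{quote(region_code)}/recent?{query}"
    # The API key travels in a request header rather than in the URL.
    return Request(url, headers={"X-eBirdApiToken": api_key})


req = build_recent_obs_request("YOUR_API_KEY", "AU", back=30)
print(req.full_url)
```

Because `back` cannot exceed 30, this kind of call could only ever top up an existing load, not replace the full historical download.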

ansell commented 4 years ago

The identifier pattern has changed and now uses a different prefix, so the previous prefix has to be replaced in Cassandra before running the load, to avoid duplicate records and lost annotations.

Dumped the full occ.occ_uuid table from Cassandra, filtered it down to the dr2009| rows, then ran awk to generate a script containing the Cassandra DELETE and INSERT commands. Will run the script tomorrow, while Jenkins is shut down and no jobs are running.

ans025@aws-cass-cluster-1b:/data/tmp/ans025$ cat awk-script 
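#!/usr/bin/awk -f
# Rewrites each "rowkey","uuid" line as a DELETE of the old rowkey
# plus an INSERT of the same UUID under the new prefix.

BEGIN {
  FS = ",";
#  OFS = ",";
  IGNORECASE = 1;  # gawk extension: makes the gsub match below case-insensitive
}

{
  # Strip the double quotes from the rowkey ($1) and the UUID ($2)
  gsub("\"", "", $1)
  gsub("\"", "", $2)
  # Build the new rowkey in $3 by swapping the old prefix for the new one
  $3=$1
  gsub("URN:CornellLabOfOrnithology:EBIRD", "URN:catalog:CLO:EBIRD", $3);
#  print $0
  print "DELETE FROM occ.occ_uuid WHERE rowkey='" $1 "';"
  print "INSERT INTO occ.occ_uuid(rowkey,value) values('" $3 "','" $2 "');"
}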
ans025@aws-cass-cluster-1b:/data/tmp/ans025$ ./awk-script < uuids-dr2009.txt > processed-uuids-dr2009.txt 
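
For readers without gawk, the same transformation can be sketched in Python. This is an illustrative equivalent, not the script that was run; the function name is made up, and it assumes that rowkeys and UUIDs never contain a single quote (the awk version makes the same assumption).

```python
import csv
import io
import re

# Case-insensitive match, mirroring IGNORECASE = 1 in the awk script.
OLD_PREFIX = re.compile(r"URN:CornellLabOfOrnithology:EBIRD", re.IGNORECASE)
NEW_PREFIX = "URN:catalog:CLO:EBIRD"


def remap_statements(rows):
    """Yield a DELETE for the old rowkey and an INSERT for the new one."""
    for rowkey, uuid in rows:
        new_key = OLD_PREFIX.sub(NEW_PREFIX, rowkey)
        yield f"DELETE FROM occ.occ_uuid WHERE rowkey='{rowkey}';"
        yield f"INSERT INTO occ.occ_uuid(rowkey,value) values('{new_key}','{uuid}');"


# Two lines in the same format as uuids-dr2009.txt; csv.reader strips the quotes.
sample = io.StringIO(
    '"dr2009|URN:CornellLabOfOrnithology:EBIRD:OBS597913474","e56ccb47-7523-4e34-abc5-6808e7130e4e"\n'
    '"dr2009|URN:CORNELLLABOFORNITHOLOGY:EBIRD:OBS794930236","80d04f9d-c06c-4dba-92da-749d3e196f6f"\n'
)
statements = list(remap_statements(csv.reader(sample)))
print("\n".join(statements))
```

The DELETE keeps the rowkey exactly as stored, including its original capitalisation, while the INSERT always uses the new prefix.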

The script was verified before running it using the following commands. Note in particular that the capitalisation difference on the second line of the input file does not cause a problem, because of the IGNORECASE = 1; in the awk script:

ans025@aws-cass-cluster-1b:~$ wc -l uuids-dr2009.txt
19342509 uuids-dr2009.txt
ans025@aws-cass-cluster-1b:/data/tmp/ans025$ wc -l processed-uuids-dr2009.txt 
38685018 processed-uuids-dr2009.txt
ans025@aws-cass-cluster-1b:/data/tmp/ans025$ head -n 3 uuids-dr2009.txt 
"dr2009|URN:CornellLabOfOrnithology:EBIRD:OBS597913474","e56ccb47-7523-4e34-abc5-6808e7130e4e"
"dr2009|URN:CORNELLLABOFORNITHOLOGY:EBIRD:OBS794930236","80d04f9d-c06c-4dba-92da-749d3e196f6f"
"dr2009|URN:CornellLabOfOrnithology:EBIRD:OBS268614498","246c8fd9-9989-4587-994d-bc217716f086"
ans025@aws-cass-cluster-1b:/data/tmp/ans025$ head -n 6 processed-uuids-dr2009.txt 
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:CornellLabOfOrnithology:EBIRD:OBS597913474';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD:OBS597913474','e56ccb47-7523-4e34-abc5-6808e7130e4e');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:CORNELLLABOFORNITHOLOGY:EBIRD:OBS794930236';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD:OBS794930236','80d04f9d-c06c-4dba-92da-749d3e196f6f');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:CornellLabOfOrnithology:EBIRD:OBS268614498';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD:OBS268614498','246c8fd9-9989-4587-994d-bc217716f086');
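
The manual checks above (twice as many output lines as input lines, and each INSERT carrying the new prefix) could also be automated. The following is a rough sketch with a hypothetical `check_script` helper; it holds all lines in memory, so for the full 38,685,018-line file a streaming variant would be preferable.

```python
import re

DELETE_RE = re.compile(r"^DELETE FROM occ\.occ_uuid WHERE rowkey='([^']+)';$")
INSERT_RE = re.compile(
    r"^INSERT INTO occ\.occ_uuid\(rowkey,value\) values\('([^']+)','([^']+)'\);$"
)
NEW_PREFIX = "dr2009|URN:catalog:CLO:EBIRD:"


def check_script(lines, expected_records):
    """Return a list of problems found in a generated DELETE/INSERT script."""
    lines = [line.rstrip("\n") for line in lines]
    problems = []
    if len(lines) != 2 * expected_records:
        problems.append(f"expected {2 * expected_records} lines, found {len(lines)}")
    for i in range(0, len(lines) - 1, 2):
        delete = DELETE_RE.match(lines[i])
        insert = INSERT_RE.match(lines[i + 1])
        if not delete or not insert:
            problems.append(f"line {i + 1}: not a DELETE/INSERT pair")
        elif not insert.group(1).startswith(NEW_PREFIX):
            problems.append(f"line {i + 2}: new rowkey lacks the new prefix")
        elif delete.group(1).rpartition(":")[2] != insert.group(1).rpartition(":")[2]:
            problems.append(f"line {i + 1}: observation id changed")
    return problems


sample = [
    "DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:CornellLabOfOrnithology:EBIRD:OBS597913474';",
    "INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD:OBS597913474','e56ccb47-7523-4e34-abc5-6808e7130e4e');",
]
print(check_script(sample, expected_records=1))
```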

Then it was run with:

cqlsh -f processed-uuids-dr2009.txt

patkyn commented 3 years ago

Performed rowkey remapping for the EBIRD_CL collection in biocache (35 records). This changes the current rowkey from dr2009|catalogNumber to dr2009|occurrenceID. I have generated a script covering the current biocache occ records that belong to the EBIRD_CL collection.

Run the following:

cqlsh -f EBIRD_CL.sql

koh032@aws-cass-cluster-1b:~$ more EBIRD_CL.sql 
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962669';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962669','ca3d557f-380a-45ec-821e-f20701346433');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS353555781';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS353555781','8380106f-6f77-4b5d-9420-bc54838d25b0');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962693';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962693','2da8158a-2613-4b61-bfd9-c9c49f5e80ad');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962679';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962679','58f832fa-bb9f-4627-bf73-a3f997b9a253');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962680';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962680','e129502d-e754-4b1e-9cbf-05055d685767');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962670';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962670','3af2f54c-168f-4c82-8cb8-73728980e07a');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962674';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962674','07d2cfcf-ca29-4627-91e9-d41b0043e1a4');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962688';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962688','d7f12d33-a1ef-49a7-ad57-97fbfc569dfc');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962681';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962681','017aadbf-68aa-405a-b834-3a49d57b39bf');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962675';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962675','cc9d8412-293d-4d7d-a281-9604f6767e10');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS353555784';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS353555784','25976694-f042-4a9b-a8da-6b7301859cea');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS221805925';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS221805925','7371e1b3-9797-495c-943a-7bed0bf5e82a');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS353555783';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS353555783','e5602264-6503-4d95-8156-18d19b6b89df');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962692';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962692','7eb3ce00-9759-42a1-9bac-d39a3cddfab8');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962694';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962694','ef5b0a8f-b924-46e4-ae4a-9867666d2798');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962682';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962682','b6d7dd25-3b6e-4530-8b78-34465598cd44');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962677';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962677','ae394590-c0a0-4cb7-bc26-44e6a3f57166');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962691';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962691','333b2c1f-1017-4081-9fdd-b45550362ee6');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962689';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962689','c1e5c68d-13a2-4813-b010-239cf7199d4f');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962685';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962685','12195a6c-ed4a-4d69-90b0-0fdcdb713563');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962686';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962686','2eb5707d-21f4-4000-9b00-6734ff576897');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS353555782';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS353555782','bc10a26f-2117-460b-864f-fce6900e72b7');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS221805907';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS221805907','e29afd28-62af-4f3b-9ae7-5a60b513f30a');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962678';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962678','befe26ae-39c6-4d13-ac0a-1144d9f319a5');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS353555785';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS353555785','cf9b100a-fb11-4fb8-9fd4-2f614cbf906c');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962687';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962687','7b83a596-1a38-42b7-a2ad-3148fa54ba15');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962673';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962673','a8796682-b98b-45f4-861f-c75153095013');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962672';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962672','d4603c41-522e-4de9-8351-125aed1d90f1');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962683';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962683','f5d434ba-1224-484c-978a-a0d9112de980');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962676';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962676','4e6d8284-1134-416d-81fd-9ca723086575');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962671';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962671','4442bc85-90cd-40cf-bb03-5bd8ec73cef3');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962690';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962690','93b0aab1-acc2-4175-9141-0c41b5f34999');
DELETE FROM occ.occ_uuid WHERE rowkey='dr2009|URN:catalog:CLO:EBIRD:OBS659962684';
INSERT INTO occ.occ_uuid(rowkey,value) values('dr2009|URN:catalog:CLO:EBIRD_CL:OBS659962684','98c9b427-9b9a-45fa-9b3c-4c74667ce79a');
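
The pattern in EBIRD_CL.sql is the same for every record: the generic EBIRD segment of the rowkey is replaced by the collection code, and the UUID is carried over unchanged. A small sketch of that rule, using a hypothetical helper name and assuming the old rowkey always starts with dr2009|URN:catalog:CLO:EBIRD: as in the listing above:

```python
OLD_PREFIX = "dr2009|URN:catalog:CLO:EBIRD:"


def remap_to_collection(rowkey, uuid, collection_code):
    """Return the DELETE/INSERT pair that moves a rowkey under its collection code."""
    if not rowkey.startswith(OLD_PREFIX):
        raise ValueError(f"unexpected rowkey: {rowkey}")
    obs_id = rowkey[len(OLD_PREFIX):]  # e.g. OBS659962669
    new_key = f"dr2009|URN:catalog:CLO:{collection_code}:{obs_id}"
    return (
        f"DELETE FROM occ.occ_uuid WHERE rowkey='{rowkey}';",
        f"INSERT INTO occ.occ_uuid(rowkey,value) values('{new_key}','{uuid}');",
    )


delete_stmt, insert_stmt = remap_to_collection(
    "dr2009|URN:catalog:CLO:EBIRD:OBS659962669",
    "ca3d557f-380a-45ec-821e-f20701346433",
    "EBIRD_CL",
)
print(delete_stmt)
print(insert_stmt)
```

Rejecting rowkeys that do not start with the expected prefix avoids silently rewriting records that were already remapped.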

Next, I downloaded the GBIF records from here: https://www.gbif.org/occurrence/search?collection_code=EBIRD_CL&country=AU&dataset_key=4fa7b334-ce0d-4e88-aaae-2e0c138d049e and uploaded them into https://collections.ala.org.au/dataResource/show/dr2009 and set the occurrence. However, when I ran Process Load Sample for the dwca, the occurrenceID values were not picked up; the values of the identifier field were used instead.

image (4)

image (3)

Upon debugging at https://github.com/AtlasOfLivingAustralia/biocache-store/blob/master/src/main/scala/au/org/ala/biocache/load/DwCALoader.scala#L489, it seems that the loader has picked up the config from dwc.txt, which maps Identifier to OccurrenceID as shown here:

image (6)

Question: should Identifier be treated as a separate item, or be removed because it is a Dublin Core field? Will this impact the existing data loads that use Identifier?

patkyn commented 3 years ago

To avoid impacting other existing loads that already use Identifier as occurrenceID, we have agreed to leave the dwc.txt vocab unchanged. Instead, we modify the meta.xml and rename the identifier field so that it no longer points to the Dublin Core Identifier term, and biocache picks up the correct occurrenceID column values.
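
As an illustration of the meta.xml change, the sketch below repoints the field that uses the Dublin Core identifier term to a different term URI and leaves the occurrenceID field alone. The replacement URI is purely hypothetical (the thread does not record the name that was actually used), and the sample archive descriptor is a made-up minimal example rather than the real GBIF meta.xml.

```python
import xml.etree.ElementTree as ET

DWCA_NS = "http://rs.tdwg.org/dwc/text/"  # namespace of Darwin Core Archive meta.xml
DC_IDENTIFIER = "http://purl.org/dc/terms/identifier"
# Hypothetical replacement term, assumed not to be mapped to occurrenceID by dwc.txt.
RENAMED_TERM = "http://example.org/terms/sourceIdentifier"


def rename_identifier_field(meta_xml):
    """Return meta.xml text with the dcterms:identifier field pointed at another term."""
    ET.register_namespace("", DWCA_NS)  # keep the default namespace on output
    root = ET.fromstring(meta_xml)
    for field in root.iter(f"{{{DWCA_NS}}}field"):
        if field.get("term") == DC_IDENTIFIER:
            field.set("term", RENAMED_TERM)
    return ET.tostring(root, encoding="unicode")


sample = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://purl.org/dc/terms/identifier"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
  </core>
</archive>"""
updated = rename_identifier_field(sample)
print(updated)
```

Editing the archive descriptor keeps the change local to this one data resource, which is the reason given above for not touching dwc.txt.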

patkyn commented 3 years ago

Remapped EBIRD_AK and loaded the GBIF_EBIRD_AK dwca. File: EBIRD_AK.txt

aws-bstore-4b 2020-12-15 13:26:33,218 INFO : [DataLoader] - 280, >> last key : dr2009|URN:catalog:CLO:EBIRD_AK:OBS748094708, UUID: , records per sec: 384.6154
aws-bstore-4b 2020-12-15 13:26:33,219 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,220 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,221 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,222 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,224 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,227 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,232 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,235 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,239 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,240 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,241 INFO : [DataLoader] - 290, >> last key : dr2009|URN:catalog:CLO:EBIRD_AK:OBS748094712, UUID: , records per sec: 454.54547
aws-bstore-4b 2020-12-15 13:26:33,241 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,243 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,244 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,245 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,247 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 13:26:33,248 INFO : [DataLoader] - Finished DwCA loader. Records loaded into the system: 295, records skipped: 0, new records: 31
aws-bstore-4b 2020-12-15 13:26:33,385 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-15 13:26:33,974 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-15 13:26:33,974 INFO : [Loader] - Completed loading resource: dr2009. Completed in 3.595seconds (0.059916668 minutes)
Finished: SUCCESS

patkyn commented 3 years ago

Remapped EBIRD_BCN, EBIRD_ARG, EBIRD_ATL_NZ and EBIRD_BRA. Files: EBIRD_ARG.txt, EBIRD_ATL_NZ.txt, EBIRD_BCN.txt, EBIRD_BRA.txt

aws-bstore-4b 2020-12-15 14:11:27,559 INFO : [DataLoader] - 60, >> last key : dr2009|URN:catalog:CLO:EBIRD_ARG:OBS266387818, UUID: , records per sec: 175.4386
aws-bstore-4b 2020-12-15 14:11:27,565 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,568 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,571 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,573 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,575 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,584 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,591 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,596 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,598 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,600 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,602 INFO : [DataLoader] - 70, >> last key : dr2009|URN:catalog:CLO:EBIRD_ARG:OBS266387820, UUID: , records per sec: 232.55814
aws-bstore-4b 2020-12-15 14:11:27,604 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,607 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,618 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,626 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,628 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,632 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,634 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,639 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,643 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,648 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,648 INFO : [DataLoader] - 80, >> last key : dr2009|URN:catalog:CLO:EBIRD_ARG:OBS266391961, UUID: , records per sec: 217.3913
aws-bstore-4b 2020-12-15 14:11:27,650 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,660 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,666 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,680 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,682 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,687 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:27,690 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
...
aws-bstore-4b 2020-12-15 14:11:28,414 INFO : [DataLoader] - 360, >> last key : dr2009|URN:catalog:CLO:EBIRD_BRA:OBS603485490, UUID: , records per sec: 322.58066
aws-bstore-4b 2020-12-15 14:11:28,415 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,417 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,421 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,422 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,431 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,435 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,437 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,439 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,443 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,452 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,452 INFO : [DataLoader] - 370, >> last key : dr2009|URN:catalog:CLO:EBIRD_BRA:OBS603463260, UUID: , records per sec: 263.1579
aws-bstore-4b 2020-12-15 14:11:28,453 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,455 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,463 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,464 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,467 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,469 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,475 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,476 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,481 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,482 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,483 INFO : [DataLoader] - 380, >> last key : dr2009|URN:catalog:CLO:EBIRD_BCN:OBS611826728, UUID: , records per sec: 322.58066
aws-bstore-4b 2020-12-15 14:11:28,483 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,485 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,487 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,488 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,489 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,492 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,494 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,499 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,503 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,504 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,504 INFO : [DataLoader] - 390, >> last key : dr2009|URN:catalog:CLO:EBIRD_BCN:OBS613693398, UUID: , records per sec: 476.1905
aws-bstore-4b 2020-12-15 14:11:28,505 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,506 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,507 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,515 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,516 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,518 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,523 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,524 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,525 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,526 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,528 INFO : [DataLoader] - 400, >> last key : dr2009|URN:catalog:CLO:EBIRD_ARG:OBS617359850, UUID: , records per sec: 416.66666
aws-bstore-4b 2020-12-15 14:11:28,529 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,539 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,541 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,547 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,548 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,554 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,556 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,558 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,564 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,565 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,566 INFO : [DataLoader] - 410, >> last key : dr2009|URN:catalog:CLO:EBIRD_BRA:OBS603480335, UUID: , records per sec: 263.1579
aws-bstore-4b 2020-12-15 14:11:28,567 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,571 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:28,573 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
...
aws-bstore-4b 2020-12-15 14:11:36,732 INFO : [DataLoader] - 5230, >> last key : dr2009|URN:catalog:CLO:EBIRD_BCN:OBS803490238, UUID: , records per sec: 833.3333
aws-bstore-4b 2020-12-15 14:11:36,733 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,739 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,740 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,741 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,742 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,747 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,754 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,755 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,759 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,760 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,764 INFO : [DataLoader] - 5240, >> last key : dr2009|URN:catalog:CLO:EBIRD_ATL_NZ:OBS787028752, UUID: , records per sec: 312.5
aws-bstore-4b 2020-12-15 14:11:36,765 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,767 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,768 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,769 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,773 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,774 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,775 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,776 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,777 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,779 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,779 INFO : [DataLoader] - 5250, >> last key : dr2009|URN:catalog:CLO:EBIRD_ATL_NZ:OBS835124438, UUID: , records per sec: 666.6667
aws-bstore-4b 2020-12-15 14:11:36,780 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,781 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,783 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,791 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,792 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,793 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,794 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,799 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,803 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,804 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,804 INFO : [DataLoader] - 5260, >> last key : dr2009|URN:catalog:CLO:EBIRD_ARG:OBS841141282, UUID: , records per sec: 400.0
aws-bstore-4b 2020-12-15 14:11:36,805 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,806 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,806 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,807 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,808 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,810 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,811 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,812 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,813 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,814 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,814 INFO : [DataLoader] - 5270, >> last key : dr2009|URN:catalog:CLO:EBIRD_ARG:OBS841138259, UUID: , records per sec: 1000.0
aws-bstore-4b 2020-12-15 14:11:36,815 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:11:36,816 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
...
aws-bstore-4b 2020-12-15 14:11:37,172 INFO : [DataLoader] - Finished DwCA loader. Records loaded into the system: 5476, records skipped: 0, new records: 1472
aws-bstore-4b 2020-12-15 14:11:37,304 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-15 14:11:37,893 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-15 14:11:37,893 INFO : [Loader] - Completed loading resource: dr2009. Completed in 12.479seconds (0.20798333 minutes)
Finished: SUCCESS
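
For comparing loads, the counts can be pulled out of the loader's summary line rather than read by eye. A minimal Python sketch, assuming the summary line keeps the wording shown in the log above; `parse_summary` is a hypothetical helper, and reading "loaded minus new" as records that matched an existing key is an assumption, not something the loader states.

```python
import re

# Matches the DwCA loader summary line as it appears in the log above.
SUMMARY_RE = re.compile(
    r"Finished DwCA loader\. Records loaded into the system: (\d+), "
    r"records skipped: (\d+), new records: (\d+)"
)


def parse_summary(line):
    """Return (loaded, skipped, new) as ints, or None if this is not a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


line = (
    "aws-bstore-4b 2020-12-15 14:11:37,172 INFO : [DataLoader] - Finished DwCA loader. "
    "Records loaded into the system: 5476, records skipped: 0, new records: 1472"
)
loaded, skipped, new = parse_summary(line)
# Assumption: records that are not "new" matched an existing rowkey.
existing = loaded - new
print(f"loaded={loaded} skipped={skipped} new={new} existing={existing}")
```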

patkyn commented 3 years ago

Remapped GBIF-EBIRD_CAMERICA, EBIRD_CAN, EBIRD_CB and EBIRD_ESP.

Downloaded DWCA from GBIF https://www.gbif.org/occurrence/search?collection_code=EBIRD_CAMERICA&collection_code=EBIRD_CAN&collection_code=EBIRD_CB&collection_code=EBIRD_ESP&country=AU&dataset_key=4fa7b334-ce0d-4e88-aaae-2e0c138d049e
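
Since this update is recurring, the GBIF search URL above could be built from a list of collection codes instead of being edited by hand. A minimal Python sketch using only the standard library; `gbif_search_url` is a hypothetical helper, and the dataset key and `country=AU` filter are copied from the URL above.

```python
from urllib.parse import urlencode

GBIF_SEARCH = "https://www.gbif.org/occurrence/search"
# Dataset key copied from the GBIF URL used in this thread.
EBIRD_DATASET_KEY = "4fa7b334-ce0d-4e88-aaae-2e0c138d049e"


def gbif_search_url(collection_codes, country="AU"):
    """Build a GBIF occurrence search URL with one collection_code parameter per code."""
    params = [("collection_code", code) for code in collection_codes]
    params.append(("country", country))
    params.append(("dataset_key", EBIRD_DATASET_KEY))
    return f"{GBIF_SEARCH}?{urlencode(params)}"


url = gbif_search_url(["EBIRD_CAMERICA", "EBIRD_CAN", "EBIRD_CB", "EBIRD_ESP"])
print(url)
```

The parameters are passed as a list of pairs so that the repeated `collection_code` key keeps its order in the query string.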

aws-bstore-4b 2020-12-15 14:57:07,845 INFO : [DataLoader] - 63290, >> last key : dr2009|URN:catalog:CLO:EBIRD_CAN:OBS206515697, UUID: , records per sec: 666.6667
aws-bstore-4b 2020-12-15 14:57:07,845 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,846 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,847 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,848 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,849 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,849 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,850 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,855 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,856 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,856 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,857 INFO : [DataLoader] - 63300, >> last key : dr2009|URN:catalog:CLO:EBIRD_CAN:OBS113545744, UUID: , records per sec: 833.3333
aws-bstore-4b 2020-12-15 14:57:07,857 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,862 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,863 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,864 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,864 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,865 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,866 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,871 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 14:57:07,872 INFO : [DataLoader] - Finished DwCA loader. Records loaded into the system: 63308, records skipped: 0, new records: 15364
aws-bstore-4b 2020-12-15 14:57:08,024 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-15 14:57:08,592 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-15 14:57:08,592 INFO : [Loader] - Completed loading resource: dr2009. Completed in 93.003seconds (1.55005 minutes)
Finished: SUCCESS

patkyn commented 3 years ago

Remapped EBIRD_MA, EBIRD_ME, EBIRD_NJ, EBIRD_MYS, EBIRD_NZ and EBIRD_MEX.

Downloaded DWCA from https://www.gbif.org/occurrence/search?collection_code=EBIRD_MA&collection_code=EBIRD_ME&collection_code=EBIRD_NJ&collection_code=EBIRD_MYS&collection_code=EBIRD_NZ&collection_code=EBIRD_MEX&country=AU&dataset_key=4fa7b334-ce0d-4e88-aaae-2e0c138d049e

Loaded DWCA into biocache

aws-bstore-4b 2020-12-15 17:24:23,197 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,198 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,199 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,200 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,207 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,207 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,208 INFO : [DataLoader] - 31800, >> last key : dr2009|URN:catalog:CLO:EBIRD_NZ:OBS223846476, UUID: , records per sec: 270.27026
aws-bstore-4b 2020-12-15 17:24:23,209 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,209 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,210 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,214 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,215 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,216 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,217 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,218 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,219 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,222 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,223 INFO : [DataLoader] - 31810, >> last key : dr2009|URN:catalog:CLO:EBIRD_ME:OBS248781572, UUID: , records per sec: 666.6667
aws-bstore-4b 2020-12-15 17:24:23,223 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,224 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,225 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,226 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,226 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,227 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,228 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-15 17:24:23,230 INFO : [DataLoader] - Finished DwCA loader. Records loaded into the system: 31817, records skipped: 0, new records: 9134
aws-bstore-4b 2020-12-15 17:24:23,371 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-15 17:24:23,949 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-15 17:24:23,949 INFO : [Loader] - Completed loading resource: dr2009. Completed in 52.805seconds (0.8800833 minutes)
Finished: SUCCESS

patkyn commented 3 years ago

Remapped the occ rowkeys in Cassandra for EBIRD_IND, EBIRD_WI, EBIRD_TWN, EBIRD_PA, EBIRD_PR, EBIRD_PNW, EBIRD_QC, EBIRD_VA and EBIRD_TX.

There are no existing occ records for EBIRD_MT, EBIRD_POR or EBIRD_COL, so those collection codes need no remapping.
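
The per-collection remapping can be sketched in Python along the same lines as the awk script earlier in this thread. This is only an illustration under stated assumptions: the old rowkey layout (`dr2009|URN:CornellLabOfOrnithology:EBIRD_xx:OBSnnn`) is inferred from the prefix that script replaces, `remap_statements` is a hypothetical helper, and the sample UUIDs are made up.

```python
import re

# Prefix swap taken from the awk script in this thread (case-insensitive there too).
OLD_PREFIX = re.compile(r"URN:CornellLabOfOrnithology:EBIRD", re.IGNORECASE)
NEW_PREFIX = "URN:catalog:CLO:EBIRD"
# Collection code is the segment after the new prefix's "CLO:" part.
CODE_RE = re.compile(r"URN:catalog:CLO:([^:]+):")


def remap_statements(rows, collection_codes):
    """Yield DELETE/INSERT statements for rows whose collection code is in collection_codes."""
    wanted = set(collection_codes)
    for rowkey, uuid in rows:
        new_key = OLD_PREFIX.sub(NEW_PREFIX, rowkey)
        match = CODE_RE.search(new_key)
        # Skip rows that already use the new prefix or belong to other collections.
        if new_key == rowkey or match is None or match.group(1) not in wanted:
            continue
        yield f"DELETE FROM occ.occ_uuid WHERE rowkey='{rowkey}';"
        yield f"INSERT INTO occ.occ_uuid(rowkey,value) values('{new_key}','{uuid}');"


# Hypothetical sample rows: (rowkey, uuid). The UUIDs are placeholders.
rows = [
    ("dr2009|URN:CornellLabOfOrnithology:EBIRD_VA:OBS215094039",
     "00000000-0000-0000-0000-000000000001"),
    ("dr2009|URN:CornellLabOfOrnithology:EBIRD_AU:OBS940091259",
     "00000000-0000-0000-0000-000000000002"),
]
statements = list(remap_statements(rows, ["EBIRD_VA", "EBIRD_TX"]))
for statement in statements:
    print(statement)
```

Like the awk version, this does no escaping of quotes inside keys, so it is only suitable for rowkeys known to be free of `'`.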

Downloaded DWCA from https://www.gbif.org/occurrence/download?collection_code=EBIRD_MT&collection_code=EBIRD_POR&collection_code=EBIRD_COL&collection_code=EBIRD_IND&collection_code=EBIRD_WI&collection_code=EBIRD_TWN&collection_code=EBIRD_PA&collection_code=EBIRD_PR&collection_code=EBIRD_PNW&collection_code=EBIRD_QC&collection_code=EBIRD_VA&collection_code=EBIRD_TX&country=AU&dataset_key=4fa7b334-ce0d-4e88-aaae-2e0c138d049e

Loaded DWCA into biocache

aws-bstore-4b 2020-12-16 09:11:51,032 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,032 INFO : [DataLoader] - 35360, >> last key : dr2009|URN:catalog:CLO:EBIRD_VA:OBS215094039, UUID: , records per sec: 625.0
aws-bstore-4b 2020-12-16 09:11:51,033 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,033 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,034 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,035 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,035 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,036 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,037 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,038 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,038 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,043 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,049 INFO : [DataLoader] - 35370, >> last key : dr2009|URN:catalog:CLO:EBIRD_VA:OBS215094034, UUID: , records per sec: 588.2353
aws-bstore-4b 2020-12-16 09:11:51,050 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,050 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,051 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,052 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,053 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,053 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,054 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,055 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,059 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,059 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-16 09:11:51,060 INFO : [DataLoader] - 35380, >> last key : dr2009|URN:catalog:CLO:EBIRD_VA:OBS215094035, UUID: , records per sec: 909.09094
aws-bstore-4b 2020-12-16 09:11:51,061 INFO : [DataLoader] - Finished DwCA loader. Records loaded into the system: 35380, records skipped: 0, new records: 11807
aws-bstore-4b 2020-12-16 09:11:51,203 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-16 09:11:51,780 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-16 09:11:51,780 INFO : [Loader] - Completed loading resource: dr2009. Completed in 54.0seconds (0.9 minutes)
Finished: SUCCESS

patkyn commented 3 years ago

Remapped the EBIRD_AU occ rowkeys in Cassandra.

Downloaded DWCA from https://www.gbif.org/occurrence/download?collection_code=EBIRD_MT&collection_code=EBIRD_POR&collection_code=EBIRD_COL&collection_code=EBIRD_IND&collection_code=EBIRD_WI&collection_code=EBIRD_TWN&collection_code=EBIRD_PA&collection_code=EBIRD_PR&collection_code=EBIRD_PNW&collection_code=EBIRD_QC&collection_code=EBIRD_VA&collection_code=EBIRD_TX&country=AU&dataset_key=4fa7b334-ce0d-4e88-aaae-2e0c138d049e

Loaded DWCA into biocache

...
aws-bstore-4b 2020-12-17 14:47:04,135 INFO : [DataLoader] - 10031370, >> last key : dr2009|URN:catalog:CLO:EBIRD_AU:OBS940091259, UUID: , records per sec: 769.2308
aws-bstore-4b 2020-12-17 14:47:04,136 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-17 14:47:04,138 INFO : [DataLoader] - Finished DwCA loader. Records loaded into the system: 10031371, records skipped: 0, new records: 1209304
aws-bstore-4b 2020-12-17 14:47:04,594 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-17 14:47:04,935 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-17 14:47:04,935 INFO : [Loader] - Completed loading resource: dr2009. Completed in 14634.711seconds (243.91185 minutes)
Finished: SUCCESS

patkyn commented 3 years ago

Loaded the first part of the EBIRD collection_code DWCA.

aws-bstore-4b 2020-12-18 12:47:03,251 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-18 12:47:03,252 INFO : [DataLoader] - 10576560, >> last key : dr2009|URN:catalog:CLO:EBIRD:OBS236240628, UUID: , records per sec: 714.2857
aws-bstore-4b 2020-12-18 12:47:03,253 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-18 12:47:03,255 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2020-12-18 12:47:03,259 INFO : [DataLoader] - Finished DwCA loader. Records loaded into the system: 10576562, records skipped: 0, new records: 1574929
aws-bstore-4b 2020-12-18 12:47:03,598 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-18 12:47:04,162 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-12-18 12:47:04,162 INFO : [Loader] - Completed loading resource: dr2009. Completed in 15104.655seconds (251.74425 minutes)
Finished: SUCCESS

patkyn commented 3 years ago

Loaded the second part of the EBIRD collection_code DWCA.

aws-bstore-4b 2021-01-05 11:59:15,182 INFO : [DataLoader] - 998260, >> last key : dr2009|URN:catalog:CLO:EBIRD:OBS199902495, UUID: , records per sec: 666.6667
aws-bstore-4b 2021-01-05 11:59:15,182 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,183 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,184 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,187 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,187 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,188 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,189 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,191 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,192 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,198 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,198 INFO : [DataLoader] - 998270, >> last key : dr2009|URN:catalog:CLO:EBIRD:OBS215268067, UUID: , records per sec: 625.0
aws-bstore-4b 2021-01-05 11:59:15,199 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,203 INFO : [DataLoader] - Only loading multimedia for specific core record types, starRecord.core().rowType()=dwc:Occurrence
aws-bstore-4b 2021-01-05 11:59:15,206 INFO : [DataLoader] - Finished DwCA loader. Records loaded into the system: 998272, records skipped: 0, new records: 447964
aws-bstore-4b 2021-01-05 11:59:15,354 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2021-01-05 11:59:15,930 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2021-01-05 11:59:15,930 INFO : [Loader] - Completed loading resource: dr2009. Completed in 1594.144seconds (26.569067 minutes)
Finished: SUCCESS

patkyn commented 3 years ago

After completing the load, there seem to be some old records (760,894) that no longer exist in GBIF: https://biocache.ala.org.au/occurrence/search?q=data_resource_uid:dr2009&fq=last_load_date:[*+TO+2020-12-01T00:00:00Z]
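
The same check can be repeated after future loads by rebuilding that biocache query with a new cutoff date. A minimal Python sketch, assuming the `q` and `fq` parameters behave as in the URL above; `stale_records_url` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BIOCACHE_SEARCH = "https://biocache.ala.org.au/occurrence/search"


def stale_records_url(data_resource_uid, cutoff):
    """Build a biocache search URL for records last loaded on or before the cutoff."""
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = [
        ("q", f"data_resource_uid:{data_resource_uid}"),
        ("fq", f"last_load_date:[* TO {stamp}]"),
    ]
    # Keep ':', '[', ']' and '*' unescaped so the URL reads like the one in this thread.
    return f"{BIOCACHE_SEARCH}?{urlencode(params, safe=':[]*')}"


url = stale_records_url("dr2009", datetime(2020, 12, 1, tzinfo=timezone.utc))
print(url)
```

The cutoff is taken as a timezone-aware `datetime` so the timestamp is always written in UTC with the trailing `Z`.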

These records would have empty keys (occurrenceIDs), because they were not updated from GBIF.

Should these records be removed from biocache?

patkyn commented 3 years ago

The old records with empty occurrenceID keys have now been deleted.