AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

Export sees row keys that are not found by Process #178

Closed by ansell 7 years ago

ansell commented 7 years ago

The following command successfully finds all of the 210k+ records that are loaded for dr341:

biocache export -s "dr341|" -e "dr341|~" -c "institutionCode collectionCode catalogNumber" -rk -sc "," occ /data/biocache-load/dr341/anwc/dr341-keys.csv

However, the following two commands only find ~160k records loaded from one of the two data files, skipping an entire collectionCode "GeneticSamples":

biocache process -dr dr341 -s "dr341|" -e "dr341|~"
biocache process -s "dr341|" -e "dr341|~"
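
One way to confirm which collectionCode accounts for the gap is to tally the exported keys file by collectionCode. This is only a sketch: it assumes the export file has no header row and that its columns are the row key followed by the three fields passed to `-c`, which I have not verified. The function name and the sample rows are made up for illustration.

```python
import csv
from collections import Counter

# Assumed column order: row key first (-rk), then the fields given to -c.
# This layout is a guess, not confirmed against the export output.
ASSUMED_COLUMNS = ("rowKey", "institutionCode", "collectionCode", "catalogNumber")


def count_by_collection(lines, fieldnames=ASSUMED_COLUMNS):
    """Count exported records per collectionCode.

    `lines` is any iterable of CSV text lines, e.g. an open file handle.
    """
    reader = csv.DictReader(lines, fieldnames=list(fieldnames))
    return Counter(row["collectionCode"] for row in reader)


if __name__ == "__main__":
    # Made-up sample rows, not real dr341 data.
    sample = [
        "dr341|ANWC|Birds|B1,ANWC,Birds,B1",
        "dr341|ANWC|Birds|B2,ANWC,Birds,B2",
        "dr341|ANWC|GeneticSamples|G1,ANWC,GeneticSamples,G1",
    ]
    for code, total in count_by_collection(sample).most_common():
        print(code, total)
```

To run it against the real export, pass an open handle on dr341-keys.csv instead of the sample list and compare the totals with what process reports.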

ansell commented 7 years ago

export appears to read from Solr, which still contained records from the other file; that file had been loaded manually at some earlier point. AutoDwcCSVLoader cannot have loaded the file in question recently, because its regex doesn't match the file name used. Closing this; I will detail the changes AutoDwcCSVLoader needs in order to work with ANWC.
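
To illustrate the regex mismatch described above, here is a minimal sketch of how a loader that filters files by name will silently skip anything the pattern does not match. The pattern, the function name and the file names are all hypothetical; the actual regex in AutoDwcCSVLoader is not shown in this issue and may look quite different.

```python
import re

# Hypothetical pattern for illustration only; not the real AutoDwcCSVLoader regex.
FILE_PATTERN = re.compile(r"^occurrences?.*\.csv$", re.IGNORECASE)


def split_by_pattern(file_names, pattern=FILE_PATTERN):
    """Split file names into those a pattern-based loader would pick up
    and those it would skip without raising any error."""
    matched, skipped = [], []
    for name in file_names:
        if pattern.match(name):
            matched.append(name)
        else:
            skipped.append(name)
    return matched, skipped


if __name__ == "__main__":
    # Made-up file names standing in for the two data files.
    matched, skipped = split_by_pattern(["occurrence.csv", "GeneticSamples.csv"])
    print("loaded: ", matched)
    print("skipped:", skipped)
```

Listing the skipped names before a load would make this kind of gap visible up front rather than only after comparing record counts.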