Closed gdower closed 2 years ago
@gdower do you have the failed archive at hand or the distribution record with "CHITRA:" or "chitra:" as the start of the area value?
The top 3 lines are probably the problem:
cat Distribution.gsv | grep Chitra
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra chitra-8268ec07f79485dfb50a0d783d6fd83b- Chitra chitra javanensis-a61d9dde1b184eea19603e20afc644ca chitra: Thailand (NW as far as the Khwae Noi and Khwae Yai River basins of the Mae Klong River system; NE in the Mae Ping River basin of the Chao Phraya River system). text
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra chitra-8268ec07f79485dfb50a0d783d6fd83b- Chitra chitra chitra-a4cf529d16d912c823e49ac8c1c32200 chitra: Thailand (NW as far as the Khwae Noi and Khwae Yai River basins of the Mae Klong River system; NE in the Mae Ping River basin of the Chao Phraya River system). text
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra chitra-8268ec07f79485dfb50a0d783d6fd83b chitra: Thailand (NW as far as the Khwae Noi and Khwae Yai River basins of the Mae Klong River system; NE in the Mae Ping River basin of the Chao Phraya River system). text
Reptilia-Squamata-Gekkota-Gekkonidae-Mediodactylus walli-64dcb4d33ee54583013a770d4eb2f8bd NW Pakistan (Ayun, Chitral, Bamburet Valley, Bermoghluscht, Drosh Tehsil, and 7.0 km N Drosh, in the Chitral District, Northwest Frontier, 1,970–2,120 m elevation) text
Reptilia-Squamata-Viperidae-Crotalinae-Gloydius himalayanus-19654581e9c563ce5a40c93521385634 Pakistan (Chitral, Murree (= Marri)), text
Reptilia-Squamata-Iguania-Agamidae-Agaminae-Laudakia nuristanica-bce0499c516461f1ce165872edb5fbc7 E Afghanistan (Nuristan, Kunar, Panjshir, Parwan, Takhar), NW Pakistan (Chitral Valley) text
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra indica-994ae1d9731cebe455aff211bcb25b93 India (Ganges, Godavari , Mahanadi, Sutlaj, Indus, Kerala, Assam, Jammu and Kashmir) Pakistan, Nepal, text
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra vandijki-15e247b50d557dac4d3ed7f0c8a1c531 Myanmar (= Burma), NW Thailand text
Reptilia-Squamata-Scincomorpha-Scincidae-Sphenomorphinae-Asymblepharus himalayanus-19654581e9c563ce5a40c93521385634 N Pakistan (Chitral, Hazara), India (W Himalaya: Jammu and Kashmir, Punjab, Himachal Pradesh, Uttar Pradesh), W Nepal, Pakistan, Turkmenistan text
There's probably an inconsistency in the data that breaks the data converter, but it would be preferable if the importer flagged bad rows with an "invalid gazetteer" issue instead of crashing. I'll look into fixing the data issue.
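To illustrate the "flag instead of crash" behaviour suggested above, here is a minimal sketch in Python. It is not the importer's code, which is not shown in this thread; the gazetteer names in `KNOWN_GAZETTEERS`, the function names, and the row layout are all assumptions made for the example.

```python
# Hypothetical set of accepted gazetteer names (an assumption for this sketch).
KNOWN_GAZETTEERS = {"iso", "tdwg", "text"}


def resolve_gazetteer(name):
    """Return the normalized gazetteer name or raise ValueError if unknown."""
    key = name.strip().lower()
    if key not in KNOWN_GAZETTEERS:
        raise ValueError(f"unknown gazetteer: {name!r}")
    return key


def import_distributions(rows):
    """Import (taxon_id, area, gazetteer) rows without aborting on bad ones.

    A row with an unknown gazetteer is kept as a plain text distribution
    and an "invalid gazetteer" issue is recorded for it.
    """
    imported, issues = [], []
    for row_no, (taxon_id, area, gazetteer) in enumerate(rows, start=1):
        try:
            gaz = resolve_gazetteer(gazetteer)
        except ValueError as err:
            # Record the problem and fall back to free text instead of crashing.
            issues.append((row_no, "invalid gazetteer", str(err)))
            gaz = "text"
        imported.append((taxon_id, area, gaz))
    return imported, issues
```

With this shape, one bad row costs a single issue entry while the rest of the archive still imports.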
It is this area value which looks like a prefixed code to the importer:
chitra: Thailand (NW as far as the Khwae Noi and Khwae Yai River basins of the Mae Klong River system; NE in the Mae Ping River basin of the Chao Phraya River system).
I have fixed the importer and added tests confirming that such values are now handled as plain text distributions. But what does "chitra:" actually mean content-wise?
I reimported ReptileDB and it now works: https://data.catalogueoflife.org/dataset/1008/imports
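For readers wondering how a value like "chitra: Thailand (...)" can be mistaken for a prefixed code, the following Python sketch shows one way to parse it safely. It is only an illustration under stated assumptions, not the actual fix: the prefix set `KNOWN_PREFIXES` and the function `parse_area` are hypothetical names.

```python
# Hypothetical gazetteer prefixes that mark a coded area (an assumption).
KNOWN_PREFIXES = {"iso", "tdwg"}


def parse_area(value):
    """Split an area value into (gazetteer, area).

    Only a recognized prefix followed by a non-empty code is treated as
    a coded area. Anything else, including "chitra: Thailand ...", is
    returned unchanged as a free-text distribution.
    """
    prefix, sep, rest = value.partition(":")
    key = prefix.strip().lower()
    if sep and key in KNOWN_PREFIXES and rest.strip():
        return key, rest.strip()
    return "text", value.strip()
```

The design choice here is to treat the colon as meaningful only when the text before it is a known gazetteer, so subspecies labels and other free-text prefixes pass through untouched.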
There are no distributions in the archive though - maybe that was the wrong one?
The importer failed for ReptileDB with a gazetteer-related IllegalArgumentException. That also broke the NormalizerStore for dataset 1008, so I can't import again:
IllegalStateException: Failed to init NormalizerStore at /tmp/col/scratch/1008/normalizer