CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0

IllegalArgumentException: No enum constant life.catalogue.api.vocab.Gazetteer.CHITRA #1064

Closed gdower closed 2 years ago

gdower commented 2 years ago

The importer failed for ReptileDB with an IllegalArgumentException raised by the Gazetteer enum. The failure also broke the NormalizerStore for dataset 1008, so I can't import again: IllegalStateException: Failed to init NormalizerStore at /tmp/col/scratch/1008/normalizer.

Logs:

com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: No enum constant life.catalogue.api.vocab.Gazetteer.CHITRA
Serialization trace:
area (life.catalogue.api.model.Distribution)
distributions (life.catalogue.importer.neo.model.NeoUsage)
    at com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:133)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:122)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:725)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:239)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:43)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:725)
    at com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:114)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:122)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:703)
    at life.catalogue.common.kryo.map.MapDbObjectSerializer.deserialize(MapDbObjectSerializer.java:57)
    at org.mapdb.StoreDirectAbstract.deserialize(StoreDirectAbstract.kt:229)
    at org.mapdb.StoreDirect.get(StoreDirect.kt:546)
    at org.mapdb.HTreeMap.valueUnwrap(HTreeMap.kt:1200)
    at org.mapdb.HTreeMap.getprotected(HTreeMap.kt:644)
    at org.mapdb.HTreeMap.get(HTreeMap.kt:603)
    at life.catalogue.importer.neo.NeoCRUDStore.objByNode(NeoCRUDStore.java:50)
    at life.catalogue.importer.neo.NeoCRUDStore.objByID(NeoCRUDStore.java:61)
    at life.catalogue.importer.NeoCsvInserter.lambda$insertTaxonEntities$2(NeoCsvInserter.java:158)
    at life.catalogue.importer.NeoCsvInserter.lambda$processVerbatim$0(NeoCsvInserter.java:115)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
    at life.catalogue.importer.NeoCsvInserter.processVerbatim(NeoCsvInserter.java:113)
    at life.catalogue.importer.NeoCsvInserter.insertTaxonEntities(NeoCsvInserter.java:151)
    at life.catalogue.importer.coldp.ColdpInserter.batchInsert(ColdpInserter.java:137)
    at life.catalogue.importer.NeoCsvInserter.insertAll(NeoCsvInserter.java:84)
    at life.catalogue.importer.Normalizer.insertData(Normalizer.java:904)
    at life.catalogue.importer.Normalizer.call(Normalizer.java:91)
    at life.catalogue.importer.ImportJob.importDataset(ImportJob.java:238)
    at life.catalogue.importer.ImportJob.run(ImportJob.java:126)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalArgumentException: No enum constant life.catalogue.api.vocab.Gazetteer.CHITRA
    at java.base/java.lang.Enum.valueOf(Enum.java:240)
    at life.catalogue.api.vocab.Gazetteer.valueOf(Gazetteer.java:8)
    at life.catalogue.api.vocab.Gazetteer.of(Gazetteer.java:101)
    at life.catalogue.common.kryo.AreaSerializer.parse(AreaSerializer.java:44)
    at life.catalogue.common.kryo.AreaSerializer.read(AreaSerializer.java:33)
    at life.catalogue.common.kryo.AreaSerializer.read(AreaSerializer.java:14)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:725)
    at com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:114)
    ... 34 common frames omitted

life.catalogue.importer.NormalizationFailedException: Failed to batch insert csv data
    at life.catalogue.importer.NeoCsvInserter.insertAll(NeoCsvInserter.java:89)
    at life.catalogue.importer.Normalizer.insertData(Normalizer.java:904)
    at life.catalogue.importer.Normalizer.call(Normalizer.java:91)
    at life.catalogue.importer.ImportJob.importDataset(ImportJob.java:238)
    at life.catalogue.importer.ImportJob.run(ImportJob.java:126)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: No enum constant life.catalogue.api.vocab.Gazetteer.CHITRA
Serialization trace:
area (life.catalogue.api.model.Distribution)
distributions (life.catalogue.importer.neo.model.NeoUsage)
    at com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:133)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:122)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:725)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:239)
    at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:43)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:725)
    at com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:114)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:122)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:703)
    at life.catalogue.common.kryo.map.MapDbObjectSerializer.deserialize(MapDbObjectSerializer.java:57)
    at org.mapdb.StoreDirectAbstract.deserialize(StoreDirectAbstract.kt:229)
    at org.mapdb.StoreDirect.get(StoreDirect.kt:546)
    at org.mapdb.HTreeMap.valueUnwrap(HTreeMap.kt:1200)
    at org.mapdb.HTreeMap.getprotected(HTreeMap.kt:644)
    at org.mapdb.HTreeMap.get(HTreeMap.kt:603)
    at life.catalogue.importer.neo.NeoCRUDStore.objByNode(NeoCRUDStore.java:50)
    at life.catalogue.importer.neo.NeoCRUDStore.objByID(NeoCRUDStore.java:61)
    at life.catalogue.importer.NeoCsvInserter.lambda$insertTaxonEntities$2(NeoCsvInserter.java:158)
    at life.catalogue.importer.NeoCsvInserter.lambda$processVerbatim$0(NeoCsvInserter.java:115)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
    at life.catalogue.importer.NeoCsvInserter.processVerbatim(NeoCsvInserter.java:113)
    at life.catalogue.importer.NeoCsvInserter.insertTaxonEntities(NeoCsvInserter.java:151)
    at life.catalogue.importer.coldp.ColdpInserter.batchInsert(ColdpInserter.java:137)
    at life.catalogue.importer.NeoCsvInserter.insertAll(NeoCsvInserter.java:84)
    ... 9 common frames omitted
Caused by: java.lang.IllegalArgumentException: No enum constant life.catalogue.api.vocab.Gazetteer.CHITRA
    at java.base/java.lang.Enum.valueOf(Enum.java:240)
    at life.catalogue.api.vocab.Gazetteer.valueOf(Gazetteer.java:8)
    at life.catalogue.api.vocab.Gazetteer.of(Gazetteer.java:101)
    at life.catalogue.common.kryo.AreaSerializer.parse(AreaSerializer.java:44)
    at life.catalogue.common.kryo.AreaSerializer.read(AreaSerializer.java:33)
    at life.catalogue.common.kryo.AreaSerializer.read(AreaSerializer.java:14)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:725)
    at com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:114)
    ... 34 common frames omitted

mdoering commented 2 years ago

@gdower do you have the failed archive at hand or the distribution record with "CHITRA:" or "chitra:" as the start of the area value?

gdower commented 2 years ago

The top 3 lines are probably the problem:

cat Distribution.gsv| grep Chitra
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra chitra-8268ec07f79485dfb50a0d783d6fd83b- Chitra chitra javanensis-a61d9dde1b184eea19603e20afc644ca     chitra: Thailand (NW as far as the Khwae Noi and Khwae Yai River basins of the Mae Klong River system; NE in the Mae Ping River basin of the Chao Phraya River system).   text
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra chitra-8268ec07f79485dfb50a0d783d6fd83b- Chitra chitra chitra-a4cf529d16d912c823e49ac8c1c32200 chitra: Thailand (NW as far as the Khwae Noi and Khwae Yai River basins of the Mae Klong River system; NE in the Mae Ping River basin of the Chao Phraya River system).   text
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra chitra-8268ec07f79485dfb50a0d783d6fd83b        chitra: Thailand (NW as far as the Khwae Noi and Khwae Yai River basins of the Mae Klong River system; NE in the Mae Ping River basin of the Chao Phraya River system).   text
Reptilia-Squamata-Gekkota-Gekkonidae-Mediodactylus walli-64dcb4d33ee54583013a770d4eb2f8bd       NW Pakistan (Ayun, Chitral, Bamburet Valley, Bermoghluscht, Drosh Tehsil, and 7.0 km N Drosh, in the Chitral District, Northwest Frontier, 1,970–2,120 m elevation)       text
Reptilia-Squamata-Viperidae-Crotalinae-Gloydius himalayanus-19654581e9c563ce5a40c93521385634    Pakistan (Chitral, Murree (= Marri)),   text
Reptilia-Squamata-Iguania-Agamidae-Agaminae-Laudakia nuristanica-bce0499c516461f1ce165872edb5fbc7       E Afghanistan (Nuristan, Kunar, Panjshir, Parwan, Takhar), NW Pakistan (Chitral Valley) text
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra indica-994ae1d9731cebe455aff211bcb25b93        India (Ganges, Godavari , Mahanadi, Sutlaj, Indus, Kerala, Assam, Jammu and Kashmir) Pakistan, Nepal,     text
Reptilia-Testudines-Cryptodira-Trionychoidea-Trionychidae-Chitra vandijki-15e247b50d557dac4d3ed7f0c8a1c531      Myanmar (= Burma), NW Thailand  text
Reptilia-Squamata-Scincomorpha-Scincidae-Sphenomorphinae-Asymblepharus himalayanus-19654581e9c563ce5a40c93521385634     N Pakistan (Chitral, Hazara), India (W Himalaya: Jammu and Kashmir, Punjab, Himachal Pradesh, Uttar Pradesh), W Nepal, Pakistan, Turkmenistan     text

There's probably an inconsistency that breaks the data converter, but it would be preferable if the importer flagged bad rows with an invalid gazetteer issue instead of crashing. I'll look into fixing the data issue.
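
To illustrate the "flag instead of crash" idea, here is a minimal Java sketch. It is not the importer's actual code: the `Gazetteer` and `Issue` enums are cut-down, hypothetical stand-ins, and `Result` and `resolve` are names invented for this example. It only shows how the `IllegalArgumentException` from `Enum.valueOf` could be caught and turned into an issue on the row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class GazetteerIssueSketch {

    // Hypothetical stand-ins; the real vocabularies are larger.
    enum Gazetteer { ISO, TDWG, FAO, TEXT }
    enum Issue { GAZETTEER_INVALID }

    // Result of interpreting one row: a gazetteer plus any issues raised on the way.
    static final class Result {
        final Gazetteer gazetteer;
        final List<Issue> issues = new ArrayList<>();

        Result(Gazetteer gazetteer) {
            this.gazetteer = gazetteer;
        }
    }

    // Resolves a gazetteer name; an unknown name degrades to TEXT and
    // records an issue on the row instead of propagating the exception.
    static Result resolve(String name) {
        try {
            return new Result(Gazetteer.valueOf(name.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            Result r = new Result(Gazetteer.TEXT);
            r.issues.add(Issue.GAZETTEER_INVALID);
            return r;
        }
    }

    public static void main(String[] args) {
        Result ok = resolve("iso");
        System.out.println(ok.gazetteer + " " + ok.issues);

        Result bad = resolve("chitra");
        System.out.println(bad.gazetteer + " " + bad.issues);
    }
}
```

With this shape a single bad row stays importable and the problem surfaces as an issue attached to that row, rather than aborting the whole batch insert.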

mdoering commented 2 years ago

This is the area value that the importer mistakes for a gazetteer-prefixed code:

chitra: Thailand (NW as far as the Khwae Noi and Khwae Yai River basins of the Mae Klong River system; NE in the Mae Ping River basin of the Chao Phraya River system).

I have fixed the importer and added tests confirming that such values are now handled as text distributions. But what does "chitra:" actually mean content-wise?
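
For readers following along, the behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was committed: `LenientAreaParser`, `Area`, and `parse` are hypothetical names, and the `Gazetteer` enum is a shortened stand-in. The point is that the text before a colon is treated as a gazetteer prefix only when it matches a known constant; otherwise the whole value is kept as free text.

```java
import java.util.Locale;

public class LenientAreaParser {

    // Hypothetical stand-in for the real vocabulary; only a few values are listed.
    enum Gazetteer { ISO, TDWG, FAO, TEXT }

    // Minimal value holder for a parsed area.
    static final class Area {
        final Gazetteer gazetteer;
        final String value;

        Area(Gazetteer gazetteer, String value) {
            this.gazetteer = gazetteer;
            this.value = value;
        }
    }

    // Splits "prefix:rest" only when the prefix names a known gazetteer;
    // anything else is kept whole as a free-text area.
    static Area parse(String raw) {
        int colon = raw.indexOf(':');
        if (colon > 0) {
            String prefix = raw.substring(0, colon).trim().toUpperCase(Locale.ROOT);
            for (Gazetteer g : Gazetteer.values()) {
                if (g != Gazetteer.TEXT && g.name().equals(prefix)) {
                    return new Area(g, raw.substring(colon + 1).trim());
                }
            }
        }
        return new Area(Gazetteer.TEXT, raw.trim());
    }

    public static void main(String[] args) {
        Area coded = parse("iso:TH");
        System.out.println(coded.gazetteer + " -> " + coded.value);

        Area text = parse("chitra: Thailand");
        System.out.println(text.gazetteer + " -> " + text.value);
    }
}
```

Checking the prefix against the enum's own constants avoids calling `Enum.valueOf` on arbitrary user text, which is where the exception in the trace originated.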

mdoering commented 2 years ago

I reimported ReptileDB and it now works: https://data.catalogueoflife.org/dataset/1008/imports

mdoering commented 2 years ago

There are no distributions in the archive though - maybe that was the wrong one?