gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Exception during interpretation caused by tag "deprecated" in EventType.json #1082

Closed vjrj closed 2 months ago

vjrj commented 3 months ago

I'm getting this error when ingesting a dataset:

ERROR [2024-08-01 12:25:13,849+0200] [Executor task launch worker for task 53] org.apache.spark.executor.Executor: Exception in task 0.0 in stage 6.0 (TID 53)
org.apache.beam.sdk.util.UserCodeException: org.gbif.pipelines.common.PipelinesException: vocab.com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot construct instance of `org.gbif.vocabulary.model.Tag` (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('deprecated')
 at [Source: (org.apache.hadoop.hdfs.client.HdfsDataInputStream); line: 70, column: 3] (through reference chain: org.gbif.vocabulary.model.export.Export["concepts"]->java.util.ArrayList[1]->org.gbif.vocabulary.model.Concept["tags"]->java.util.ArrayList[0])

I worked around it by editing the recently updated EventType.json vocabulary and removing "deprecated" from the tags:

    "tags" : [ "deprecated" ]

It is probably better to use the deprecated field instead.
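
For anyone hitting the same error, here is a minimal sketch of that workaround in Python. It assumes the export has a top-level `concepts` array whose entries may carry a `tags` list, which is what the reference chain in the error suggests. The function name `strip_string_tags` is hypothetical and not part of pipelines.

```python
import copy


def strip_string_tags(export: dict) -> dict:
    """Return a copy of a vocabulary export with plain-string tags removed.

    Assumption: the export looks like {"concepts": [{"tags": [...]}, ...]},
    as implied by the Jackson reference chain in the stack trace.
    """
    cleaned = copy.deepcopy(export)  # leave the caller's data untouched
    for concept in cleaned.get("concepts", []):
        tags = concept.get("tags")
        if isinstance(tags, list):
            # Drop only string-valued tags such as "deprecated";
            # anything that is not a plain string is kept as-is.
            concept["tags"] = [t for t in tags if not isinstance(t, str)]
    return cleaned
```

To apply it, load EventType.json with `json.load`, pass the result through the function, and write it back with `json.dump`. This only hides the mismatch; it does not replace upgrading the library.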

I'm using a pipelines snapshot version from 19th January (the same as ALA was using).

marcos-lg commented 3 months ago

This looks like a mismatch between the EventType.json vocabulary and the version of the vocabulary-lookup library. There have been recent changes to the library that are not released yet. It should work with the latest vocabulary-lookup version (1.0.10-SNAPSHOT). But I wonder what that EventType.json file contains, since it shouldn't contain any tags.

vjrj commented 3 months ago

Thanks for the description. We use this one:

    curl -s -o - https://api.gbif-uat.org/v1/vocabularies/EventType/export | grep -s tags | grep dep
    "tags" : [ "deprecated" ]
    "tags" : [ "deprecated" ]

marcos-lg commented 3 months ago

Ah OK, that explains it. I recently deployed to UAT a version of the vocabulary API that also exports the tags (it's not in production yet). Because of that, you need to use the latest version of vocabulary-lookup, as I mentioned above.