gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Stuck datasets in pipelines CLI #952

Closed · timrobertson100 closed this issue 8 months ago

timrobertson100 commented 10 months ago

This issue captures diagnostics for an incident of stuck datasets in pipelines.

More than 2000 datasets were submitted for reprocessing through the API at around 15:00. Overnight, the ingestion monitor backed up to over 4000 datasets, many of them showing VERBATIM_TO_IDENTIFIER as running.

Taking one of these datasets as an example, from the logs:

06:57:58 UTC prodcrawler3-vh ~/logs $ cat pipelines-occurrence-identifier.log | grep "09-18 15:" | grep "1412a9e6-9028-4d00-8124-3eab48a7ff8e" | grep "ERROR"

ERROR [09-18 15:04:54,480+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Dataset is in the queue, please check the pipeline-ingestion monitoring tool - 1412a9e6-9028-4d00-8124-3eab48a7ff8e

ERROR [09-18 15:04:54,484+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Couldn't track pipeline step for message {"datasetUuid":"1412a9e6-9028-4d00-8124-3eab48a7ff8e","attempt":8,"interpretTypes":["TEMPORAL","LOCATION","GRSCICOLL","MULTIMEDIA","BASIC","TAXONOMY","IMAGE","IDENTIFIER_ABSENT","AMPLIFICATION","CLUSTERING","OCCURRENCE","VERBATIM","AUDUBON","MEASUREMENT_OR_FACT","LOCATION_FEATURE","METADATA"],"pipelineSteps":["FRAGMENTER","HDFS_VIEW","INTERPRETED_TO_INDEX","DWCA_TO_VERBATIM","VERBATIM_TO_IDENTIFIER","VERBATIM_TO_INTERPRETED"],"runner":"STANDALONE","endpointType":"DWC_ARCHIVE","extraPath":null,"validationResult":{"tripletValid":false,"occurrenceIdValid":true,"useExtendedRecordId":null,"numberOfRecords":20702,"numberOfEventRecords":null},"resetPrefix":null,"executionId":3266487,"datasetType":"OCCURRENCE"}

The pipelines-occurrence-identifier process was restarted.

timrobertson100 commented 10 months ago

The ingestion monitor "pills" suggest the other stages ran, even though VERBATIM_TO_IDENTIFIER had an error. This seems suspicious, or perhaps the UI is misleading.

[image: ingestion monitor pills for the dataset]

The full logs suggest it did continue (see the "Next message has been sent" line below):

INFO  [09-18 14:57:28,970+0000] [pipelines_occurrence_identifier-1] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Message handler began - {"datasetUuid":"1412a9e6-9028-4d00-8124-3eab48a7ff8e","attempt":8,"interpretTypes":["TEMPORAL","LOCATION","GRSCICOLL","MULTIMEDIA","BASIC","TAXONOMY","IMAGE","IDENTIFIER_ABSENT","AMPLIFICATION","CLUSTERING","OCCURRENCE","VERBATIM","AUDUBON","MEASUREMENT_OR_FACT","LOCATION_FEATURE","METADATA"],"pipelineSteps":["FRAGMENTER","HDFS_VIEW","INTERPRETED_TO_INDEX","DWCA_TO_VERBATIM","VERBATIM_TO_IDENTIFIER","VERBATIM_TO_INTERPRETED"],"runner":"STANDALONE","endpointType":"DWC_ARCHIVE","extraPath":null,"validationResult":{"tripletValid":false,"occurrenceIdValid":true,"useExtendedRecordId":null,"numberOfRecords":20702,"numberOfEventRecords":null},"resetPrefix":null,"executionId":3266487,"datasetType":"OCCURRENCE"}
INFO  [09-18 14:57:28,972+0000] [pipelines_occurrence_identifier-1] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Handler has been started, datasetKey - 1412a9e6-9028-4d00-8124-3eab48a7ff8e
INFO  [09-18 14:57:28,993+0000] [pipelines_occurrence_identifier-1] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback: Start the process. Message - {"datasetUuid":"1412a9e6-9028-4d00-8124-3eab48a7ff8e","attempt":8,"interpretTypes":["TEMPORAL","LOCATION","GRSCICOLL","MULTIMEDIA","BASIC","TAXONOMY","IMAGE","IDENTIFIER_ABSENT","AMPLIFICATION","CLUSTERING","OCCURRENCE","VERBATIM","AUDUBON","MEASUREMENT_OR_FACT","LOCATION_FEATURE","METADATA"],"pipelineSteps":["FRAGMENTER","HDFS_VIEW","INTERPRETED_TO_INDEX","DWCA_TO_VERBATIM","VERBATIM_TO_IDENTIFIER","VERBATIM_TO_INTERPRETED"],"runner":"STANDALONE","endpointType":"DWC_ARCHIVE","extraPath":null,"validationResult":{"tripletValid":false,"occurrenceIdValid":true,"useExtendedRecordId":null,"numberOfRecords":20702,"numberOfEventRecords":null},"resetPrefix":null,"executionId":3266487,"datasetType":"OCCURRENCE"}
INFO  [09-18 14:57:28,994+0000] [pipelines_occurrence_identifier-1] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.common.interpretation.RecordCountReader: Getting records number from the file - hdfs://ha-nn/data/ingest/1412a9e6-9028-4d00-8124-3eab48a7ff8e/8/archive-to-verbatim.yml
INFO  [09-18 14:57:29,000+0000] [pipelines_occurrence_identifier-1] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.common.process.ProcessRunnerBuilder: Command - sudo -u hdfs spark2-submit --conf spark.metrics.conf=/home/crap/config/metrics.properties --conf "spark.driver.extraClassPath=/home/crap/lib/logstash-gelf.jar" --driver-java-options "-Dlog4j.configuration=file:/home/crap/config/log4j-pipelines.properties" --queue root.pipelines --name=VERBATIM_TO_IDENTIFIER_1412a9e6-9028-4d00-8124-3eab48a7ff8e_8 --conf spark.default.parallelism=8 --conf spark.executor.memoryOverhead=4096 --conf spark.dynamicAllocation.enabled=false --conf spark.yarn.am.waitTime=360s --class org.gbif.pipelines.ingest.pipelines.VerbatimToIdentifierPipeline --master yarn --deploy-mode cluster --executor-memory 4G --executor-cores 4 --num-executors 1 --driver-memory 1G hdfs://ha-nn/pipelines/jars/ingest-gbif.jar --datasetId=1412a9e6-9028-4d00-8124-3eab48a7ff8e --attempt=8 --interpretationTypes=TEMPORAL,LOCATION,GRSCICOLL,MULTIMEDIA,BASIC,TAXONOMY,IMAGE,IDENTIFIER_ABSENT,AMPLIFICATION,CLUSTERING,OCCURRENCE,VERBATIM,AUDUBON,MEASUREMENT_OR_FACT,LOCATION_FEATURE,METADATA --runner=SparkRunner --targetPath=hdfs://ha-nn/data/ingest --metaFileName=verbatim-to-identifier.yml --inputPath=hdfs://ha-nn/data/ingest/1412a9e6-9028-4d00-8124-3eab48a7ff8e/8/verbatim.avro --avroCompressionType=snappy --avroSyncInterval=2097152 --hdfsSiteConfig=/home/crap/config/hdfs-site.xml --coreSiteConfig=/home/crap/config/core-site.xml --properties=hdfs://ha-nn/pipelines/jars/pipelines.yaml --experiments=use_deprecated_read --tripletValid=false --occurrenceIdValid=true
INFO  [09-18 14:58:08,928+0000] [pipelines_occurrence_identifier-1] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback: Process has been finished with exit value - 0
ERROR [09-18 15:04:54,480+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Dataset is in the queue, please check the pipeline-ingestion monitoring tool - 1412a9e6-9028-4d00-8124-3eab48a7ff8e
ERROR [09-18 15:04:54,484+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Couldn't track pipeline step for message {"datasetUuid":"1412a9e6-9028-4d00-8124-3eab48a7ff8e","attempt":8,"interpretTypes":["TEMPORAL","LOCATION","GRSCICOLL","MULTIMEDIA","BASIC","TAXONOMY","IMAGE","IDENTIFIER_ABSENT","AMPLIFICATION","CLUSTERING","OCCURRENCE","VERBATIM","AUDUBON","MEASUREMENT_OR_FACT","LOCATION_FEATURE","METADATA"],"pipelineSteps":["FRAGMENTER","HDFS_VIEW","INTERPRETED_TO_INDEX","DWCA_TO_VERBATIM","VERBATIM_TO_IDENTIFIER","VERBATIM_TO_INTERPRETED"],"runner":"STANDALONE","endpointType":"DWC_ARCHIVE","extraPath":null,"validationResult":{"tripletValid":false,"occurrenceIdValid":true,"useExtendedRecordId":null,"numberOfRecords":20702,"numberOfEventRecords":null},"resetPrefix":null,"executionId":3266487,"datasetType":"OCCURRENCE"}
INFO  [09-18 15:04:54,494+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Message handler began - {"datasetUuid":"1412a9e6-9028-4d00-8124-3eab48a7ff8e","attempt":8,"interpretTypes":["TEMPORAL","LOCATION","GRSCICOLL","MULTIMEDIA","BASIC","TAXONOMY","IMAGE","IDENTIFIER_ABSENT","AMPLIFICATION","CLUSTERING","OCCURRENCE","VERBATIM","AUDUBON","MEASUREMENT_OR_FACT","LOCATION_FEATURE","METADATA"],"pipelineSteps":["FRAGMENTER","HDFS_VIEW","INTERPRETED_TO_INDEX","DWCA_TO_VERBATIM","VERBATIM_TO_IDENTIFIER","VERBATIM_TO_INTERPRETED"],"runner":"STANDALONE","endpointType":"DWC_ARCHIVE","extraPath":null,"validationResult":{"tripletValid":false,"occurrenceIdValid":true,"useExtendedRecordId":null,"numberOfRecords":20702,"numberOfEventRecords":null},"resetPrefix":null,"executionId":3266487,"datasetType":"OCCURRENCE"}
INFO  [09-18 15:04:54,537+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Handler has been started, datasetKey - 1412a9e6-9028-4d00-8124-3eab48a7ff8e
INFO  [09-18 15:04:54,707+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback: Start the process. Message - {"datasetUuid":"1412a9e6-9028-4d00-8124-3eab48a7ff8e","attempt":8,"interpretTypes":["TEMPORAL","LOCATION","GRSCICOLL","MULTIMEDIA","BASIC","TAXONOMY","IMAGE","IDENTIFIER_ABSENT","AMPLIFICATION","CLUSTERING","OCCURRENCE","VERBATIM","AUDUBON","MEASUREMENT_OR_FACT","LOCATION_FEATURE","METADATA"],"pipelineSteps":["FRAGMENTER","HDFS_VIEW","INTERPRETED_TO_INDEX","DWCA_TO_VERBATIM","VERBATIM_TO_IDENTIFIER","VERBATIM_TO_INTERPRETED"],"runner":"STANDALONE","endpointType":"DWC_ARCHIVE","extraPath":null,"validationResult":{"tripletValid":false,"occurrenceIdValid":true,"useExtendedRecordId":null,"numberOfRecords":20702,"numberOfEventRecords":null},"resetPrefix":null,"executionId":3266487,"datasetType":"OCCURRENCE"}
INFO  [09-18 15:04:54,714+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.common.interpretation.RecordCountReader: Getting records number from the file - hdfs://ha-nn/data/ingest/1412a9e6-9028-4d00-8124-3eab48a7ff8e/8/archive-to-verbatim.yml
INFO  [09-18 15:04:56,091+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.common.process.ProcessRunnerBuilder: Command - sudo -u hdfs spark2-submit --conf spark.metrics.conf=/home/crap/config/metrics.properties --conf "spark.driver.extraClassPath=/home/crap/lib/logstash-gelf.jar" --driver-java-options "-Dlog4j.configuration=file:/home/crap/config/log4j-pipelines.properties" --queue root.pipelines --name=VERBATIM_TO_IDENTIFIER_1412a9e6-9028-4d00-8124-3eab48a7ff8e_8 --conf spark.default.parallelism=8 --conf spark.executor.memoryOverhead=4096 --conf spark.dynamicAllocation.enabled=false --conf spark.yarn.am.waitTime=360s --class org.gbif.pipelines.ingest.pipelines.VerbatimToIdentifierPipeline --master yarn --deploy-mode cluster --executor-memory 4G --executor-cores 4 --num-executors 1 --driver-memory 1G hdfs://ha-nn/pipelines/jars/ingest-gbif.jar --datasetId=1412a9e6-9028-4d00-8124-3eab48a7ff8e --attempt=8 --interpretationTypes=TEMPORAL,LOCATION,GRSCICOLL,MULTIMEDIA,BASIC,TAXONOMY,IMAGE,IDENTIFIER_ABSENT,AMPLIFICATION,CLUSTERING,OCCURRENCE,VERBATIM,AUDUBON,MEASUREMENT_OR_FACT,LOCATION_FEATURE,METADATA --runner=SparkRunner --targetPath=hdfs://ha-nn/data/ingest --metaFileName=verbatim-to-identifier.yml --inputPath=hdfs://ha-nn/data/ingest/1412a9e6-9028-4d00-8124-3eab48a7ff8e/8/verbatim.avro --avroCompressionType=snappy --avroSyncInterval=2097152 --hdfsSiteConfig=/home/crap/config/hdfs-site.xml --coreSiteConfig=/home/crap/config/core-site.xml --properties=hdfs://ha-nn/pipelines/jars/pipelines.yaml --experiments=use_deprecated_read --tripletValid=false --occurrenceIdValid=true
INFO  [09-18 15:05:47,336+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback: Process has been finished with exit value - 0
INFO  [09-18 15:05:47,377+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.occurrences.identifier.validation.PostprocessValidation: Getting records number from the file - hdfs://ha-nn/data/ingest/1412a9e6-9028-4d00-8124-3eab48a7ff8e/8/verbatim-to-identifier.yml
INFO  [09-18 15:05:47,783+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback: No identifier issues
INFO  [09-18 15:05:47,791+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Handler has been finished, datasetKey - 1412a9e6-9028-4d00-8124-3eab48a7ff8e
INFO  [09-18 15:05:47,797+0000] [pipelines_occurrence_identifier-3] 1412a9e6-9028-4d00-8124-3eab48a7ff8e 8 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.PipelinesCallback: Next message has been sent - PipelinesVerbatimMessage:{"datasetUuid":"1412a9e6-9028-4d00-8124-3eab48a7ff8e","attempt":8,"interpretTypes":["TEMPORAL","LOCATION","GRSCICOLL","MULTIMEDIA","BASIC","TAXONOMY","IMAGE","IDENTIFIER_ABSENT","AMPLIFICATION","CLUSTERING","OCCURRENCE","VERBATIM","AUDUBON","MEASUREMENT_OR_FACT","LOCATION_FEATURE","METADATA"],"pipelineSteps":["FRAGMENTER","HDFS_VIEW","INTERPRETED_TO_INDEX","DWCA_TO_VERBATIM","VERBATIM_TO_INTERPRETED"],"runner":"STANDALONE","endpointType":"DWC_ARCHIVE","extraPath":null,"validationResult":{"tripletValid":false,"occurrenceIdValid":true,"useExtendedRecordId":null,"numberOfRecords":20702,"numberOfEventRecords":null},"resetPrefix":null,"executionId":3266487,"datasetType":"OCCURRENCE"}
INFO  [09-18 15:05:47,853+0000] [pipelines_occurrence_identifier-3]    org.gbif.pipelines.tasks.PipelinesCallback: Message handler ended - {"datasetUuid":"1412a9e6-9028-4d00-8124-3eab48a7ff8e","attempt":8,"interpretTypes":["TEMPORAL","LOCATION","GRSCICOLL","MULTIMEDIA","BASIC","TAXONOMY","IMAGE","IDENTIFIER_ABSENT","AMPLIFICATION","CLUSTERING","OCCURRENCE","VERBATIM","AUDUBON","MEASUREMENT_OR_FACT","LOCATION_FEATURE","METADATA"],"pipelineSteps":["FRAGMENTER","HDFS_VIEW","INTERPRETED_TO_INDEX","DWCA_TO_VERBATIM","VERBATIM_TO_IDENTIFIER","VERBATIM_TO_INTERPRETED"],"runner":"STANDALONE","endpointType":"DWC_ARCHIVE","extraPath":null,"validationResult":{"tripletValid":false,"occurrenceIdValid":true,"useExtendedRecordId":null,"numberOfRecords":20702,"numberOfEventRecords":null},"resetPrefix":null,"executionId":3266487,"datasetType":"OCCURRENCE"}

It might be noteworthy that this ran twice, on different threads.

fmendezh commented 10 months ago

https://github.com/gbif/pipelines/blob/dev/gbif/coordinator/tasks/src/main/java/org/gbif/pipelines/tasks/occurrences/identifier/IdentifierCallback.java#L79 has a potential issue: the identifier stage can succeed, but if the validation result/report fails, an exception is thrown and the step's state can become inconsistent. According to the logs, this can happen when contacting the occurrence web service to get the current number of records for a dataset:

ERROR [09-18 16:41:31,266+0000] [pipelines_occurrence_identifier-6] 87ff22b7-5a6d-4304-baf0-02000c92f8c5 1 VERBATIM_TO_IDENTIFIER org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback: Read timed out
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.gbif.pipelines.common.GbifApi.executeGet(GbifApi.java:77)
    at org.gbif.pipelines.common.GbifApi.getIndexSize(GbifApi.java:33)
    at org.gbif.pipelines.tasks.occurrences.identifier.validation.PostprocessValidation.getApiRecords(PostprocessValidation.java:143)
    at org.gbif.pipelines.tasks.occurrences.identifier.validation.PostprocessValidation.validateThreshold(PostprocessValidation.java:73)
    at org.gbif.pipelines.tasks.occurrences.identifier.validation.PostprocessValidation.validate(PostprocessValidation.java:37)
    at org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback.lambda$createRunnable$0(IdentifierCallback.java:109)
    at org.gbif.pipelines.tasks.PipelinesCallback.handleMessage(PipelinesCallback.java:159)
    at org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback.handleMessage(IdentifierCallback.java:54)
    at org.gbif.pipelines.tasks.occurrences.identifier.IdentifierCallback.handleMessage(IdentifierCallback.java:30)
    at org.gbif.common.messaging.MessageConsumer.handleCallback(MessageConsumer.java:129)
    at org.gbif.common.messaging.MessageConsumer.handleDelivery(MessageConsumer.java:82)
    at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:149)
    at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
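
To restate the failure mode, here is a minimal sketch with hypothetical names (not the actual IdentifierCallback code): the Spark job has already exited 0 when the post-validation web service call is made, so a timeout at that point escapes the handler before the step is ever marked finished, and the dataset stays "running" in the monitor.

```java
// Sketch only: hypothetical interfaces standing in for the real tracking
// and web service clients used by the identifier callback.
public class IdentifierStepSketch {

  interface OccurrenceWs {
    long getIndexSize(String datasetKey) throws java.io.IOException; // may time out
  }

  interface StepTracker {
    void markRunning(String datasetKey);
    void markFinished(String datasetKey);
  }

  private final OccurrenceWs ws;
  private final StepTracker tracker;

  public IdentifierStepSketch(OccurrenceWs ws, StepTracker tracker) {
    this.ws = ws;
    this.tracker = tracker;
  }

  public void handleMessage(String datasetKey, long incomingRecords) throws Exception {
    tracker.markRunning(datasetKey);

    runSparkIdentifierJob(datasetKey); // "Process has been finished with exit value - 0"

    // A SocketTimeoutException thrown here propagates out of handleMessage:
    // markFinished() is never reached and the dataset is stuck as "running".
    long indexedRecords = ws.getIndexSize(datasetKey);
    validateThreshold(indexedRecords, incomingRecords);

    tracker.markFinished(datasetKey);
  }

  private void runSparkIdentifierJob(String datasetKey) { /* spark2-submit ... */ }

  private void validateThreshold(long indexed, long incoming) { /* compare counts */ }
}
```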

There are two improvements that can be applied here: making the REST clients more reliable (retries), and reviewing how the workflow state is updated later in the process.
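
For the first improvement, the thread doesn't say which retry mechanism was adopted; a plain-Java sketch of the idea (a hypothetical `Retries.withRetries` helper, bounded attempts with linear backoff) could look like:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper illustrating the proposed fix; the real code may use
// a retry library rather than a hand-rolled loop.
public final class Retries {

  /** Runs the call, retrying on IOException up to maxAttempts times. */
  public static <T> T withRetries(Callable<T> call, int maxAttempts, long backoffMillis)
      throws Exception {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (IOException e) { // covers SocketTimeoutException from the WS client
        last = e;
        Thread.sleep(backoffMillis * attempt); // simple linear backoff
      }
    }
    throw last; // all attempts failed; the caller can then mark the step as failed
  }

  private Retries() {}
}
```

Wrapping the web service call from the stack trace above, e.g. `Retries.withRetries(() -> getIndexSize(datasetKey), 3, 5_000L)` (names hypothetical), would ride out transient timeouts; when retries are exhausted, the second improvement still applies, since the failure should leave the step marked as failed rather than permanently running.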

muttcg commented 10 months ago

I have added the Retry API; we use it almost everywhere except that place.

muttcg commented 8 months ago

Reopen if it appears again.