artefactual-sdps / enduro

A tool to support ingest and automation in digital preservation workflows
https://enduro.readthedocs.io/
Apache License 2.0

Problem: download activity fails with a JSON decoding error #826

Open · djjuhasz opened 9 months ago

djjuhasz commented 9 months ago

Describe the bug

I'm trying to process a transfer with AM (Archivematica) in my local Tilt dev environment, but the download activity returns a JSON decoding error and the workflow fails.

To Reproduce

Steps to reproduce the behavior:

  1. Upload a transfer via the MinIO UI
  2. The workflow retries the download activity several times, then fails with the error:
{
  "message": "activity error",
  "source": "GoSDK",
  "stackTrace": "",
  "encodedAttributes": null,
  "cause": {
    "message": "unable to decode the activity function input payload with error: payload item 0: unable to decode: json: cannot unmarshal object into Go value of type string for function name: download-activity",
    "source": "GoSDK",
    "stackTrace": "",
    "encodedAttributes": null,
    "cause": {
      "message": "payload item 0: unable to decode: json: cannot unmarshal object into Go value of type string",
      "source": "GoSDK",
      "stackTrace": "",
      "encodedAttributes": null,
      "cause": {
        "message": "unable to decode: json: cannot unmarshal object into Go value of type string",
        "source": "GoSDK",
        "stackTrace": "",
        "encodedAttributes": null,
        "cause": {
          "message": "unable to decode",
          "source": "GoSDK",
          "stackTrace": "",
          "encodedAttributes": null,
          "cause": null,
          "applicationFailureInfo": {
            "type": "",
            "nonRetryable": false,
            "details": null
          }
        },
        "applicationFailureInfo": {
          "type": "wrapError",
          "nonRetryable": false,
          "details": null
        }
      },
      "applicationFailureInfo": {
        "type": "wrapError",
        "nonRetryable": false,
        "details": null
      }
    },
    "applicationFailureInfo": {
      "type": "wrapError",
      "nonRetryable": false,
      "details": null
    }
  },
  "activityFailureInfo": {
    "scheduledEventId": "14",
    "startedEventId": "15",
    "identity": "1@enduro-am-0@",
    "activityType": {
      "name": "download-activity"
    },
    "activityId": "14",
    "retryState": "MaximumAttemptsReached"
  }
}
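The useful part of the error above is buried several `cause` levels deep. Below is a minimal Go sketch that walks the chain and lists each message from the outermost failure down to the root cause. It assumes only the `message` and `cause` fields shown in the JSON. The `failure` type and `causeChain` function are hypothetical helpers for illustration, not types from the workflow SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// failure is a hypothetical type mirroring only the fields of the failure
// JSON that this sketch needs; all other fields are ignored when decoding.
type failure struct {
	Message string   `json:"message"`
	Cause   *failure `json:"cause"`
}

// causeChain decodes a failure document and returns every message, from the
// outermost failure down to the root cause.
func causeChain(raw []byte) ([]string, error) {
	var f failure
	if err := json.Unmarshal(raw, &f); err != nil {
		return nil, err
	}
	var msgs []string
	for cur := &f; cur != nil; cur = cur.Cause {
		msgs = append(msgs, cur.Message)
	}
	return msgs, nil
}

func main() {
	// Abridged copy of the error above.
	raw := []byte(`{
  "message": "activity error",
  "cause": {
    "message": "payload item 0: unable to decode: json: cannot unmarshal object into Go value of type string",
    "cause": {"message": "unable to decode", "cause": null}
  }
}`)
	msgs, err := causeChain(raw)
	if err != nil {
		panic(err)
	}
	for i, m := range msgs {
		fmt.Printf("%d: %s\n", i, m)
	}
}
```

Applied to the full error, the second message, "unable to decode the activity function input payload ... for function name: download-activity", is the most informative one.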

Expected behavior

The transfer should download and processing should continue.


Additional context

I was experiencing this error intermittently before going on holiday, but restarting the Enduro containers with Tilt usually resolved it. Today the error happens consistently, and I can't get past it.

djjuhasz commented 9 months ago

I just retried with the same transfer, and processing completed successfully this time. :shrug:

sevein commented 9 months ago

We changed the download activity return type from string to struct (a backward-incompatible change) in mid-December, right? Given the error "cannot unmarshal object into Go value of type string", I wonder if somehow you were running an old version of the "enduro" binary with a version of the workflow function that wasn't expecting a return value at all. Not sure how that could happen, though. Something I'd confirm is that when you restart all Enduro services, the new pods are deployed and the old ones are fully destroyed, e.g. enduro-b6b57f7f5-8h5vj is fully replaced and no other enduro pods remain.

djjuhasz commented 9 months ago

@sevein ah, good thinking. :detective: I'll try deleting all the k8s pods and rebuilding from scratch.

sevein commented 7 months ago

@djjuhasz, have you seen this error again?

djjuhasz commented 7 months ago

@sevein I haven't tested the Enduro + AM integration in a while.