artefactual-sdps / enduro

A tool to support ingest and automation in digital preservation workflows
https://enduro.readthedocs.io/
Apache License 2.0

Problem: Changing the configured a3m shared directory breaks the a3m workflow #864

Open djjuhasz opened 4 months ago

djjuhasz commented 4 months ago

Describe the bug

Changing the value of the a3m "shareDir" configuration setting in the "enduro.toml" configuration file causes the a3m processing workflow to fail with a fatal error.

To Reproduce

Steps to reproduce the behavior:

  1. Configure Enduro to use a3m for preservation
  2. In the "enduro.toml" configuration file, change the default value of the "a3m.shareDir" setting to a different value
  3. Restart the enduro and a3m workers to load the new configuration
  4. Attempt to process a transfer by uploading it to the MinIO "sips" bucket
  5. The processing workflow fails with the error:
     "message": "error creating temporary directory: stat /home/a3m/.local/share/a3m/share: no such file or directory",
     "source": "GoSDK",

Expected behavior

Changing the configured a3m shared directory location to a valid filesystem path should not break the a3m workflow.

Screenshot: image

Failure message (JSON):

{
  "message": "activity error",
  "source": "GoSDK",
  "stackTrace": "",
  "encodedAttributes": null,
  "cause": {
    "message": "error creating temporary directory: stat /home/a3m/.local/share/a3m/share: no such file or directory",
    "source": "GoSDK",
    "stackTrace": "",
    "encodedAttributes": null,
    "cause": null,
    "applicationFailureInfo": {
      "type": "",
      "nonRetryable": true,
      "details": {
        "payloads": [
          null
        ]
      }
    }
  },
  "activityFailureInfo": {
    "scheduledEventId": "20",
    "startedEventId": "21",
    "identity": "1@enduro-a3m-0@",
    "activityType": {
      "name": "bundle-activity"
    },
    "activityId": "20",
    "retryState": "NonRetryableFailure"
  }
}

Additional context

The a3m share directory path is hardcoded at https://github.com/artefactual-sdps/enduro/blob/main/internal/workflow/processing.go#L321

sevein commented 4 months ago

Can this issue be closed?

djjuhasz commented 4 months ago

@sevein that's a tough question. Changing the shareDir will still break the a3m integration in dev because a3m is still trying to send the AIP back to /home/a3m/.local/share/a3m/share. I couldn't find any way to configure a3m (in the dev env) to send the AIP to a different directory. :(

djjuhasz commented 4 months ago

I'm copying this from https://github.com/artefactual-sdps/enduro/pull/865#issuecomment-1958136587 for better visibility:

Commit https://github.com/artefactual-sdps/enduro/commit/eca89b656e9e4b825acae07b50230aa73cbe7732 fixes a hardcoded a3m shared directory path in Enduro, but the processing workflow still fails if the path is changed from the default. a3m finds the deposited SIP in the new path and successfully creates an AIP, but it still saves the final AIP to the default "/home/a3m/.local/share/a3m/share/" directory instead of the new path. a3m doesn't pass the stored AIP path back to Enduro directly, so Enduro assumes the path of the AIP is "/home/a3m/.local/share/a3m/new_share/completed", which is not correct.

Fixing this completely will require changes to a3m.

djjuhasz commented 4 months ago

I was just reading the a3m documentation and noticed that a3m accepts a shared_directory configuration setting (ref: https://a3m.readthedocs.io/en/latest/settings.html). I'll try setting this value in the Enduro a3m hack config and see if it solves the problem with Enduro not finding the AIP.

djjuhasz commented 3 months ago

I've done some investigation on using the a3m shared_directory setting to change the directory shared by Enduro and a3m for file exchange. So far, this is what I've found:

I set an A3M_SHARED_DIRECTORY environment variable in the hack/kube/overlays/dev-a3m/enduro-a3m.yaml file, but this breaks the a3m tmp_directory, processing_directory, and rejected_directory paths. This is a consequence of https://github.com/artefactual-labs/a3m/blob/main/a3m/settings/common.py#L168, which sets all four directory paths (shared, tmp, processing, and rejected) when shared_directory is not explicitly set, but doesn't set any of them when shared_directory is set. I think I'll file a bug ticket in a3m about this behaviour, as it's unexpected and undocumented.
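
To make the behaviour described above concrete, here is a minimal Python sketch. It is not a3m's actual code: the function names, the CHILDREN mapping, and the DEFAULT_SHARE constant are hypothetical, and the child directory names (tmp, currentlyProcessing, rejected) are assumptions based on the dev environment paths. The first function models the behaviour as I understand it; the second shows one possible alternative, where any child directory that isn't set explicitly is derived from whichever shared directory is in effect.

```python
import posixpath

# Dev-environment default, taken from the error message in this issue.
DEFAULT_SHARE = "/home/a3m/.local/share/a3m/share"

# Assumed child directory names (hypothetical, for illustration only).
CHILDREN = {
    "A3M_TEMP_DIRECTORY": "tmp",
    "A3M_PROCESSING_DIRECTORY": "currentlyProcessing",
    "A3M_REJECTED_DIRECTORY": "rejected",
}


def resolve_as_reported(env):
    """Model of the reported behaviour: all four paths are derived
    only when the shared directory is NOT explicitly set."""
    dirs = {"A3M_SHARED_DIRECTORY": env.get("A3M_SHARED_DIRECTORY", "")}
    for key in CHILDREN:
        dirs[key] = env.get(key, "")
    if not dirs["A3M_SHARED_DIRECTORY"]:
        dirs["A3M_SHARED_DIRECTORY"] = DEFAULT_SHARE
        for key, name in CHILDREN.items():
            dirs[key] = posixpath.join(DEFAULT_SHARE, name)
    return dirs


def resolve_proposed(env):
    """Possible alternative: derive each unset child directory from the
    effective shared directory, whether it is the default or an override."""
    shared = env.get("A3M_SHARED_DIRECTORY") or DEFAULT_SHARE
    dirs = {"A3M_SHARED_DIRECTORY": shared}
    for key, name in CHILDREN.items():
        dirs[key] = env.get(key) or posixpath.join(shared, name)
    return dirs
```

With only A3M_SHARED_DIRECTORY set, resolve_as_reported leaves the three child paths empty, while resolve_proposed places them under the new shared directory. With nothing set, both return the same default paths.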

Next I tried setting the environment variable for all four a3m directories:

env:
  - name: A3M_SHARED_DIRECTORY
    value: "/home/a3m/share/"
  - name: A3M_TEMP_DIRECTORY
    value: "/home/a3m/share/tmp/"
  - name: A3M_PROCESSING_DIRECTORY
    value: "/home/a3m/share/currentlyProcessing/"
  - name: A3M_REJECTED_DIRECTORY
    value: "/home/a3m/share/rejected/"

but this fails in a3m at the verify AIP step:

[a3m]   | =============== JOB
[a3m]   | verify_aip (exit=1; code=success uuid=abf35e91-53ea-4e80-9f10-a9fe1df23164)
[a3m]   | =============== STDOUT
[a3m]   | 
[a3m]   | =============== END STDOUT
[a3m]   | =============== STDERR
[a3m]   | PermissionError(13, 'Permission denied')
[a3m] Error extracting AIP at "/home/a3m/share/currentlyProcessing/ingest/54921778-26ea-46d1-a20e-a6c5ad80f504/small-54921778-26ea-46d1-a20e-a6c5ad80f504.7z"
[a3m] 
[a3m]   | =============== END STDERR
[a3m]   | =============== ARGS
[a3m]   | ['verify_aip', '54921778-26ea-46d1-a20e-a6c5ad80f504', '/home/a3m/share/currentlyProcessing/ingest/54921778-26ea-46d1-a20e-a6c5ad80f504/small-54921778-26ea-46d1-a20e-a6c5ad80f504.7z']
[a3m]   | =============== END ARGS

I don't yet know why verify AIP is failing, but I did notice that https://github.com/artefactual-labs/a3m/blob/main/a3m/server/shared_dirs.py#L12 creates some extra directories when the a3m server starts, and it assumes the "processing directory" is named currentlyProcessing, which may be a problem if A3M_PROCESSING_DIRECTORY is set to something else.

djjuhasz commented 3 months ago

The verify AIP error message appears to occur at https://github.com/artefactual-labs/a3m/blob/main/a3m/client/clientScripts/verify_aip.py#L193. I'm not clear where in the code the PermissionError(13, 'Permission denied') originates, but my best guess is https://github.com/artefactual-labs/a3m/blob/main/a3m/client/clientScripts/verify_aip.py#L25. It seems like a3m should have sufficient permissions to create a temporary directory for the extracted AIP, so I'm not sure why permissions are denied. :shrug:
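
One way to narrow this down might be to run a small check inside the a3m container, as the same user as the a3m process, against each configured directory. The sketch below is only a diagnostic idea: can_create_tempdir is a hypothetical helper, and I'm assuming (not confirming) that the failing step creates its temporary directory in a way comparable to Python's tempfile.mkdtemp.

```python
import os
import tempfile


def can_create_tempdir(parent):
    """Try to create and remove a temporary directory under `parent`.

    Returns (True, "ok") on success, or (False, "<ErrorType>: <details>")
    if the directory is missing or not writable by the current user.
    """
    try:
        path = tempfile.mkdtemp(dir=parent)
    except OSError as err:
        # PermissionError and FileNotFoundError are both OSError subclasses.
        return False, f"{type(err).__name__}: {err}"
    os.rmdir(path)
    return True, "ok"


if __name__ == "__main__":
    # Paths from the dev overlay tried above; adjust to the values in use.
    for candidate in (
        "/home/a3m/share/tmp/",
        "/home/a3m/share/currentlyProcessing/",
    ):
        print(candidate, can_create_tempdir(candidate))
```

A PermissionError here would point at directory ownership or mount permissions rather than at verify_aip itself; a FileNotFoundError would suggest the directory was never created.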