archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: API "package" endpoint doesn't work with some absolute paths #1709

Open djjuhasz opened 2 months ago

djjuhasz commented 2 months ago

Expected behaviour

Submitting a POST /api/v2beta/package request with a path parameter value that is an absolute path should start processing the transfer at the given path, as per https://www.archivematica.org/en/docs/archivematica-1.16/dev-manual/api/api-reference-archivematica/#package.

Current behaviour

Submitting a POST /api/v2beta/package request with an absolute path works for some paths but not for others, and I'm not clear why. I am testing with an Archivematica deployment that has two transfer source locations configured: "/home" and "/transfer_source". A package request with the path /transfer_source/small_bag.zip starts a transfer successfully, while a request with the path /home/small.zip does not start a transfer.

In both cases the HTTP response status is 202 Accepted and a transfer_id is returned in the body, but in the second case the transfer never actually starts.

Both paths exist on the Storage Service server:

Works:

artefactual@amss:/home$ ls -l /transfer_source/small_bag.zip
-rwxr-x--- 1 enduro archivematica 1683 Mar  6 19:12 /transfer_source/small_bag.zip

Doesn't work:

artefactual@amss:/home$ ls -l /home/small.zip
-rw-r--r-- 1 enduro archivematica 1276 Aug 15 22:23 /home/small.zip

Steps to reproduce

Here are the two curl requests I used for testing:

Works:

curl -i -X POST \
  -H 'Accept: */*' \
  -H 'Authorization: ApiKey REDACTED:REDACTED' \
  -H 'Content-Type: application/json' \
  --data "{\
       \"path\": \"$(echo -n '/transfer_source/small_bag.zip' | base64 -w 0)\", \
       \"name\": \"small_bag.zip\", \
       \"processing_config\": \"automated\", \
       \"type\": \"zipped bag\" \
      }" \
  https://REDACTED.archivematica.net/api/v2beta/package

Does not work:

curl -i -X POST \
  -H 'Accept: */*' \
  -H 'Authorization: ApiKey REDACTED:REDACTED' \
  -H 'Content-Type: application/json' \
  --data "{\
       \"path\": \"$(echo -n '/home/small.zip' | base64 -w 0)\", \
       \"name\": \"small.zip\", \
       \"processing_config\": \"automated\", \
       \"type\": \"zipfile\" \
      }" \
  https://REDACTED.archivematica.net/api/v2beta/package

Here's a screenshot of the storage location configuration in the Storage Service: image

Archivematica MCPServer Debug log for failed transfer: Archivematica.debug.log

Your environment (version of Archivematica, operating system, other relevant details)

Archivematica version: 1.16.0

OS:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:    20.04
Codename:   focal

djjuhasz commented 2 months ago

P.S. I can successfully start processing the /home/small.zip transfer using the Archivematica Dashboard.

djjuhasz commented 2 months ago

Similar to #1436

djjuhasz commented 2 months ago

If I use the "/home" transfer source UUID in the package request the transfer starts successfully:

curl -i -X POST \
  -H 'Accept: */*' \
  -H 'Authorization: ApiKey REDACTED:REDACTED' \
  -H 'Content-Type: application/json' \
  --data "{\
       \"path\": \"$(echo -n '749ef452-fbed-4d50-9072-5f98bc01e52e:small.zip' | base64 -w 0)\", \
       \"name\": \"small.zip\", \
       \"processing_config\": \"automated\", \
       \"type\": \"zipfile\" \
      }" \
  https://REDACTED.archivematica.net/api/v2beta/package
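
The requests above use two forms of the "path" value: a bare absolute path, and a transfer source location UUID followed by a colon and a path relative to that location. A minimal Python sketch of building either form is shown here. The helper names `encode_package_path` and `build_package_payload` are hypothetical and not part of Archivematica; the field names and the "automated" processing configuration are copied from the curl requests.

```python
import base64
import json


def encode_package_path(path, location_uuid=None):
    """Return the base64-encoded value for the "path" field.

    Hypothetical helper: with a UUID the raw value is "<uuid>:<path>",
    otherwise the path is encoded exactly as given.
    """
    raw = f"{location_uuid}:{path}" if location_uuid else path
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def build_package_payload(path, name, transfer_type, location_uuid=None):
    """Build the JSON request body, mirroring the fields in the curl requests."""
    return json.dumps(
        {
            "path": encode_package_path(path, location_uuid),
            "name": name,
            "processing_config": "automated",
            "type": transfer_type,
        }
    )


if __name__ == "__main__":
    # The form that worked for the "/home" location: UUID plus relative path.
    print(
        build_package_payload(
            "small.zip",
            "small.zip",
            "zipfile",
            location_uuid="749ef452-fbed-4d50-9072-5f98bc01e52e",
        )
    )
```

Passing the UUID explicitly avoids any dependence on which transfer source location the server treats as the default.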

replaceafill commented 2 months ago

I see this in your attached log:

WARNING   2024-08-15 22:33:48  archivematica.common:storageService:copy_files:316:  Unable to move files with {'origin_location': '/api/v2/location/32634513-bdfc-47e5-8cba-ee2e73dd9811/', 'files': [{'source': 'home/small.zip', 'destination': '/var/archivematica/sharedDirectory/tmp/tmpyfhgjjiu/small.zip'}], 'pipeline': '/api/v2/pipeline/f8c0d75f-15c3-4152-ac1c-abcf5d8c4b36/'} because 500 Server Error: Internal Server Error for url: https://REDACTED.archivematica.net:8000/api/v2/location/8f8af017-3b89-4ce9-a90b-42d4745a3d0d/

Port 8000 runs the Storage Service. See if you have a Traceback or ERROR in that same time span in your /var/log/archivematica/storage-service/storage_service_debug.log file.

djjuhasz commented 2 months ago

@replaceafill I just tried the failing package again and here is the error in the Storage Service debug log: amss_debug.log

replaceafill commented 2 months ago

@djjuhasz The relevant bit here:

locations.models.StorageException: Rsync failed with status 23: b'sending incremental file list\nrsync: change_dir "/transfer_source/home" failed: No such file or directory (2)\ndelta-transmission disabled for local transfer or --whole-file\ntotal: matches=0  hash_hits=0  false_alarms=0 data=0\n\nsent 20 bytes  received 79 bytes  198.00 bytes/sec\ntotal size is 0  speedup is 0.00\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1205) [sender=3.1.3]\n'

Could it be a problem with your spaces setup? The /transfer_source/home path that the Storage Service is looking for here doesn't seem to match the curl requests you're showing above.

djjuhasz commented 2 months ago

@replaceafill it looks to me like the AM or the AMSS are prepending "/transfer_source" to the "/home/small.zip" from the "path" parameter in the request.

How would I set up the spaces incorrectly to cause this problem? I included a screenshot of the transfer source directories config from the AMSS dashboard in the initial bug report. The "/transfer_source" directory was initially the "default" transfer source, and I changed the default to "/home" after I first encountered this problem. If that change is what causes the problem, it still seems like an AMSS bug.
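
Both observed results are consistent with the request path being resolved relative to a single location rooted at "/transfer_source". The Python sketch below is an assumption-based illustration, not the actual MCPServer or Storage Service code: it assumes that the location's own path prefix is removed when present, that leading slashes are dropped (the MCPServer log shows 'source': 'home/small.zip'), and that the remainder is joined onto the location path. `resolve_in_location` is a hypothetical name.

```python
import posixpath


def resolve_in_location(location_path, requested_path):
    """Resolve a requested path against one transfer source location.

    Assumed behaviour for illustration only: drop the location prefix if
    the request starts with it, strip leading slashes, then join.
    """
    if requested_path.startswith(location_path):
        requested_path = requested_path[len(location_path):]
    relative = requested_path.lstrip("/")
    return posixpath.join(location_path, relative)


if __name__ == "__main__":
    # Path inside the location: resolves to the file that exists.
    print(resolve_in_location("/transfer_source", "/transfer_source/small_bag.zip"))
    # Path outside the location: ends up under /transfer_source/home.
    print(resolve_in_location("/transfer_source", "/home/small.zip"))
```

Under these assumptions the second call yields /transfer_source/home/small.zip, whose parent directory matches the one rsync reports as missing.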

djjuhasz commented 2 months ago

@replaceafill also, I can start the /home/small.zip transfer fine from the AM Dashboard, so the spaces setup works fine in that case.

replaceafill commented 2 months ago

> @replaceafill it looks to me like the AM or the AMSS are prepending "/transfer_source" to the "/home/small.zip" from the "path" parameter in the request.

That is correct! We investigated this today and found a couple of problems:

  1. The documentation of the endpoint states:

    A fundamental difference between the package endpoint and others from which a transfer can be initiated is that a storage service transfer location UUID isn't always required. In some cases that might still be ideal.

    I think this should specify that, if you pass a path with no transfer source location UUID prepended, the MCPServer fetches the default transfer source location. If you have multiple transfer source locations and want to start a transfer from one that is not the default, you have to pass the UUID.

  2. If the transfer source location UUID is not specified, the MCPServer caches the initial fetch from the Storage Service in a global variable. This is problematic if the user changes the default transfer source location in the Storage Service after it has been cached. The MCPServer process would need to be restarted to reset the global variable.