archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: "Removed bagged files" reports failure when thumbnails aren't created #651

Closed andrewjbtw closed 4 years ago

andrewjbtw commented 5 years ago

Expected behaviour: Choosing not to generate thumbnails shouldn't cause "Remove bagged files" to report a failure.

Current behaviour: If you choose not to create thumbnails, the "Remove bagged files" micro-service reports a failure. The stderr for this micro-service indicates the failure is because the "thumbnails" directory does not exist and so can't be removed.

Steps to reproduce: Generate any ingest without thumbnails and it will show a failure at "Remove bagged files".

Your environment (version of Archivematica, OS version, etc): Ubuntu 18.04, AM 1.8, AM 1.9



ablwr commented 5 years ago

Thanks for reporting, @andrewjbtw!

I am seeing this in 1.9 but not in 1.9.1 or the latest branch (proto-1.10), so I believe it is resolved. I think we should keep this issue open and make sure to QA it for our next 1.10 release. (Thanks Darren for adding the triage label!)

ross-spencer commented 5 years ago

I have moved this back into ready to indicate that it looks like it still needs to be addressed.

I can recreate it with the following configuration locally with the latest commit: https://github.com/artefactual/archivematica/commit/42e137fdb716c19fa7e6d5f233bd13a3b0a1d5d6

<processingMCP>
  <preconfiguredChoices>
    <!-- Store DIP -->
    <preconfiguredChoice>
      <appliesTo>5e58066d-e113-4383-b20b-f301ed4d751c</appliesTo>
      <goToChain>4500f34e-f004-4ccf-8720-5c38d0be2254</goToChain>
    </preconfiguredChoice>
    <!-- Select compression level -->
    <preconfiguredChoice>
      <appliesTo>01c651cb-c174-4ba4-b985-1d87a44d6754</appliesTo>
      <goToChain>ecfad581-b007-4612-a0e0-fcc551f4057f</goToChain>
    </preconfiguredChoice>
    <!-- Examine contents -->
    <preconfiguredChoice>
      <appliesTo>accea2bf-ba74-4a3a-bb97-614775c74459</appliesTo>
      <goToChain>e0a39199-c62a-4a2f-98de-e9d1116460a8</goToChain>
    </preconfiguredChoice>
    <!-- Perform file format identification (Submission documentation & metadata) -->
    <preconfiguredChoice>
      <appliesTo>087d27be-c719-47d8-9bbb-9a7d8b609c44</appliesTo>
      <goToChain>4dec164b-79b0-4459-8505-8095af9655b5</goToChain>
    </preconfiguredChoice>
    <!-- Normalize (match 1 for "Do not normalize") -->
    <preconfiguredChoice>
      <appliesTo>cb8e5706-e73f-472f-ad9b-d1236af8095f</appliesTo>
      <goToChain>89cb80dd-0636-464f-930d-57b61e3928b2</goToChain>
    </preconfiguredChoice>
    <!-- Normalize (match 2 for "Do not normalize") -->
    <preconfiguredChoice>
      <appliesTo>7509e7dc-1e1b-4dce-8d21-e130515fce73</appliesTo>
      <goToChain>e8544c5e-9cbb-4b8f-a68b-6d9b4d7f7362</goToChain>
    </preconfiguredChoice>
    <!-- Bind PIDs -->
    <preconfiguredChoice>
      <appliesTo>05357876-a095-4c11-86b5-a7fff01af668</appliesTo>
      <goToChain>fcfea449-158c-452c-a8ad-4ae009c4eaba</goToChain>
    </preconfiguredChoice>
    <!-- Create SIP(s) -->
    <preconfiguredChoice>
      <appliesTo>bb194013-597c-4e4a-8493-b36d190f8717</appliesTo>
      <goToChain>61cfa825-120e-4b17-83e6-51a42b67d969</goToChain>
    </preconfiguredChoice>
    <!-- Delete packages after extraction -->
    <preconfiguredChoice>
      <appliesTo>f19926dd-8fb5-4c79-8ade-c83f61f55b40</appliesTo>
      <goToChain>85b1e45d-8f98-4cae-8336-72f40e12cbef</goToChain>
    </preconfiguredChoice>
    <!-- Transcribe files (OCR) -->
    <preconfiguredChoice>
      <appliesTo>7079be6d-3a25-41e6-a481-cee5f352fe6e</appliesTo>
      <goToChain>1170e555-cd4e-4b2f-a3d6-bfb09e8fcc53</goToChain>
    </preconfiguredChoice>
    <!-- Perform file format identification (Transfer) -->
    <preconfiguredChoice>
      <appliesTo>f09847c2-ee51-429a-9478-a860477f6b8d</appliesTo>
      <goToChain>d97297c7-2b49-4cfe-8c9f-0613d63ed763</goToChain>
    </preconfiguredChoice>
    <!-- Store DIP location -->
    <preconfiguredChoice>
      <appliesTo>cd844b6e-ab3c-4bc6-b34f-7103f88715de</appliesTo>
      <goToChain>/api/v2/location/default/DS/</goToChain>
    </preconfiguredChoice>
    <!-- Generate transfer structure report -->
    <preconfiguredChoice>
      <appliesTo>56eebd45-5600-4768-a8c2-ec0114555a3d</appliesTo>
      <goToChain>df54fec1-dae1-4ea6-8d17-a839ee7ac4a7</goToChain>
    </preconfiguredChoice>
    <!-- Perform policy checks on originals -->
    <preconfiguredChoice>
      <appliesTo>70fc7040-d4fb-4d19-a0e6-792387ca1006</appliesTo>
      <goToChain>3e891cc4-39d2-4989-a001-5107a009a223</goToChain>
    </preconfiguredChoice>
    <!-- Reminder: add metadata if desired -->
    <preconfiguredChoice>
      <appliesTo>eeb23509-57e2-4529-8857-9d62525db048</appliesTo>
      <goToChain>5727faac-88af-40e8-8c10-268644b0142d</goToChain>
    </preconfiguredChoice>
    <!-- Generate thumbnails -->
    <preconfiguredChoice>
      <appliesTo>498f7a6d-1b8c-431a-aa5d-83f14f3c5e65</appliesTo>
      <goToChain>972fce6c-52c8-4c00-99b9-d6814e377974</goToChain>
    </preconfiguredChoice>
    <!-- Store AIP -->
    <preconfiguredChoice>
      <appliesTo>2d32235c-02d4-4686-88a6-96f4d6c7b1c3</appliesTo>
      <goToChain>9efab23c-31dc-4cbd-a39d-bb1665460cbe</goToChain>
    </preconfiguredChoice>
    <!-- Perform policy checks on access derivatives -->
    <preconfiguredChoice>
      <appliesTo>8ce07e94-6130-4987-96f0-2399ad45c5c2</appliesTo>
      <goToChain>76befd52-14c3-44f9-838f-15a4e01624b0</goToChain>
    </preconfiguredChoice>
    <!-- Perform file format identification (Ingest) -->
    <preconfiguredChoice>
      <appliesTo>7a024896-c4f7-4808-a240-44c87c762bc5</appliesTo>
      <goToChain>3c1faec7-7e1e-4cdd-b3bd-e2f05f4baa9b</goToChain>
    </preconfiguredChoice>
    <!-- Perform policy checks on preservation derivatives -->
    <preconfiguredChoice>
      <appliesTo>153c5f41-3cfb-47ba-9150-2dd44ebc27df</appliesTo>
      <goToChain>b7ce05f0-9d94-4b3e-86cc-d4b2c6dba546</goToChain>
    </preconfiguredChoice>
    <!-- Assign UUIDs to directories -->
    <preconfiguredChoice>
      <appliesTo>bd899573-694e-4d33-8c9b-df0af802437d</appliesTo>
      <goToChain>2dc3f487-e4b0-4e07-a4b3-6216ed24ca14</goToChain>
    </preconfiguredChoice>
    <!-- Document empty directories -->
    <preconfiguredChoice>
      <appliesTo>d0dfa5fc-e3c2-4638-9eda-f96eea1070e0</appliesTo>
      <goToChain>65273f18-5b4e-4944-af4f-09be175a88e8</goToChain>
    </preconfiguredChoice>
    <!-- Send transfer to quarantine -->
    <preconfiguredChoice>
      <appliesTo>755b4177-c587-41a7-8c52-015277568302</appliesTo>
      <goToChain>d4404ab1-dc7f-4e9e-b1f8-aa861e766b8e</goToChain>
    </preconfiguredChoice>
    <!-- Extract packages -->
    <preconfiguredChoice>
      <appliesTo>dec97e3c-5598-4b99-b26e-f87a435a6b7f</appliesTo>
      <goToChain>79f1f5af-7694-48a4-b645-e42790bbf870</goToChain>
    </preconfiguredChoice>
    <!-- Upload DIP -->
    <preconfiguredChoice>
      <appliesTo>92879a29-45bf-4f0b-ac43-e64474f0f2f9</appliesTo>
      <goToChain>6eb8ebe7-fab3-4e4c-b9d7-14de17625baa</goToChain>
    </preconfiguredChoice>
  </preconfiguredChoices>
</processingMCP>

I can also see on https://sandbox.archivematica.org that, with a similar configuration and thumbnails turned off, switching the compression setting between a compressed AIP and an uncompressed AIP reproduces the issue only in the latter case (uncompressed AIPs), as per my local config above.

Compressed AIP generation image

Uncompressed AIP generation image

There may be some other subtleties in the processing config settings, but it's worth investigating further as it does seem this issue still stands.

I wonder what is left behind if the microservice does fail? Presumably there is a chance of having quite an excess of detritus left hanging around if the logs are correct in suggesting the objects and logs directories are not removed? For reference, the workflow provides three args:

        "63f35161-ba76-4a43-8cfa-c38c6a2d5b2f": {
            "config": {
                "@manager": "linkTaskManagerDirectories",
                "@model": "StandardTaskConfig",
                "arguments": "-R \"%SIPLogsDirectory%\" \"%SIPObjectsDirectory%\" \"%SIPDirectory%thumbnails/\"",
                "execute": "remove_v0.0",
                "filter_file_end": null,
                "filter_file_start": null,
                "filter_subdir": null,
                "stderr_file": null,
                "stdout_file": null
            },
            "description": {
                "en": "Remove bagged files",
                "pt_BR": "Remover pacotes de arquivos",
                "sv": "Ta bort filer som blivit satta i en bag"
            },
            "exit_codes": {
                "0": {
                    "job_status": "Completed successfully",
                    "link_id": "7c44c454-e3cc-43d4-abe0-885f93d693c6"
                }
            },
            "fallback_job_status": "Failed",
            "fallback_link_id": "7c44c454-e3cc-43d4-abe0-885f93d693c6",
            "group": {
                "en": "Prepare AIP",
                "es": "Preparar AIP",
                "fr": "Préparer l'AIP",
                "sv": "Förbered AIP"
            }
        },
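
On the question of what is left behind: as far as I know, `rm` handles each operand in turn, so a missing `thumbnails/` operand produces a diagnostic and a non-zero exit status while the other operands are still removed. Below is a minimal sketch for checking this locally. It assumes a POSIX `rm` on the PATH and that the logs and objects directories sit directly under the SIP directory; the temporary layout and the `job_status` helper are hypothetical stand-ins, not Archivematica code.

```python
import os
import subprocess
import tempfile

# Hypothetical stand-in for the exit-code handling in the link shown above:
# exit code 0 maps to "Completed successfully", anything else uses the fallback.
EXIT_CODES = {0: "Completed successfully"}
FALLBACK_JOB_STATUS = "Failed"


def job_status(exit_code):
    """Map a process exit code to a job status string."""
    return EXIT_CODES.get(exit_code, FALLBACK_JOB_STATUS)


def run_remove(sip_dir):
    """Run `rm -R` against logs/, objects/ and thumbnails/ under sip_dir."""
    targets = [
        os.path.join(sip_dir, "logs"),
        os.path.join(sip_dir, "objects"),
        os.path.join(sip_dir, "thumbnails/"),
    ]
    return subprocess.run(["rm", "-R", *targets], capture_output=True, text=True)


# Build a throwaway SIP directory that has no thumbnails/ directory.
sip_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(sip_dir, "logs"))
os.makedirs(os.path.join(sip_dir, "objects"))

result = run_remove(sip_dir)
print(job_status(result.returncode))  # status derived from rm's exit code
print(sorted(os.listdir(sip_dir)))    # whatever rm left behind
print(result.stderr.strip())          # diagnostic for the missing operand
```

If this holds on the deployment in question, the failure would be cosmetic: the logs and objects directories are gone and only the job status is wrong.
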
andrewjbtw commented 5 years ago

Thanks for following up. I've been remiss at checking back in on this issue, but I am also still seeing this with thumbnails turned off and uncompressed AIPs in 1.9.1. The AIPs themselves seem to be exactly what I expect them to be, though since I've never had Archivematica create thumbnails, I may never have seen any other AIP structure. At a quick glance, the AIPs I see when downloading from the demo site look very much like the AIPs I get from our local production and testing instances.

In 1.3 and 1.4 the thumbnail generation seemed to be tied to service files, so SIPs without service files never got thumbnails, and I never saw errors related to the thumbnails. So this is not something I've investigated deeply before.

sromkey commented 4 years ago

@sevein do you think this will still be an issue in qa/1.x, after the no-ops changes?

evelynPM commented 4 years ago

This issue is not appearing in 1.10.x. I think we can close it.

ross-spencer commented 4 years ago

Hi @evelynPM, you need a specific combination of options. Without using the processing configuration I have saved above, I believe you simply need:

  1. Do not normalize.
  2. Do not generate thumbnails.
  3. Create an Uncompressed AIP.
  4. Observe the Prepare AIP set of microservices.

I have left a transfer on the 1.10.1 test server today. It uses the DemoTransferCSV set, and you can inspect the configuration and the failing microservice there to observe the behavior. I can recreate this on CentOS and in our Docker deploy.

NB. for other readers: the links to the services above will likely disappear in time.


I had a look on qa/1.x and it seems this is still present, so when we were chatting about this yesterday I was mis-remembering the impact of the no-op work.

A basic fix that might be enough to resolve this issue would be for the microservice to check that a path exists before trying to delete it. The current error is: rm: cannot remove ‘/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/compressionAIPDecisions/thumbnailTest-5cc5375f-0be1-4574-8419-7486a1ff2bb4/thumbnails/’: No such file or directory.
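
The existence check suggested above could look roughly like this in Python. This is a sketch only: `remove_if_present` is a hypothetical helper name, not the actual Archivematica client script, and it assumes a missing target should produce a warning rather than an error.

```python
import os
import shutil
import sys
import tempfile


def remove_if_present(paths):
    """Remove each path that exists and warn about the rest.

    Returns the list of paths that were skipped because they were missing.
    """
    skipped = []
    for path in paths:
        if os.path.isdir(path) and not os.path.islink(path):
            # Real directory: remove it and everything below it.
            shutil.rmtree(path)
        elif os.path.lexists(path):
            # Regular file or symlink (including a broken one).
            os.remove(path)
        else:
            print("Warning: skipping missing path: {}".format(path), file=sys.stderr)
            skipped.append(path)
    return skipped


# Usage: objects/ exists, thumbnails/ does not.
sip_dir = tempfile.mkdtemp()
objects = os.path.join(sip_dir, "objects")
os.makedirs(objects)
skipped = remove_if_present([objects, os.path.join(sip_dir, "thumbnails/")])
```

Returning the skipped paths instead of raising keeps the job's exit status at zero while still leaving a trace of what was missing.
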

evelynPM commented 4 years ago

You're right, @ross-spencer. I was able to re-create the issue on 1.10.x using no normalization / no thumbnails / uncompressed.

tvdbroek commented 4 years ago

We also ran into this issue on Archivematica 1.9.2 with different settings:

  1. Create single SIP and continue processing.
  2. No thumbnails.
  3. Normalize for preservation and access.

tvdbroek commented 4 years ago

We used the following configuration:

<processingMCP>
  <preconfiguredChoices>
    <!-- Store DIP -->
    <preconfiguredChoice>
      <appliesTo>5e58066d-e113-4383-b20b-f301ed4d751c</appliesTo>
      <goToChain>8d29eb3d-a8a8-4347-806e-3d8227ed44a1</goToChain>
    </preconfiguredChoice>
    <!-- Select compression level -->
    <preconfiguredChoice>
      <appliesTo>01c651cb-c174-4ba4-b985-1d87a44d6754</appliesTo>
      <goToChain>414da421-b83f-4648-895f-a34840e3c3f5</goToChain>
    </preconfiguredChoice>
    <!-- Examine contents -->
    <preconfiguredChoice>
      <appliesTo>accea2bf-ba74-4a3a-bb97-614775c74459</appliesTo>
      <goToChain>e0a39199-c62a-4a2f-98de-e9d1116460a8</goToChain>
    </preconfiguredChoice>
    <!-- Remove from quarantine after (days) -->
    <preconfiguredChoice>
      <appliesTo>19adb668-b19a-4fcb-8938-f49d7485eaf3</appliesTo>
      <goToChain>333643b7-122a-4019-8bef-996443f3ecc5</goToChain>
      <delay unitCtime="yes">2419200.0</delay>
    </preconfiguredChoice>
    <!-- Normalize (match 1 for "Normalize for preservation and access") -->
    <preconfiguredChoice>
      <appliesTo>cb8e5706-e73f-472f-ad9b-d1236af8095f</appliesTo>
      <goToChain>b93cecd4-71f2-4e28-bc39-d32fd62c5a94</goToChain>
    </preconfiguredChoice>
    <!-- Bind PIDs -->
    <preconfiguredChoice>
      <appliesTo>05357876-a095-4c11-86b5-a7fff01af668</appliesTo>
      <goToChain>fcfea449-158c-452c-a8ad-4ae009c4eaba</goToChain>
    </preconfiguredChoice>
    <!-- Create SIP(s) -->
    <preconfiguredChoice>
      <appliesTo>bb194013-597c-4e4a-8493-b36d190f8717</appliesTo>
      <goToChain>61cfa825-120e-4b17-83e6-51a42b67d969</goToChain>
    </preconfiguredChoice>
    <!-- Delete packages after extraction -->
    <preconfiguredChoice>
      <appliesTo>f19926dd-8fb5-4c79-8ade-c83f61f55b40</appliesTo>
      <goToChain>85b1e45d-8f98-4cae-8336-72f40e12cbef</goToChain>
    </preconfiguredChoice>
    <!-- Transcribe files (OCR) -->
    <preconfiguredChoice>
      <appliesTo>7079be6d-3a25-41e6-a481-cee5f352fe6e</appliesTo>
      <goToChain>1170e555-cd4e-4b2f-a3d6-bfb09e8fcc53</goToChain>
    </preconfiguredChoice>
    <!-- Store DIP location -->
    <preconfiguredChoice>
      <appliesTo>cd844b6e-ab3c-4bc6-b34f-7103f88715de</appliesTo>
      <goToChain>/api/v2/location/df9f97df-0fd2-47cb-9862-ad6ec78058b9/</goToChain>
    </preconfiguredChoice>
    <!-- Generate transfer structure report -->
    <preconfiguredChoice>
      <appliesTo>56eebd45-5600-4768-a8c2-ec0114555a3d</appliesTo>
      <goToChain>e9eaef1e-c2e0-4e3b-b942-bfb537162795</goToChain>
    </preconfiguredChoice>
    <!-- Perform policy checks on originals -->
    <preconfiguredChoice>
      <appliesTo>70fc7040-d4fb-4d19-a0e6-792387ca1006</appliesTo>
      <goToChain>3e891cc4-39d2-4989-a001-5107a009a223</goToChain>
    </preconfiguredChoice>
    <!-- Reminder: add metadata if desired -->
    <preconfiguredChoice>
      <appliesTo>eeb23509-57e2-4529-8857-9d62525db048</appliesTo>
      <goToChain>5727faac-88af-40e8-8c10-268644b0142d</goToChain>
    </preconfiguredChoice>
    <!-- Store AIP -->
    <preconfiguredChoice>
      <appliesTo>2d32235c-02d4-4686-88a6-96f4d6c7b1c3</appliesTo>
      <goToChain>9efab23c-31dc-4cbd-a39d-bb1665460cbe</goToChain>
    </preconfiguredChoice>
    <!-- Perform policy checks on access derivatives -->
    <preconfiguredChoice>
      <appliesTo>8ce07e94-6130-4987-96f0-2399ad45c5c2</appliesTo>
      <goToChain>76befd52-14c3-44f9-838f-15a4e01624b0</goToChain>
    </preconfiguredChoice>
    <!-- Perform file format identification (Ingest) -->
    <preconfiguredChoice>
      <appliesTo>7a024896-c4f7-4808-a240-44c87c762bc5</appliesTo>
      <goToChain>3c1faec7-7e1e-4cdd-b3bd-e2f05f4baa9b</goToChain>
    </preconfiguredChoice>
    <!-- Perform policy checks on preservation derivatives -->
    <preconfiguredChoice>
      <appliesTo>153c5f41-3cfb-47ba-9150-2dd44ebc27df</appliesTo>
      <goToChain>b7ce05f0-9d94-4b3e-86cc-d4b2c6dba546</goToChain>
    </preconfiguredChoice>
    <!-- Assign UUIDs to directories -->
    <preconfiguredChoice>
      <appliesTo>bd899573-694e-4d33-8c9b-df0af802437d</appliesTo>
      <goToChain>2dc3f487-e4b0-4e07-a4b3-6216ed24ca14</goToChain>
    </preconfiguredChoice>
    <!-- Document empty directories -->
    <preconfiguredChoice>
      <appliesTo>d0dfa5fc-e3c2-4638-9eda-f96eea1070e0</appliesTo>
      <goToChain>29881c21-3548-454a-9637-ebc5fd46aee0</goToChain>
    </preconfiguredChoice>
    <!-- Send transfer to quarantine -->
    <preconfiguredChoice>
      <appliesTo>755b4177-c587-41a7-8c52-015277568302</appliesTo>
      <goToChain>d4404ab1-dc7f-4e9e-b1f8-aa861e766b8e</goToChain>
    </preconfiguredChoice>
    <!-- Extract packages -->
    <preconfiguredChoice>
      <appliesTo>dec97e3c-5598-4b99-b26e-f87a435a6b7f</appliesTo>
      <goToChain>01d80b27-4ad1-4bd1-8f8d-f819f18bf685</goToChain>
    </preconfiguredChoice>
    <!-- Approve normalization -->
    <preconfiguredChoice>
      <appliesTo>de909a42-c5b5-46e1-9985-c031b50e9d30</appliesTo>
      <goToChain>1e0df175-d56d-450d-8bee-7df1dc7ae815</goToChain>
    </preconfiguredChoice>
    <!-- Upload DIP -->
    <preconfiguredChoice>
      <appliesTo>92879a29-45bf-4f0b-ac43-e64474f0f2f9</appliesTo>
      <goToChain>6eb8ebe7-fab3-4e4c-b9d7-14de17625baa</goToChain>
    </preconfiguredChoice>
  </preconfiguredChoices>
</processingMCP>
replaceafill commented 4 years ago

There are currently two jobs for removing temporary bag directories when an AIP is being prepared to be stored:

Screenshot from 2019-12-17 09-19-55

Notice the job titles are slightly different.

The problem with remove_v0.0 is that it is based on the rm command, which fails when a target argument doesn't exist (for example, when thumbnails are not generated but the thumbnails directory is still passed to the job). This draft PR therefore replaces it in the first case with the Python-based removeDirectories, which just prints a warning instead of failing the job. The job titles (descriptions in the workflow) have also been synchronized.

But I'm wondering if we should merge both jobs into a single one. I think they're intended to be doing the same task (remove temporary bag directories) and their arguments are very similar but not identical:

@sromkey @sallain @ross-spencer @evelynPM any thoughts?

sallain commented 4 years ago

I have no concerns about combining the jobs into a single one if they're accomplishing the same thing - I'll just loop in @sevein in case there's an architectural reason not to.

sevein commented 4 years ago

To be honest I had no idea, but what you're suggesting, Douglas, makes total sense!

replaceafill commented 4 years ago

My idea didn't get far :sweat_smile: There's a reason for the arguments to be different: if I try to remove "%SIPDirectory%%SIPName%-%SIPUUID%" for an uncompressed AIP, I am removing the AIP directory itself!

Sorry for the noise :blush:

sallain commented 4 years ago

@replaceafill :rofl:

ablwr commented 4 years ago

Tested in qa/1.x (last commit: https://github.com/artefactual/archivematica/commit/6d84db86d7d8fa21cea6f79f4b2ef1ec4c9666ef). Glad to see this one fixed because it's come up a few times!