Islandora / islandora

Drupal modules for browsing and managing digital repositories.
http://islandora.ca/
GNU General Public License v2.0
154 stars 119 forks source link

[BUG] Endless repeating "Created a version in Fedora for [media name]" #903

Closed alxp closed 1 year ago

alxp commented 1 year ago

Note While I found this bug using ISLE, the apparent behaviour of infinitely reacting to one event seems like a core Islandora problem that should not occur, especially so easily.

What steps does it take to reproduce the issue?

Discovered using ISLE development branch latest commit as of 2022-10-17. Observed on make demo and make starter_dev profiles.

  1. Do a fresh install using make starter_dev of an Isle site.
  2. Go to /node/add/islandora_object and add an object with model Paged Content.
  3. Add another object with model Page and under Relationships set it to be Member Of the node above.
  4. On the Page object, add a media of type File. Upload a JP2 image. (Will test with other files next. - @alxp) Set the Media Use checking both Original File and Service File and save.

Seemingly on paged content + page object combinations.

After adding a media.

Hypercube continually logs entries, Watchdog shows many repeated instances of ""Created a version in Fedora for [media file name]". Fcrepo log continually shows requests for HEAD of the media item. (logs attached.)

logs.zip

User with permission to create Fedora objects.

Expected the site to stop activity once derivatives are generated.

Which version of Islandora are you using?

islandora/islandora: 2.4.2 Islandora-Devops/isle-dc: Development 74968de32f08336afd76155cc747686dce49e51b Oct 17 2022

Any related open or closed issues to this bug report?

Slack discussion:

Screenshots:

image
alxp commented 1 year ago

The behaviour does not occur when uploading a JP2 but only checking "Original File" and not "Service File" in the media.

However something interesting, no Extracted Text" media is created. But when Original FILe and Service File are both selected, despite all the activity, an Extracted Text media is created.

The Hypercube logs contain about 20 instances of the following error in this case:

le-dc-hypercube-1  | 172.20.0.10 - - [18/Oct/2022:04:37:02 +0000] "GET / HTTP/1.1" 500 125 "-" "Apache-HttpClient/4.5.3 (Java/1.8.0_322)" "-"
isle-dc-hypercube-1  | [2022-10-18 04:37:03] app.INFO: Matched route "{route}". {"route":"GET_","route_parameters":{"_controller":"hypercube.controller:get","_route":"GET_"},"request_uri":"http://hypercube:8000/","method":"GET"} []
isle-dc-hypercube-1  | [2022-10-18 04:37:03] app.INFO: Guard authentication successful! {"token":"[object] (Symfony\\Component\\Security\\Guard\\Token\\PostAuthenticationGuardToken: PostAuthenticationGuardToken(user=\"admin\", authenticated=true, roles=\"authenticated, fedoraadmin\"))","authenticator":"Islandora\\Crayfish\\Commons\\Syn\\JwtAuthenticator"} []
isle-dc-hypercube-1  | [2022-10-18 04:37:04] crayfish.cmd_execute_service.ERROR: Process exited with non-zero code. {"exit_code":1,"stderr":"Error in pixReadMemJp2k: function not present\nError in pixReadMem: jp2k: no pix returned\nError during processing.\n"} []
isle-dc-hypercube-1  | 172.20.0.10 - - [18/Oct/2022:04:37:04 +0000] "GET / HTTP/1.1" 500 125 "-" "Apache-HttpClient/4.5.3 (Java/1.8.0_322)" "-"
isle-dc-hypercube-1  | [2022-10-18 04:37:05] app.INFO: Matched route "{route}". {"route":"GET_","route_parameters":{"_controller":"hypercube.controller:get","_route":"GET_"},"request_uri":"http://hypercube:8000/","method":"GET"} []
isle-dc-hypercube-1  | [2022-10-18 04:37:05] app.INFO: Guard authentication successful! {"token":"[object] (Symfony\\Component\\Security\\Guard\\Token\\PostAuthenticationGuardToken: PostAuthenticationGuardToken(user=\"admin\", authenticated=true, roles=\"authenticated, fedoraadmin\"))","authenticator":"Islandora\\Crayfish\\Commons\\Syn\\JwtAuthenticator"} []
isle-dc-hypercube-1  | [2022-10-18 04:37:06] crayfish.cmd_execute_service.ERROR: Process exited with non-zero code. {"exit_code":1,"stderr":"Error in pixReadMemJp2k: function not present\nError in pixReadMem: jp2k: no pix returned\nError during processing.\n"} []
isle-dc-hypercube-1  | 172.20.0.10 - - [18/Oct/2022:04:37:06 +0000] "GET / HTTP/1.1" 500 125 "-" "Apache-HttpClient/4.5.3 (Java/1.8.0_322)" "-"
alxp commented 1 year ago

Uploading a TIFF file as Original File to a page media works as expected with Extracted Text being generated and no ongoing activity in the hypercube, fedora or watchdog logs.

I then followed the steps to test #897 - Add hOCR option to Text Extraction Media Attachment action and IIIF Manifest. This generated the hOCR extracted structured text to a field on the media as expected.

I then went to try to break things again:

I created a page object and added a File Media with a TIFF file , but selected both Original File and Service File.

This resulted in no extracted text media being generated.

Also the Drupal logs became full of a page not found error for a FITS file.

image

As well, the Hypercube log is now emitting ongoing messages about timeout errors and pools being terminated and restarted. Restarting Docker had no effect.

Attaching the logs:

logs-after-tiff-upload-as-service-and-original-file.zip

whikloj commented 1 year ago

My guess is the 20 instances is because the maxRedeliveries in Alpaca is 20. So when the route fails, it rewinds and tries again. I haven't looked at the code but we should be able to catch a specific Exception and halt processing instead of trying again.

rosiel commented 1 year ago

Sounds like there was a bit of a mess here, but is it primarily resolved by https://github.com/Islandora/islandora/pull/920 ?