dlcs / image-server-node-cantaloupe

Dockerfile for Cantaloupe image server
MIT License
1 stars 0 forks source link

Issue with KakaduNativeProcessor #6

Closed donaldgray closed 3 months ago

donaldgray commented 1 year ago

Description

When running this image with KDU 8.2.1 we are running into frequent crashes. The image seems more stable locally when running on a single core (via docker-compose)

deploy:
  resources:
    limits:
      cpus: '1'
      memory: 4096M

However, when deploying to AWS ECS, either as a Fargate or EC2 service this is still unstable. Using different combination of S3Source/FileSystemSource, StreamStrategy/CacheStrategy/DownloadStrategy and enabling/disabling chunking makes no difference.

This is not happening with all images, we are encountering this in Wellcome's environment, mostly with images that originated from proquest. E.g.

Note that the above are larger than most other Wellcome JP2s but are by no means large (~8MB). Tile requests seem fine but it could be due to the size.

Errors/Behavour

Cantaloupe logs will often output something similar to:

[qtp283318938-12] INFO  e.i.l.c.r.i.v.ImageResource - Handling GET /iiif/2/5%2Fb12946308_hin-wel-all-00001677_0040.jp2%2Fb12946308_hin-wel-all-00001677_0040.jp2.jp2/full/max/0/default.jpg
[qtp283318938-12] DEBUG e.i.l.c.r.i.v.ImageResource - Request headers: X-Forwarded-Proto: http; X-Forwarded-For: 62.254.125.28; Host: kduspecial.dlcs.io; X-Forwarded-Port: 80; X-Amzn-Trace-Id: Root=1-646cbff7-70123d20199f2b423a0d24bd
[qtp283318938-12] DEBUG e.i.l.c.i.MetaIdentifier - [Raw path component: 5%2Fb12946308_hin-wel-all-00001677_0040.jp2%2Fb12946308_hin-wel-all-00001677_0040.jp2.jp2] -> [decoded: 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2] -> [slashes substituted: 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2]
[qtp283318938-12] DEBUG e.i.l.c.i.Identifier - [Raw path component: 5%2Fb12946308_hin-wel-all-00001677_0040.jp2%2Fb12946308_hin-wel-all-00001677_0040.jp2.jp2] -> [decoded: 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2] -> [slashes substituted: 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2]
[qtp283318938-12] DEBUG e.i.l.c.c.FilesystemCache - getInfo(): not found: /var/cache/cantaloupe/info/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b.json
[qtp283318938-12] DEBUG e.i.l.c.c.FilesystemCache - getSourceImageFile(): /var/cache/cantaloupe/source/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b
[qtp283318938-12] DEBUG e.i.l.c.s.FilesystemSource - Resolved 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2 to /home/cantaloupe/images/5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2
[qtp283318938-12] DEBUG e.i.l.c.p.ProcessorFactory - KakaduNativeProcessor selected for format JPEG2000 (AutomaticSelectionStrategy offered KakaduNativeProcessor, OpenJpegProcessor, GrokProcessor)
[qtp283318938-12] DEBUG e.i.l.c.p.ProcessorConnector - File -> FileProcessor connection between FilesystemSource and KakaduNativeProcessor
[qtp283318938-12] DEBUG e.i.l.c.c.FilesystemCache - getInfo(): not found: /var/cache/cantaloupe/info/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b.json
[qtp283318938-12] DEBUG e.i.l.c.c.InfoService - readInfo(): read 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2 from KakaduNativeProcessor in 9 msec
[work-nm-2516274] DEBUG e.i.l.c.c.InfoService - putInObjectCache(): adding info: 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2 (new size: 1)
[work-nm-2516274] DEBUG e.i.l.c.c.InfoCache - putInObjectCache(): adding info: 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2 (new size: 1)
[work-nm-2516274] DEBUG e.i.l.c.c.FilesystemCache - put(): writing 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2 to /var/cache/cantaloupe/info/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b.json_work-nm-2516274.tmp
[work-nm-2516274] DEBUG e.i.l.c.c.FilesystemCache - put(): moving /var/cache/cantaloupe/info/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b.json_work-nm-2516274.tmp to /var/cache/cantaloupe/info/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b.json
[qtp283318938-12] DEBUG e.i.l.c.r.i.v.ImageResource - Base URI assembled from X-Forwarded headers: http://kduspecial.dlcs.io
[qtp283318938-12] DEBUG e.i.l.c.c.FilesystemCache - newDerivativeImageInputStream(): /var/cache/cantaloupe/image/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b_56782f8b936d89dd34b4839ae243ef0a.jpg
[qtp283318938-12] DEBUG e.i.l.c.c.FilesystemCache - newOutputStream(): miss; caching 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2_1:1_scalebypixels:!2612,3827,bicubic_encode:jpg_JPEG_80_interlace_#FFFFFF_8_e95198c48292fa4c8fc0617e850c1f6d
[qtp283318938-12] DEBUG e.i.l.c.r.ImageRepresentation - Writing to the response & derivative cache simultaneously
[qtp283318938-12] DEBUG e.i.l.c.p.c.j.JPEG2000KakaduImageReader - Rendered region 0,0/2612x3826; source 0,0/2612x3826; 0x reduction factor; differential scale 8071693/10000000; PPI 600.0x600.0
[Thread-10] ERROR e.i.l.c.p.c.j.JPEG2000KakaduImageReader - Kakadu Core Error:
System is entering deadlock. This may be a false alarm if you are expecting a
call to `kdu_thread_entity::signal_condition' from a foreign thread, but then
`kdu_thread_entity::wait_for_condition' should have supplied
`expect_foreign_wakeup'=false to avoid the false alarm.
Deadlock detected by thread 2
* Working-waiters state 0x2000000000000004
* Non-waiting workers state 0x0000000000000081; (A,T,I)=(1,0,2)
* Thread 0: <idle>; <Status=--A>; <WWID=0>, conditions...
Waiting for "get_new_synthesized_stripe" (S,F)=(...,Local)
* Thread 1: <idle>; <Status=--A>; <No-WWID>
* Thread 2: <active>; <Status=--A>; <No-WWID>
* Thread 3: <idle>; <Status=--A>; <No-WWID>
Kakadu Core Error:
System is entering deadlock. This may be a false alarm if you are expecting a
call to `kdu_thread_entity::signal_condition' from a foreign thread, but then
`kdu_thread_entity::wait_for_condition' should have supplied
`expect_foreign_wakeup'=false to avoid the false alarm.
Deadlock detected by thread 2
* Working-waiters state 0x2000000000000004
* Non-waiting workers state 0x0000000000000081; (A,T,I)=(1,0,2)
* Thread 0: <idle>; <Status=--A>; <WWID=0>, conditions...
Waiting for "get_new_synthesized_stripe" (S,F)=(...,Local)
* Thread 1: <idle>; <Status=--A>; <No-WWID>
* Thread 2: <active>; <Status=--A>; <No-WWID>
* Thread 3: <idle>; <Status=--A>; <No-WWID>
[qtp283318938-12] ERROR e.i.l.c.p.c.j.JPEG2000KakaduImageReader - Fatal error in the codestream management machinery.
[qtp283318938-12] DEBUG e.i.l.c.p.c.ImageWriterFacade - Writing with edu.illinois.library.cantaloupe.processor.codec.jpeg.TurboJPEGImageWriter
[qtp283318938-12] DEBUG e.i.l.c.r.ImageRepresentation - KakaduNativeProcessor processed in 383 msec: 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2_1:1_scalebypixels:!2612,3827,bicubic_encode:jpg_JPEG_80_interlace_#FFFFFF_8_e95198c48292fa4c8fc0617e850c1f6d
[qtp283318938-12] DEBUG e.i.l.c.c.FilesystemCache$ConcurrentFileOutputStream - close(): closing stream for 5/b12946308_hin-wel-all-00001677_0040.jp2/b12946308_hin-wel-all-00001677_0040.jp2.jp2_1:1_scalebypixels:!2612,3827,bicubic_encode:jpg_JPEG_80_interlace_#FFFFFF_8_e95198c48292fa4c8fc0617e850c1f6d
[qtp283318938-12] DEBUG e.i.l.c.c.FilesystemCache$ConcurrentFileOutputStream - close(): moving /var/cache/cantaloupe/image/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b_56782f8b936d89dd34b4839ae243ef0a.jpg_qtp283318938-12.tmp to /var/cache/cantaloupe/image/bd/41/77/bd41775f8f5f909ca8c04c76ac24772b_56782f8b936d89dd34b4839ae243ef0a.jpg
[qtp283318938-12] DEBUG e.i.l.c.p.c.j.JPEG2000KakaduImageReader - Kakadu Core Warning:
Destroying the memory allocator used to manage core precinct resources without
first releasing all the relevant allocated memory.

which doesn't necessarily happen just before a crash and seemingly doesn't happen on every occasion. The actual error is:

[qtp283318938-17] DEBUG e.i.l.c.p.c.j.JPEG2000KakaduImageReader - Rendered region 0,0/2630x3801; source 0,0/2630x3801; 0x reduction factor; differential scale 7872825/10000000; PPI 600.0x600.0
[qtp283318938-17] DEBUG e.i.l.c.p.c.j.JPEG2000KakaduImageReader - Rendered region 0,0/2630x3801; source 0,0/2630x3801; 0x reduction factor; differential scale 7872825/10000000; PPI 600.0x600.0
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f9ed7742a77, pid=23, tid=80
#
# JRE version: OpenJDK Runtime Environment (11.0.18+10) (build 11.0.18+10-post-Ubuntu-0ubuntu122.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.18+10-post-Ubuntu-0ubuntu122.04, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libkdu_jni.so+0x347a77] kd_core_local::kd_resolution::advance_bkgnd_parsed_precinct_rows(kdu_core::kdu_thread_env*, bool)+0x1c7
#
# Core dump will be written. Default location: /opt/kakadu/core.23
#
# An error report file with more information is saved as:
# /opt/kakadu/hs_err_pid23.log
#
# If you would like to submit a bug report, please visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#
/opt/app/s3-config.sh: line 11: 23 Aborted (core dumped) java -Dcantaloupe.config=/cantaloupe/cantaloupe.properties -Xmx$MAXHEAP -Xms$INITHEAP -jar /cantaloupe/cantaloupe-$CANTALOUPE_VERSION.jar

# Problematic frame:
# C [libkdu_jni.so+0x347a77] kd_core_local::kd_resolution::advance_bkgnd_parsed_precinct_rows(kdu_core::kdu_thread_env*, bool)+0x1c7

Next Steps

We need to identify:

donaldgray commented 3 months ago

Resolved by upgrading to 8.4.1