JuliaIO / ImageMagick.jl

Thin Wrapper for the library ImageMagick

High memory use without release #205

Open mihalybaci opened 3 years ago

mihalybaci commented 3 years ago

I am trying to write a bulk image renaming function with outputs based on JPEG EXIF entries, but I am running into a memory issue. Here is the crux of the problem:

using ImageMagick

# Starting memory reported by `top`: 2941 MiB
for i = 1:15
    field_info = magickinfo(testim, "date:modify")  # `testim` is the path to a 5.7 MiB image on my computer
end
# Ending memory: 5264 MiB
# After two for-loop runs: 8553 MiB

Here I have just used a loop to repeat the function call for the MWE, but the same thing happens when cycling through different images. Two problems seem to arise.

First, the image is only 5.7 MiB. After reading it 15 times I would naively expect memory use to grow by about 15*6 = 90 MiB if the memory were never cleared, but after the for loop it has grown by about 2300 MiB (from 2941 to 5264 MiB).

Second, after several minutes, the memory still doesn't clear. While writing this post, the memory dropped back into the 3100 MiB neighborhood, but that was only after 10-15 minutes of Julia idle time.
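
To put numbers on the first point, here is a small sketch that only redoes the arithmetic on the `top` readings quoted in the example above. The variable names are made up for illustration; nothing here calls ImageMagick.

```julia
# Memory readings (MiB) reported by `top` in the example above.
start_mib      = 2941
after_one_run  = 5264
after_two_runs = 8553
calls_per_run  = 15
image_mib      = 5.7

naive_total   = calls_per_run * image_mib        # about 85.5 MiB if every read stayed resident
growth_run1   = after_one_run - start_mib        # 2323 MiB
growth_run2   = after_two_runs - after_one_run   # 3289 MiB
per_call_run1 = growth_run1 / calls_per_run      # roughly 154.9 MiB per call

println("naive upper bound:      ", naive_total, " MiB")
println("observed growth, run 1: ", growth_run1, " MiB (",
        round(per_call_run1; digits = 1), " MiB per call)")
println("observed growth, run 2: ", growth_run2, " MiB")
```

So each call appears to retain on the order of 150 MiB, far more than the 5.7 MiB file size.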

Is there a bug here?

My info:

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8

and via pkg> st: [6218d12a] ImageMagick v1.2.1.

johnnychen94 commented 3 years ago

I wouldn't be surprised by this at all. Generally, the effort would go into the libjpeg wrapper instead: https://github.com/JuliaImages/Images.jl/issues/960

@stevengj, there's a newly initialized repo at https://github.com/stevengj/JpegTurbo.jl; I'm not sure what your plan for it is, but I could join the development if you think this is a good idea.

mihalybaci commented 3 years ago

FWIW, I just copied the code example from JpegTurbo and it does seem to avoid the memory issue.

function image_mem(filename)
    cinfo = LibJpeg.jpeg_decompress_struct()
    jerr = Ref{LibJpeg.jpeg_error_mgr}()
    cinfo.err = LibJpeg.jpeg_std_error(jerr)
    LibJpeg.jpeg_create_decompress(cinfo)
    infile = ccall(:fopen, Libc.FILE, (Cstring, Cstring), filename, "rb")
    LibJpeg.jpeg_stdio_src(cinfo, infile)
    LibJpeg.jpeg_read_header(cinfo, true)
    LibJpeg.jpeg_start_decompress(cinfo)
    w = Int(cinfo.output_width)  # image width in pixels
    h = Int(cinfo.output_height) # image height in pixels
    LibJpeg.jpeg_destroy_decompress(cinfo)
    ccall(:fclose, Cint, (Ptr{Libc.FILE},), infile)
    return w, h
end

for i = 1:100
    image_mem(testim)
end
# Ending memory usage = starting memory usage

So this would be a decent workaround for my case if I can figure out where the created/modified dates are buried.

stevengj commented 3 years ago

@johnnychen94, I have no immediate plans to work on JpegTurbo — I mainly put it there as a starting point for later work. I would be happy to transfer it to JuliaIO or JuliaImages if desired, and/or to add collaborators. See also the discussion on discourse.

@IanButterworth also has a repo (https://github.com/IanButterworth/ImageIODevelopment.jl) with a similar Clang-generated wrapper for libjpeg, so it would be good to check with him on the best way forward.

IanButterworth commented 3 years ago

I don't think my dev repo got much past copying a C example.

I would've thought the thing to do would be to build on JpegTurbo.jl (though I'd rename it JPEGFiles.jl perhaps) and move it to JuliaIO when ready.

I can't offer much time but happy to review.

stevengj commented 3 years ago

I don't think libjpeg has anything specifically for EXIF data, which is embedded in a single segment of a JPEG file. You can use libjpeg to extract that segment, I think, but you will have to parse the EXIF data yourself (or wrap another library like libexif). See https://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_JPEG_files
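
A minimal sketch of the "extract that segment" step, written in plain Julia rather than with libjpeg (a swapped-in approach for illustration, not something proposed in this thread). It assumes the usual JPEG layout: an SOI marker, then a series of segments that each carry a 2-byte big-endian length, with EXIF stored in an APP1 segment (marker 0xFFE1) whose payload starts with `Exif\0\0`. The name `find_exif_segment` is hypothetical, and parsing the returned TIFF-structured bytes is still left to the caller or to a library such as libexif.

```julia
# Return the raw EXIF payload (the bytes after the "Exif\0\0" header) of a JPEG
# stream, or `nothing` if no EXIF APP1 segment appears before the image data.
function find_exif_segment(io::IO)
    # Every JPEG stream starts with the SOI marker 0xFFD8.
    (read(io, UInt8) == 0xff && read(io, UInt8) == 0xd8) || error("not a JPEG stream")
    while !eof(io)
        read(io, UInt8) == 0xff || error("expected a segment marker")
        marker = read(io, UInt8)
        while marker == 0xff              # skip optional fill bytes
            marker = read(io, UInt8)
        end
        # SOS (0xDA) starts the compressed image data; EOI (0xD9) ends the file.
        (marker == 0xda || marker == 0xd9) && return nothing
        # The 2-byte big-endian length counts itself but not the marker.
        len = Int(ntoh(read(io, UInt16)))
        len >= 2 || error("corrupt segment length")
        payload = read(io, len - 2)
        if marker == 0xe1 && length(payload) >= 6 && payload[1:6] == b"Exif\0\0"
            return payload[7:end]
        end
    end
    return nothing
end

# Convenience method: open the file, scan it, and close it again.
find_exif_segment(path::AbstractString) = open(find_exif_segment, path)
```

Because only the headers before the image data are read, the pixel data is never decoded.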

kodintent commented 2 years ago

Hi. I encountered this issue too. With a set of images larger than my RAM, either of the magickinfo calls below, in a for loop, led to my Julia script being killed for running out of RAM.

for path_image in array_path_images
    array_keys = magickinfo(path_image)
    #OR
    dict_key_value = magickinfo(path_image, (key1, key2))
end

The behavior is as if each loaded image is kept in RAM and never purged. When passing in keys, I noticed one or two memory releases before the crash, but never when just getting the keys array. Using @threads with the for loop makes no difference to the end result. At one stage I tried the ImageMagick installed via apt, called from a bash script, and there were no memory issues. Luckily I just needed height and width, so I was able to use JuliaImages instead. But if I wanted other EXIF keys, I would have to use the apt ImageMagick.

yakir12 commented 2 years ago

I encountered the exact same issue as @kodintent. I "solved" it by calling

Base.GC.gc()

after every call to magickinfo...
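
A sketch of how that workaround might look inside a loop. `collect_info` and `gc_every` are hypothetical names, and `read_info` stands in for whatever metadata call is used (for example `path -> magickinfo(path)`). The explanation usually given is that Julia's GC does not see memory allocated by the C library, so it feels no pressure to finalize the wrapper objects; the thread does not confirm that mechanism here.

```julia
# Apply `read_info` to every path and force a garbage collection every
# `gc_every` calls, so unreachable wrapper objects are finalized promptly.
function collect_info(read_info, paths; gc_every = 1)
    results = Vector{Any}(undef, length(paths))
    for (i, path) in enumerate(paths)
        results[i] = read_info(path)
        # A full collection is slow, so raise `gc_every` if throughput matters.
        i % gc_every == 0 && GC.gc()
    end
    return results
end

# Hypothetical usage with ImageMagick.jl loaded:
#   infos = collect_info(p -> magickinfo(p, ("date:modify",)), array_path_images)
```

Collecting after every call trades speed for a bounded memory footprint; collecting every few calls may be a reasonable middle ground.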

johnnychen94 commented 2 years ago

JpegTurbo (https://github.com/JuliaIO/JpegTurbo.jl) is now available and bundled with ImageIO.

@ashwani-rathee is working on an EXIF wrapper for GSoC'22: https://github.com/JuliaImages/Images.jl/discussions/1000

yakir12 commented 2 years ago

That is super promising. Thank you. I'll try to see how I can get the creation date and time from the EXIF data.

yakir12 commented 2 years ago

Yeah, using @ashwani-rathee's code example in that link works flawlessly:

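using Dates, libexif_jll
include("LibExif.jl")
function readtag(filepath, tag)
  ed_ptr = LibExif.exif_data_new_from_file(filepath)
  ed_ptr == C_NULL && error("could not read EXIF data from $filepath")
  ed = unsafe_load(ed_ptr)
  content_ptr = ed.ifd[1]  # first IFD (IFD 0)
  entry_ptr = LibExif.exif_content_get_entry(content_ptr, tag)
  str = zeros(Cuchar, 1024)  # zero-filled, so everything after the value is NUL
  LibExif.exif_entry_get_value(entry_ptr, str, length(str))
  # Release the ExifData; assumes the wrapper exposes libexif's exif_data_unref
  LibExif.exif_data_unref(ed_ptr)
  return rstrip(String(str), '\0')
end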

file2dt(file) = DateTime(readtag(file, LibExif.EXIF_TAG_DATE_TIME), "yyyy:mm:dd HH:MM:SS")
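
For the bulk-renaming goal this issue started with, the date string can be turned into a file name using only the Dates standard library. This is a sketch: `dated_name` is a hypothetical helper, and the input is assumed to use the same "yyyy:mm:dd HH:MM:SS" layout that the format string above parses.

```julia
using Dates

# Turn an EXIF-style timestamp such as "2021:07:14 15:36:00" into a file name.
function dated_name(exif_datetime::AbstractString, ext::AbstractString = ".jpg")
    dt = DateTime(exif_datetime, dateformat"yyyy:mm:dd HH:MM:SS")
    return Dates.format(dt, "yyyy-mm-dd_HH-MM-SS") * ext
end

println(dated_name("2021:07:14 15:36:00"))   # prints 2021-07-14_15-36-00.jpg
```

Combined with the function above, something like `mv(file, joinpath(dirname(file), dated_name(readtag(file, LibExif.EXIF_TAG_DATE_TIME))))` would rename a file in place, assuming no two images share a timestamp.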

Thank you both (and all the rest working on this)!