CollectionBuilder / collectionbuilder-csv

CollectionBuilder-CSV is a "stand alone" template for creating digital collection and exhibit websites using Jekyll and a metadata CSV.
MIT License
21 stars 16 forks source link

Refactor image processing code using MiniMagick and ImageOptim #47

Closed maehr closed 1 year ago

maehr commented 1 year ago

This Pull Request refactors the image processing code in the CollectionBuilder-CSV helper tasks using MiniMagick and ImageOptim. The new code is more concise and easier to read, with unnecessary conditionals removed and the code structured for better readability. MiniMagick is used to handle the image resizing and conversion, while ImageOptim is used to optimize the images. These changes should result in improved performance and reduced image file sizes, while also making the code more maintainable and easier to modify in the future.

I tested this code with Cloudflare Pages using rake generate_derivatives[,,,false,true] && rake deploy. ImageMagick's security policy does not allow PDF conversion, for the images it works. You can try the demo here: https://collectionbuilder-csv.pages.dev/

evanwill commented 1 year ago

@maehr thank you for this PR (and the others!). We appreciate it. We probably won't have a chance to review them this week--but will get to it before too long. I will let you know if we have questions soon. Thank you again!

evanwill commented 1 year ago

@derekenos do you have a chance to look at this PR since it relates to generate_derivatives Rake tasks i think you might have originally written? I also think you were working on some extensions of it in CB-ES drafts.

Using the MiniMagick wrapper simplifies dealing with different ImageMagick versions and writing the magick commands in ruby (rather than as system calls). ImageOptim adds image optimization.

evanwill commented 1 year ago

@maehr first questions after testing the rake task:

running the task the output from PDF items is much lower quality. It is in the image creation step, not optimization step.

The original command was equivalent of: magick -density 300 example.pdf[0] -resize 800x800 -flatten example_sm.jpg

The density argument needs to happen before it opens the pdf, because it relates to how detailed it renders the pdf before it creates the image. It seems like that isn't happening correctly in these MiniMagick commands. I think it is creating an image using some defaults, then secondly applying all those other commands one at a time afterwards. It seems like for PDF items at least, it needs to combine options (rather than running the operations separately), and it also needs to correctly open the first page of the pdf at the set density (I am not seeing how to do that using MiniMagick without writing the commands using the metal options).

maehr commented 1 year ago

Thank you for reviewing the PR. I will try to fix the image processing pipeline using combined options as described in https://rubydoc.info/github/minimagick/minimagick#combine-options I'll try to avoid metal calls to keep interoperability.

maehr commented 1 year ago

Hi @evanwill, I refactored some stuff. Is this OK like this?

PS: We could further optimize the script using batch processing:

image_optim.optimize_images(Dir['*.png']) do |unoptimized, optimized|
  if optimized
    puts "#{unoptimized} => #{optimized}"
  end
end

image_optim.optimize_images!(Dir['*.*'])

image_optim.optimize_images_data(datas)

https://github.com/toy/image_optim#from-ruby

evanwill commented 1 year ago

It still isn't producing the correct output for PDF files--it's the same issue with when -density option is applied in the magick commands. I used MiniMagick.logger.level = Logger::DEBUG to check what magick commands the task is creating, and it is doing the equivalent of:

magick identify example.pdf
magick convert example.pdf[0] temp.jpg
magick mogrify -density 300 -resize 800x800 -flatten temp.jpg

So it is still doing that initial convert separately without the density which ensures that the pdf is initially rendered at high enough res to create a good image. Playing around and looking at the minimagick docs, I can't figure out how to make it use the equivalent of the correct command for PDFs (i.e. magick -density 300 example.pdf[0] -resize 800x800 -flatten example_sm.jpg). The image.format('jpg') always triggers a magick convert itself that doesn't seem to allow any options, and can't be used in combine_options. The only way I could get it to add the density correctly when opening a pdf is using the Metal version, like this:

inputfile = filename + "[0]"
magick = MiniMagick::Tool::Convert.new 
magick.density(density) if file_type == :pdf
magick << inputfile
magick.resize(size)
magick.flatten
magick << output_filename
magick.call

Not sure what the disadvantage is to using this approach or why the main minimagick api seems to have this limitation?

maehr commented 1 year ago

Thank you very much for the in-depth bug hunt. I added your code and removed the combined commands.

evanwill commented 1 year ago

My other concern is that image_optim_pack installs binaries for linux and mac, but nothing for windows. So unless windows users install some of the optimization tools manually, they end up with a bunch of error messages. The imagemagick part of the task still runs correctly--but for novice users a million error messages about missing binaries is going to be confusing!

Looks like this for each image processed:

No pack for this OS and/or ARCH, check verbose output
pngcrush worker: `pngcrush` not found; please provide proper binary or disable this worker (--no-pngcrush argument or `:pngcrush => false` through
pngout worker: `pngout` not found; please provide proper binary or disable this worker (--no-pngout argument or `:pngout => false` through options
advpng worker: `advpng` not found; please provide proper binary or disable this worker (--no-advpng argument or `:advpng => false` through options
optipng worker: `optipng` not found; please provide proper binary or disable this worker (--no-optipng argument or `:optipng => false` through opt
pngquant worker: `pngquant` not found; please provide proper binary or disable this worker (--no-pngquant argument or `:pngquant => false` through
oxipng worker: `oxipng` not found; please provide proper binary or disable this worker (--no-oxipng argument or `:oxipng => false` through options
jhead worker: `jhead` not found, `jpegtran` not found; please provide proper binary or disable this worker (--no-jhead argument or `:jhead => fals
jpegoptim worker: `jpegoptim` not found; please provide proper binary or disable this worker (--no-jpegoptim argument or `:jpegoptim => false` thr
jpegtran worker: `jpegtran` not found; please provide proper binary or disable this worker (--no-jpegtran argument or `:jpegtran => false` through
gifsicle worker: `gifsicle` not found; please provide proper binary or disable this worker (--no-gifsicle argument or `:gifsicle => false` through

To avoid this should optimization be opt in, or is there a way to detect windows environment before calling optimization, or a way to suppress the error message on windows?

evanwill commented 1 year ago

I was just testing it to run via GitHub Action--I think the Gemfile need 'rake' added as well so that you can run the task using bundle exec. gem 'rake'

With that in the Gemfile, then it ran fine just by adding a bundle exec rake generate_derivatives step to the jekyll workflow--with the exception of the security policy error when trying to process PDFs. Not sure if there is a work around for that.

maehr commented 1 year ago

@evanwill I have implemented the changes you requested and a workaround for Windows users. Unfortunately, I don't have a way to test it at the moment. Can you do that?

evanwill commented 1 year ago

I tested on:

evanwill commented 1 year ago

I should add the exception on linux there is the pdf security configuration issue, which on desktop I use this work around, but not sure what to do in GitHub Actions or other remote environments.

maehr commented 1 year ago

I should add the exception on linux there is the pdf security configuration issue, which on desktop I use this work around, but not sure what to do in GitHub Actions or other remote environments.

I guess PDF is too much of a security issue for GH, Cloudflare et al. (See https://portswigger.net/daily-swig/imagemagick-pdf-parsing-flaw-allowed-attacker-to-execute-shell-commands-via-maliciously-crafted-image )

evanwill commented 1 year ago

@maehr seems ready to go to me. Anything else?

maehr commented 1 year ago

@maehr seems ready to go to me. Anything else?

No, seems ready to me as well. Thank you @evanwill very much. I am looking forward to working on CB with you and your team.

evanwill commented 1 year ago

Merging, thank you!

maehr commented 1 year ago

Ps: Sorry for bloating the git history. Next time I'll provide a summary to squash merge my changes 😅😇