UPGo-McGill / matchr

Fast and Reliable Matching of Images
Other
2 stars 0 forks source link

matchr

R build
status codecov

The goal of matchr is to facilitate fast and reliable comparison of large sets of images to identify identical or nearly-identical pairs. It works by generating distinctive image signatures from pixel data then correlating these signatures between sets of images. matchr’s approach allows for image comparisons to be made on large sets of images (tens or hundreds of thousands of files) on even modest computer hardware.

Images are down-sampled to 8x8 greyscale bitmaps and passed through a discrete cosine transformation, from which 128-bit signatures are calculated. These signatures can identify a given image even if the image is rescaled or compressed, and thus serve as a reliable indicator of whether two images are the same.

Using matrix algebra, matchr can compare multiple sets of images and identify likely matches. The method is robust to changes in tint, compression or aspect ratio, and to other minor differences such as small overlays added to one of a pair of matching images.

The package includes a Shiny app (compare_images and confirm_matches) for visually inspecting match results and confirming or rejecting the matches.

All time-consuming matchr functions can be parallelized through the future package, can report progress through the progressr package, and can save incremental backups in case of an error.

Installation

You can install the released version of matchr from CRAN with:

install.packages("matchr")

And the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("UPGo-McGill/matchr")

Usage

The standard matchr workflow involves importing one or more sets of images with load_image, generating image signatures from the image sets (or the underlying file paths) using create_signature, matching image signatures using match_signatures, then refining and verifying the matches using identify_matches, confirm_matches, and integrate_changes.

Because each of the matchr functions can be very time- or computation-intensive, it is usually most convenient to run the functions separately. But in the case of relatively small comparison tasks, match_images provides an “all-in-one” function which performs each task sequentially and delivers the final results.

When the {shiny} package is installed, confirm_matches loads an interactive Shiny app for verifying the results of the comparison algorithm and for flagging individual matches for follow up.

Learn more

Use vignette(package = "matchr") to learn more about how matchr works!