abrichr opened 7 months ago
For the task of finding similar UI images, here are comparisons of the three libraries:
FAISS (Facebook AI Similarity Search):
Purpose: FAISS is designed for efficient similarity search and clustering of dense vectors. It excels at searching for nearest neighbors in large datasets.
Approach: It uses vector quantization and inverted file indexing to achieve fast and memory-efficient searches.
Use Cases: Best for datasets where you can represent items (e.g., images) as vectors in a high-dimensional space. Commonly used in conjunction with deep learning models where images are represented by feature vectors.
Scalability: Highly scalable and can handle very large datasets, with GPU acceleration for even faster processing.
Integration: Requires integrating with deep learning libraries to first convert images into feature vectors before they can be indexed and searched.
Complexity: More complex to set up and use compared to ImageHash. Requires knowledge of vector space and possibly machine learning concepts.
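A minimal sketch of the FAISS workflow, assuming image embeddings have already been extracted with some model; the dimensionality and random arrays below are placeholders for real feature vectors:

```python
# Index precomputed image feature vectors and query for nearest neighbors.
# Assumes embeddings were already extracted (e.g., by a CNN); the random
# arrays are stand-ins for real data.
import faiss
import numpy as np

d = 512                                                   # embedding dimensionality
db_vectors = np.random.rand(10_000, d).astype("float32")  # placeholder corpus
query = np.random.rand(1, d).astype("float32")            # placeholder query image

index = faiss.IndexFlatL2(d)  # exact L2 search; use an IVF index for large corpora
index.add(db_vectors)
distances, indices = index.search(query, 5)  # 5 nearest neighbors
print(indices[0], distances[0])
```

IndexFlatL2 scans exhaustively; the quantized and inverted-file indexes mentioned above trade a little recall for much faster search on large corpora.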
Image-Similarity-Measures:
Purpose: This library provides a set of measures to calculate the similarity between two images using classical image processing techniques.
Approach: Includes a variety of similarity measures such as Structural Similarity Index (SSIM), Mean Squared Error (MSE), and others.
Use Cases: Suitable for comparing two images directly with one another without the need for a database; intended for scenarios where the comparison is pairwise rather than against a large corpus of images.
Scalability: Does not inherently include indexing or database management features, so it's not aimed at scalability for large image databases.
Integration: Can be used standalone for direct image comparisons or integrated into a database system where each query requires a full scan of the dataset.
Complexity: Relatively easy to use for calculating direct similarity measures between images but lacks the infrastructure for quick retrieval from large datasets.
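For illustration, here is a pairwise SSIM comparison using scikit-image's implementation (the image-similarity-measures package exposes SSIM and several other metrics in a similar style); the file names are placeholders:

```python
# Compare two same-sized screenshots with SSIM; a score of 1.0 means identical.
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

img_a = np.array(Image.open("window_a.png").convert("L"))  # grayscale
img_b = np.array(Image.open("window_b.png").convert("L"))

score = structural_similarity(img_a, img_b)
print(f"SSIM: {score:.3f}")
```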
ImageHash:
Purpose: Designed specifically for creating hash representations of images that can be used to determine if two images are visually similar.
Approach: Uses algorithms like average, perceptual, difference, and wavelet hashing to create hashes that are robust to minor variations in images.
Use Cases: Ideal for applications where the objective is to detect duplicate or near-duplicate images, such as deduplicating a photo collection or finding similar items in a catalog.
Scalability: Can be used with databases to store and index hashes for moderate-sized datasets. Scalability is limited by the database's ability to handle the hash comparison operations.
Integration: Easy to integrate into systems already using Python and can be paired with any standard database.
Complexity: Relatively straightforward to implement, with a focus on hash-based similarity that is less computationally intensive than feature extraction methods.
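A minimal sketch with the imagehash package (file names are placeholders): near-duplicate images produce hashes whose Hamming distance is small.

```python
# Perceptual hashes: visually similar images yield nearby hashes.
import imagehash
from PIL import Image

hash_a = imagehash.phash(Image.open("window_a.png"))
hash_b = imagehash.phash(Image.open("window_b.png"))

distance = hash_a - hash_b  # Hamming distance in bits; 0 means likely duplicates
print(distance)
```

Because each hash is a short fixed-size value, hashes can be stored in an ordinary database column and compared cheaply at query time.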
In summary:
For your specific use case of finding similar UI images: if you're dealing with a large database of images and need high search performance, FAISS is a strong candidate. If the dataset is smaller and the task is more about detecting near-duplicates based on structural similarity, ImageHash is the more appropriate choice. Image-Similarity-Measures could serve as a supplementary tool for additional verification but is less suited to database-scale retrieval.
Edit: Structural Similarity Index (SSIM) implemented in https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/strategies/visual.py#L409
Feature request
https://github.com/OpenAdaptAI/OpenAdapt/pull/610 introduced the VisualReplayStrategy, which works by segmenting the active window for every mouse event. This is wasteful because some or all of the active window may not change between mouse events.

We would like to implement the following optimization (see the sketch after this list):

1. Store the segmentation retrieved in https://github.com/OpenAdaptAI/OpenAdapt/pull/610/files#diff-4123d48b6e604812e5bbba6507183956b05038539947eedfd02a7e475344cbc5R313 (i.e. the Segmentation object) in the database. Implemented in https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/models.py#L178.
2. During replay, in the VisualReplayStrategy, find the active window screenshot that is most similar to the current active window, e.g. using https://github.com/JohannesBuchner/imagehash. (Retrieve all Screenshots for the recording, and extract the active window with https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/models.py#L315.) Implemented in https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/strategies/visual.py#L409.
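A hypothetical sketch of step 2, assuming the recording's screenshots and their stored segmentations are already loaded; crop_active_window, screenshot.image, screenshot.segmentation, and MAX_DISTANCE are illustrative names, not OpenAdapt's actual API:

```python
# Re-use a prior Segmentation when the current active window is a
# near-duplicate of one we have already segmented. All names below
# (crop_active_window, screenshot.image, screenshot.segmentation,
# MAX_DISTANCE) are illustrative assumptions.
import imagehash
from PIL import Image

MAX_DISTANCE = 4  # assumed threshold, in differing hash bits

def crop_active_window(screenshot) -> Image.Image:
    # Placeholder: OpenAdapt extracts the active window via models.py#L315;
    # here we simply return the full screenshot image.
    return screenshot.image

def find_reusable_segmentation(current_window: Image.Image, screenshots):
    """Return the stored Segmentation whose screenshot is most similar to
    current_window, or None if nothing is close enough."""
    current_hash = imagehash.phash(current_window)
    best, best_distance = None, MAX_DISTANCE + 1
    for screenshot in screenshots:
        distance = current_hash - imagehash.phash(crop_active_window(screenshot))
        if distance < best_distance:
            best, best_distance = screenshot, distance
    if best is not None:
        return best.segmentation  # re-use: skip segmenting and describing again
    return None                   # fall back to segmenting from scratch
```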
Note: in the case of the calculator example, the only difference between active windows will be the text containing the number at the top of the window. That region is removed in vision.refine_masks, which means there will be nothing new to describe, and we can re-use the previous Segmentation and its descriptions. Therefore, we will know this is working when, during the calculator example, we only need to get descriptions once, for the first action.

Motivation
VisualReplayStrategy is very slow.