99designs / colorific

Automatic color palette detection
ISC License

Could someone explain min_distance, min_prominence, and n_quantized? #26

Closed deronsizemore closed 9 years ago

deronsizemore commented 9 years ago

I don't see any documentation explaining these things, but I see them as options when I look at the extract_colors function in the code. I can see that the extracted colors change when I set these options, but I have no idea what I'm doing with them or why the colors are changing.

Thanks.

larsyencken commented 9 years ago

These have to do with the specifics of how colorific determines the palette. Here are the steps involved and how the knobs help tune things.

Step 1: mass color reduction

Colorific first reduces the image from its full spectrum of colors to just n_quantized (100 by default), as a kind of optimization step. To be honest, image processing is normally done in C, not Python, so this step is about making the later, more interesting stuff fast enough.

Increase n_quantized to get a slower but more accurate result.

Decrease n_quantized to get a faster, less accurate palette. The result becomes less accurate because minor colors get averaged and blended into the major colors you want to keep.
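To make the blending effect concrete, here is a minimal sketch. It is not colorific's code: `quantize` is a hypothetical toy "popularity" quantizer that keeps the n most common colors and folds every other pixel into the nearest kept one, which is much cruder than what an imaging library does. It only illustrates why a smaller n_quantized makes minor colors disappear into major ones.

```python
from collections import Counter


def quantize(pixels, n_quantized):
    """Toy quantizer (assumption, not colorific's method): keep the
    n_quantized most common RGB colors and fold every other color into
    the nearest kept color, measured by squared RGB distance."""
    counts = Counter(pixels)
    kept = [color for color, _ in counts.most_common(n_quantized)]
    result = Counter()
    for color, count in counts.items():
        nearest = min(
            kept,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(k, color)),
        )
        result[nearest] += count
    return result


# 60 pure red pixels, 30 pure blue, 10 slightly-off red.
pixels = [(255, 0, 0)] * 60 + [(0, 0, 255)] * 30 + [(250, 10, 10)] * 10

print(quantize(pixels, 3))  # all three colors survive
print(quantize(pixels, 2))  # the off-red pixels are absorbed into red
```

With three slots every color keeps its own count; with two, the ten off-red pixels are counted as pure red, so red goes from 60 to 70 pixels.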

Step 2: collapse similar colors together

After step one we may have 100 colors left. Many of those "different" colors are actually the same to the human eye. Color distances are measured in units of ∆E, and according to Wikipedia's "Color difference" article, "the same" means ∆E < 2.3. We go beyond that and say by default that ∆E <= min_distance (default: 10) is more or less the same color, and merge those colors together.

So min_distance tunes this sameness test. Make it smaller to allow colors that are more alike to appear side by side in the palette. Make it larger to avoid noisy palettes with multiple copies of nearly the same color.
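As a rough sketch of the merging idea, assuming colors are already expressed as CIE Lab triples and using the simple CIE76 ∆E (plain Euclidean distance in Lab): `merge_similar` is a hypothetical helper, and both the greedy strategy and the choice of ∆E formula are assumptions for illustration, not necessarily what colorific does.

```python
import math


def merge_similar(colors, min_distance=10.0):
    """colors: list of (lab_triple, pixel_count) pairs.

    Greedy sketch: visit colors from most to least common and fold each
    one into the first already-kept color within min_distance (CIE76
    delta E, i.e. Euclidean distance in Lab). Otherwise keep it."""
    kept = []
    for lab, count in sorted(colors, key=lambda c: c[1], reverse=True):
        for i, (kept_lab, kept_count) in enumerate(kept):
            if math.dist(lab, kept_lab) <= min_distance:
                kept[i] = (kept_lab, kept_count + count)
                break
        else:
            # No kept color is close enough, so this one stands alone.
            kept.append((lab, count))
    return kept


colors = [
    ((50.0, 60.0, 40.0), 50),   # a red
    ((53.0, 62.0, 41.0), 20),   # nearly the same red, delta E about 3.7
    ((30.0, 20.0, -50.0), 30),  # a blue, far from both reds
]

print(merge_similar(colors, min_distance=10))  # the two reds collapse
print(merge_similar(colors, min_distance=2))   # all three stay separate
```

With min_distance=10 the two reds are merged into one entry with 70 pixels; with min_distance=2 they are treated as distinct colors.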

Step 3: how many colors to keep?

max_colors is clear, and defaults to 5. So, keep up to 5 colors. But we could return fewer. A pure red image should only return one color. How many colors are worth keeping?

We keep colors whose frequency is at least min_prominence times the frequency of the most prominent color. So if the most prominent color is 50% of the image, and min_prominence is 0.01, we'll keep the first 5 colors that are at least 50% * 0.01 = 0.5% of the image.

Decrease min_prominence to fill out your palette more often with more minor colors, which can include more noise.

Increase min_prominence to get fewer, more prominent colors in each palette.
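The cutoff described above can be sketched in a few lines. `keep_prominent` is a hypothetical helper written for this explanation, with color names standing in for real color values; it follows the rule as stated here and is not copied from colorific.

```python
def keep_prominent(colors, max_colors=5, min_prominence=0.01):
    """colors: list of (name, share_of_image) pairs.

    Keep at most max_colors colors whose share is at least
    min_prominence times the share of the most prominent color."""
    ranked = sorted(colors, key=lambda c: c[1], reverse=True)
    if not ranked:
        return []
    cutoff = ranked[0][1] * min_prominence
    return [c for c in ranked if c[1] >= cutoff][:max_colors]


colors = [
    ("navy", 0.50),
    ("white", 0.30),
    ("orange", 0.15),
    ("grey", 0.046),
    ("pink", 0.004),
]

# Cutoff is 0.50 * 0.01 = 0.005, so only pink (0.4%) is dropped.
print(keep_prominent(colors, min_prominence=0.01))

# Cutoff is 0.50 * 0.5 = 0.25, so only navy and white remain.
print(keep_prominent(colors, min_prominence=0.5))
```

Note that the cutoff is relative to the top color, not to the whole image, so the same min_prominence is stricter on images dominated by one color.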

Hope all that helps!

deronsizemore commented 9 years ago

Thank you. That makes a lot more sense now! So, is there any threshold for how high you should make these numbers? For example, in my testing for the last hour or so, I've found that this gives me about as accurate results as I can get (seemingly):

extract_colors(image, max_colors=5, min_distance=15, min_prominence=.05, n_quantized=200)

I did find that if I went above 200 for the n_quantized value, some images would error out on me. Keeping it at 200 or below seems to work 100% of the time.

So, are there limits to how high the numbers should go? I assume there probably aren't, except that setting higher numbers would typically just return fewer colors, which kind of defeats the purpose of extracting colors from an image.

larsyencken commented 9 years ago

The algorithm for quantizing colors might use an 8-bit integer in its calculations. It's total speculation, but I'm guessing the algorithm is there for GIF image support, and GIFs use 8 bits per pixel. Then the max for n_quantized would end up being 256.

min_prominence would max out at 1.0, which should result in only the most prominent color being returned each time.

min_distance would max out at whatever the largest ∆E is between the two farthest colors. Even before then, it would start collapsing increasingly different colors together as it gets bigger, only showing one of them. For example, it might show only a light blue or only a dark blue, when in fact both colors were important and complementary parts of the palette.

The kind of manual tuning you're doing is basically what we did to come up with our thresholds. There's no special magic behind them; they're just what seemed to work best on our library of graphic designs.

deronsizemore commented 9 years ago

Great. Thanks so much for taking the time to reply and help out!