KilianB / JImageHash

Perceptual image hashing library used to match similar images
MIT License

Hash value returned as zero #41

Closed. SandhyaU95 closed this issue 4 years ago.

SandhyaU95 commented 4 years ago

Hi Kilian, when I generate a hash for the attached file with the parameters below, the resulting hash value is 0. A few days ago, the same file gave the hash value 20282409603651670423947251810304. The same file saved in other formats (PNG, BMP, GIF, TIFF) also results in a hash value of 0. Please let me know if you need additional information.

        // bufferedImage holds the attached file, loaded as a BufferedImage
        HashingAlgorithm hasher = new DifferenceHash(256, Precision.Simple);
        Hash hash = hasher.hash(bufferedImage);
        BigInteger hashValue = hash.getHashValue();
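As a side note, the earlier value is itself almost empty. A quick check with plain java.math.BigInteger (no JImageHash calls; the class name EarlierHashBits is only illustrative) shows that 20282409603651670423947251810304 equals 2^104 + 2^19, so only two bits are set. This suggests, though it does not prove, that the earlier hash already captured very little of the image.

```java
import java.math.BigInteger;

public class EarlierHashBits {

    // The non-zero hash value reported for the same file a few days earlier
    static final BigInteger EARLIER = new BigInteger("20282409603651670423947251810304");

    public static void main(String[] args) {
        // Number of bits set to 1 in the hash value
        System.out.println("bits set:   " + EARLIER.bitCount());
        // Index of the highest set bit + 1
        System.out.println("bit length: " + EARLIER.bitLength());
        // List every set bit, lowest index first
        for (int i = 0; i < EARLIER.bitLength(); i++) {
            if (EARLIER.testBit(i)) {
                System.out.println("bit " + i + " is set");
            }
        }
    }
}
```

Running it reports two set bits (indices 19 and 104) and a bit length of 105.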


SandhyaU95 commented 4 years ago

Could you please look into why we are getting a hash value of zero? Please let me know if you need any more information.

KilianB commented 4 years ago

If this is still the case (a hash value of 0), please upload the original file to a file hoster and send me the link (e.g. to kilian[at]brachtendorf.dev). We cannot be sure that GitHub doesn't alter the image when it is uploaded here. The library currently does not support transparency (transparent pixels will be interpreted as either black or white).
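Because transparency is not supported, one option is to flatten transparent images onto an explicit background before hashing, so you decide whether transparent pixels become black or white. A minimal sketch with plain Java2D, assuming a white background is what you want (flattenOnto is a hypothetical helper, not part of JImageHash):

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class FlattenTransparency {

    // Draws src onto an opaque TYPE_INT_RGB canvas pre-filled with the background colour,
    // so fully transparent pixels end up as the background colour.
    static BufferedImage flattenOnto(BufferedImage src, Color background) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        g.setColor(background);
        g.fillRect(0, 0, src.getWidth(), src.getHeight());
        g.drawImage(src, 0, 0, null); // default SrcOver compositing blends by alpha
        g.dispose();
        return out;
    }

    public static void main(String[] args) {
        // 2x1 ARGB image: left pixel fully transparent, right pixel opaque black
        BufferedImage argb = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        argb.setRGB(0, 0, 0x00000000);
        argb.setRGB(1, 0, 0xFF000000);

        BufferedImage flat = flattenOnto(argb, Color.WHITE);
        System.out.printf("left=%06X right=%06X%n",
                flat.getRGB(0, 0) & 0xFFFFFF, flat.getRGB(1, 0) & 0xFFFFFF);
    }
}
```

The flattened image can then be passed to the hasher like any other opaque image.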

You will not get different hash values from the same image. The computation is deterministic and by definition returns the same hash value for the same input; something like the image size, encoding, or parameters must have changed.
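To see both points in miniature, here is a deliberately simplified difference hash over a grayscale pixel grid. It is not JImageHash's implementation (which also rescales the image first), and dHash is a hypothetical helper: each bit records whether a pixel is darker than its right-hand neighbour. The same input always produces the same bits, and a uniform image such as a blank page produces 0.

```java
import java.math.BigInteger;

public class SimpleDifferenceHash {

    // Simplified horizontal difference hash on a grayscale grid (values 0-255).
    // One bit per horizontally adjacent pair: 1 if the left pixel is darker than the right one.
    static BigInteger dHash(int[][] gray) {
        BigInteger hash = BigInteger.ZERO;
        for (int[] row : gray) {
            for (int x = 0; x < row.length - 1; x++) {
                hash = hash.shiftLeft(1);
                if (row[x] < row[x + 1]) {
                    hash = hash.setBit(0);
                }
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        int[][] blankPage = {
            {255, 255, 255, 255},
            {255, 255, 255, 255},
        };
        int[][] someStrokes = {
            {255, 0, 255, 255},
            {0, 255, 255, 0},
        };
        // A uniform image has no brightness steps, so every bit is 0
        System.out.println(dHash(blankPage));
        System.out.println(dHash(someStrokes));
        // Deterministic: hashing the same pixels twice gives the same value
        System.out.println(dHash(someStrokes).equals(dHash(someStrokes)));
    }
}
```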

I have not worked with text input yet. Depending on where the image is sliced, you will have a hard time with a single-precision difference hash. Maybe try at least a two-dimensional pass (double precision).
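A rough illustration of why a second pass can help, assuming the single-precision pass compares horizontal neighbours and the extra pass compares vertical ones (the helpers below are hypothetical, and JImageHash may order and combine its bits differently): an image made of horizontal bands, loosely like lines of text seen at low resolution, has no left/right brightness steps, so the horizontal-only hash is 0, while comparing each pixel with the one below it does register the bands.

```java
import java.math.BigInteger;

public class DifferenceHashPasses {

    // Horizontal pass: 1 if a pixel is darker than its right-hand neighbour
    static BigInteger horizontal(int[][] g) {
        BigInteger h = BigInteger.ZERO;
        for (int y = 0; y < g.length; y++) {
            for (int x = 0; x < g[y].length - 1; x++) {
                h = h.shiftLeft(1);
                if (g[y][x] < g[y][x + 1]) h = h.setBit(0);
            }
        }
        return h;
    }

    // Vertical pass: 1 if a pixel is darker than the pixel directly below it
    static BigInteger vertical(int[][] g) {
        BigInteger h = BigInteger.ZERO;
        for (int y = 0; y < g.length - 1; y++) {
            for (int x = 0; x < g[y].length; x++) {
                h = h.shiftLeft(1);
                if (g[y][x] < g[y + 1][x]) h = h.setBit(0);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // Horizontal bands: a dark "text line" between two white margins
        int[][] bands = {
            {255, 255, 255},
            {40, 40, 40},
            {255, 255, 255},
        };
        System.out.println("horizontal only: " + horizontal(bands));
        System.out.println("vertical pass:   " + vertical(bands));
    }
}
```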

KilianB commented 4 years ago

@SandhyaU95 I'm continuing here to keep the conversation, and any tracking of the issue, public so everyone can benefit. You have a PDF and want to hash its content. Can you please provide the code snippet you used to convert the PDF to an image?

SandhyaU95 commented 4 years ago

OK, I have sent you the link, please check. I also tried double precision, but got the same hash value we get for an empty file. I used ImageMagick to convert the PDF to JPEG. Thanks.

KilianB commented 4 years ago

When asking questions, please refer to: https://stackoverflow.com/help/minimal-reproducible-example. To reproduce and test your issue, I need the same input data you are using. I don't know which version of ImageMagick you are using, which parameters, or how you load the image into your application.

It is important to have exactly the same image data, since this is what the hash is computed from. Please send me the actual PNG or JPEG file.
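One way to confirm that both sides really hash the same image data is to compare a fingerprint of the decoded pixels rather than of the file. The sketch below (pixelFingerprint and describe are hypothetical helpers; only BufferedImage, ByteBuffer and java.security.MessageDigest are used) reports the image type constant, the size, and a SHA-256 over the ARGB pixel values. If these differ between two machines, the hash inputs differ too.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PixelFingerprint {

    // SHA-256 over width, height and every pixel's ARGB value, as a lowercase hex string
    static String pixelFingerprint(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        ByteBuffer buf = ByteBuffer.allocate((2 + w * h) * Integer.BYTES);
        buf.putInt(w).putInt(h);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                buf.putInt(img.getRGB(x, y)); // pixel in the default sRGB ARGB model
            }
        }
        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e); // required on every JRE
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest(buf.array())) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // One-line summary worth sharing in a bug report
    static String describe(BufferedImage img) {
        return "type=" + img.getType() + " size=" + img.getWidth() + "x" + img.getHeight()
                + " sha256=" + pixelFingerprint(img);
    }

    public static void main(String[] args) {
        // In practice: BufferedImage img = javax.imageio.ImageIO.read(new java.io.File("your-image.png"));
        BufferedImage img = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        System.out.println(describe(img));
        img.setRGB(0, 0, 0xFFFFFF); // a single changed pixel changes the fingerprint
        System.out.println(describe(img));
    }
}
```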

KilianB commented 4 years ago

I received the original image and, as expected, the following holds:

"I have not worked with text input yet. Depending on where the image is sliced, you will have a hard time with a single-precision difference hash. Maybe try at least a two-dimensional pass (double precision)."

A 512-bit difference hash simply does not have enough resolution to capture the individual features of the text.
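A toy model of the resolution problem, under the assumption of nearest-neighbour downscaling (JImageHash's actual rescaling may differ, and all names here are hypothetical): a page with thin 1-pixel strokes is sampled onto a small grid. If no sample lands on a stroke, the downscaled image is uniform white and the difference hash is 0; sampling the same page on a finer grid hits the strokes and the hash becomes non-zero.

```java
import java.math.BigInteger;

public class ResolutionToy {

    static final int W = 100, H = 20;

    // Synthetic "page": white background, 1-pixel-wide dark strokes in a band of rows
    static int[][] page() {
        int[][] p = new int[H][W];
        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++) {
                boolean stroke = x % 10 == 5 && y >= 5 && y < 15;
                p[y][x] = stroke ? 0 : 255;
            }
        }
        return p;
    }

    // Nearest-neighbour downscale: picks exactly one source pixel per target cell
    static int[][] downscale(int[][] src, int gw, int gh) {
        int[][] out = new int[gh][gw];
        for (int j = 0; j < gh; j++) {
            for (int i = 0; i < gw; i++) {
                out[j][i] = src[j * src.length / gh][i * src[0].length / gw];
            }
        }
        return out;
    }

    // Simplified horizontal difference hash: 1 if a cell is darker than its right neighbour
    static BigInteger dHash(int[][] g) {
        BigInteger h = BigInteger.ZERO;
        for (int[] row : g) {
            for (int x = 0; x < row.length - 1; x++) {
                h = h.shiftLeft(1);
                if (row[x] < row[x + 1]) h = h.setBit(0);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        int[][] p = page();
        // 10x4 grid: samples land on columns 0, 10, 20, ... and miss every stroke
        System.out.println("coarse: " + dHash(downscale(p, 10, 4)));
        // 20x4 grid: samples land on columns 0, 5, 10, 15, ... and hit the strokes
        System.out.println("fine:   " + dHash(downscale(p, 20, 4)));
    }
}
```

The exact cut-off where the hash stops being 0 depends on stroke positions and scaling, which is consistent with "depending on where the image is sliced".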

Hash visualizations (images): 512 bit, 1024 bit, 2048 bit, 20000 bit

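import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
// plus the JImageHash imports for Hash, HashingAlgorithm, DifferenceHash and DifferenceHash.Precision

BufferedImage in = ImageIO.read(new File("2Linepdftojpeg.jpeg"));

// Convert the image to TYPE_INT_RGB, since the decoded input uses an unusual custom image type
BufferedImage newImage = new BufferedImage(in.getWidth(), in.getHeight(), BufferedImage.TYPE_INT_RGB);

Graphics2D g = newImage.createGraphics();
g.drawImage(in, 0, 0, null);
g.dispose();

// Single-precision difference hash with a bit resolution of 2048
HashingAlgorithm hasher = new DifferenceHash(2048, Precision.Simple);

Hash hash = hasher.hash(newImage);
// Render the hash as an image (block size 3) to visualize which features it captures
BufferedImage imgRepresentation = hash.toImage(3, hasher);

ImageIO.write(imgRepresentation, "png", new File("Hash2048.png"));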

The solution is to try different hashing algorithms with different settings and see which fits your needs. You will have to adjust your matching threshold accordingly for such images. You may also be better served by a library specialized in text comparison, with OCR support and text comparison using edit distances.
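For the threshold part, matching typically comes down to a normalized Hamming distance between two hashes of the same length, compared against a cut-off; for the OCR route, the extracted strings could be compared with an edit distance. A minimal standalone sketch of both (the helper names, the 0.2 threshold, and passing the bit length explicitly are illustrative assumptions, not JImageHash API):

```java
import java.math.BigInteger;

public class MatchingSketch {

    // Fraction of differing bits between two hashes of the given bit length (0 = identical)
    static double normalizedHamming(BigInteger a, BigInteger b, int bitLength) {
        return a.xor(b).bitCount() / (double) bitLength;
    }

    // Treat two images as similar if their hashes differ in at most `threshold` of the bits
    static boolean isMatch(BigInteger a, BigInteger b, int bitLength, double threshold) {
        return normalizedHamming(a, b, bitLength) <= threshold;
    }

    // Levenshtein edit distance (insertions, deletions, substitutions) using two DP rows
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        BigInteger h1 = new BigInteger("1011001110", 2);
        BigInteger h2 = new BigInteger("1011011100", 2);
        System.out.println(normalizedHamming(h1, h2, 10)); // 2 of 10 bits differ
        System.out.println(isMatch(h1, h2, 10, 0.2));
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```

For text-heavy scans, the threshold that works for photos will likely need to be re-tuned on a few known matching and non-matching pairs.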

SandhyaU95 commented 4 years ago

I got a non-zero hash value just by increasing the bitResolution to above 552. Thank you.