booru / philomena

Next-generation imageboard software. This software development project is independent from any image hosting project.
GNU Affero General Public License v3.0

Automatic Perceptual Dedupe reports don't get created #92

Open Ricardo50 opened 3 years ago

Ricardo50 commented 3 years ago

Describe the bug: Automatic Perceptual Dedupe reports don't get created when uploading an image similar to one that is already there.

To Reproduce:

  1. Upload an image to the board
  2. Upload a second, similar image once the first one is done processing. It can have a different resolution, a small visual difference, or a different file format.

Expected behavior: I expect to see an automatic dedupe report, but none gets made.

Screenshots: [image] [image]

Additional context: I've tested with many different image pairs and options.

basisbit commented 3 years ago

Thank you for creating this bug report and thus helping improve the imageboard software! Could you please attach the two sample images to the issue so that we can try to reproduce the problem? Also, how did you set up the imageboard: is it running as Docker containers, and in production or dev mode (see your .env file)? And which version of the imageboard are you running (which git commit or clone date)?

Ricardo50 commented 3 years ago

Thank you for the quick response. I haven't been sitting still and have been testing for a while. See the attached sample images below: 1 2

What I found is that if I set duplication_checked to false for one image of the duplicate pair and then perform an image repair on the other, similar image, a duplicate report is created. [image]

If I change line 20 of lib/philomena/duplicate_reports.ex to |> where([i, _it], i.duplication_checked == true) (i.e., inverting the check), it seems to work.
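To make the suspected inversion concrete, here is a minimal Python sketch. It is not philomena's Elixir code: the Image record and the candidate_ids helper are hypothetical, and it assumes the original query only matched images whose duplication_checked flag is false, which has not been confirmed against the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    # Hypothetical stand-in for an image row; only the fields needed here.
    id: int
    duplication_checked: bool


def candidate_ids(images, source_id, checked):
    """Return ids of images other than the source whose flag equals `checked`."""
    return [
        img.id
        for img in images
        if img.id != source_id and img.duplication_checked == checked
    ]


# Image 1 was uploaded earlier and has already been processed;
# image 2 is the fresh upload that is looking for duplicates.
images = [Image(1, True), Image(2, False)]

# Assumed original behaviour: only unchecked images are candidates.
print(candidate_ids(images, source_id=2, checked=False))  # []
# Behaviour after inverting the check to `== true`.
print(candidate_ids(images, source_id=2, checked=True))   # [1]
```

Under that assumption, the already-processed image is never a candidate, which would also explain why manually setting its flag to false makes a report appear.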

What this basic troubleshooting suggests is that the logic is probably inverted. Is this helpful?

basisbit commented 3 years ago

The feature itself seems to be working fine with the booru fork. The job runs and perceptual deduplication reports are being created. For example, the images https://manebooru.art/images/614170 (English) and https://manebooru.art/images/614171 (Spanish) are reported as possible duplicates. Of course, neither your sample images nor my sample images are true duplicates; they have "easily" visible changes that can even be seen in an image scaled down to 128x128 pixels.
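One possible reason why one pair gets flagged and another does not is that a perceptual comparison usually works with a tolerance. The Python sketch below illustrates that idea only and is not philomena's actual algorithm: the quadrant-average comparison, the function names, and the tolerance value are all assumptions made for the example.

```python
def quadrant_means(pixels):
    """Average grayscale intensity of the four quadrants (nw, ne, sw, se)."""
    h, w = len(pixels), len(pixels[0])

    def mean(rows, cols):
        vals = [pixels[r][c] for r in rows for c in cols]
        return sum(vals) / len(vals)

    top, bottom = range(0, h // 2), range(h // 2, h)
    left, right = range(0, w // 2), range(w // 2, w)
    return (mean(top, left), mean(top, right),
            mean(bottom, left), mean(bottom, right))


def looks_like_duplicate(a, b, tolerance=0.2):
    """True when every quadrant average differs by at most `tolerance`."""
    pairs = zip(quadrant_means(a), quadrant_means(b))
    return all(abs(x - y) <= tolerance for x, y in pairs)


# A flat 4x4 grayscale image, a copy with one slightly brighter pixel,
# and a copy whose top-left quadrant was repainted.
base = [[10.0] * 4 for _ in range(4)]
small_change = [row[:] for row in base]
small_change[0][0] = 10.4
big_change = [row[:] for row in base]
for r in range(2):
    for c in range(2):
        big_change[r][c] = 50.0

print(looks_like_duplicate(base, small_change))  # True
print(looks_like_duplicate(base, big_change))    # False
```

With a scheme like this, a pair with a localized edit can fall outside the tolerance and never be reported, even though a human would call the images near-duplicates.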

Regarding the code change that you suggest, I'd guess that @liamwhite would be the best person to ask.

I can confirm that the sample images you provided do not cause philomena to create a perceptual deduplication report.