GuidoBartoli / sherloq

An open-source digital image forensic toolset
GNU General Public License v3.0

Copy-Move Forgery #5

Closed: under-score closed this issue 4 years ago

under-score commented 4 years ago

For wider distribution, compiled versions would be greatly appreciated.

By the way, the macOS link in the README is outdated; I would link to https://dev.to/micuffaro/easy-workflow-for-switching-python-virtual-environments-with-zsh-19lc

There is one installation error:

  File "/Applications/sherloq/gui/digest.py", line 5, in <module>
    import magic
  File "/Users/my/.virtualenvs/sherloq/lib/python3.7/site-packages/magic.py", line 201, in <module>
    raise ImportError('failed to find libmagic.

Going back from 0.4.18 to 0.4.14 helped:

(sherloq) (base) my@gui % pip uninstall python-magic-bin
(sherloq) (base) my@gui % pip install python-magic-bin==0.4.14
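For anyone who hits the same "failed to find libmagic" error: python-magic is a wrapper around the libmagic C library, so a good first check is whether Python can locate that shared library at all. The sketch below uses only the standard library (ctypes.util.find_library). The helper names are hypothetical. The assumption that python-magic does a similar lookup before raising that ImportError has not been checked against its source.

```python
import ctypes
import ctypes.util


def find_libmagic():
    """Return the name/path of a libmagic shared library visible to ctypes, or None."""
    # find_library takes the bare name: "magic" -> libmagic.so / libmagic.dylib
    return ctypes.util.find_library("magic")


def libmagic_loads():
    """Try to actually load libmagic; True only if loading succeeds."""
    name = find_libmagic()
    if name is None:
        return False
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    path = find_libmagic()
    if path is None:
        print("libmagic not found: install it system-wide or use a package that bundles it")
    else:
        print(f"libmagic found: {path} (loads: {libmagic_loads()})")
```

If find_libmagic() returns None, installing libmagic system-wide is an alternative to pinning python-magic-bin. On macOS with Homebrew, for example, that is brew install libmagic.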

Great work! Looking forward to the cloning tool.

GuidoBartoli commented 4 years ago

Yes, I will build a compiled version as soon as most of the tools are implemented. I'm still in the alpha stage, but I'm looking forward to deploying with fbs or distributing via pip.

Thanks for the macOS link, I updated the README!

That's a strange error: I tried creating a new virtual environment from scratch and, after installing the requirements, the application runs without problems. Maybe I already have some apt package related to libmagic installed? What is your current setup?

$ mkvirtualenv sq2
$ pip install -r requirements.txt 
Collecting lxml
  Downloading lxml-4.5.1-cp36-cp36m-manylinux1_x86_64.whl (5.5 MB)
     |████████████████████████████████| 5.5 MB 3.2 MB/s 
Collecting matplotlib
  Downloading matplotlib-3.2.2-cp36-cp36m-manylinux1_x86_64.whl (12.4 MB)
     |████████████████████████████████| 12.4 MB 2.8 MB/s 
Collecting opencv-contrib-python-nonfree
  Downloading opencv_contrib_python_nonfree-4.1.1.1-cp36-cp36m-manylinux1_x86_64.whl (34.9 MB)
     |████████████████████████████████| 34.9 MB 4.2 MB/s 
Collecting pip-chill
  Downloading pip-chill-1.0.0.tar.gz (16 kB)
Collecting pyside2
  Downloading PySide2-5.15.0-5.15.0-cp35.cp36.cp37.cp38-abi3-manylinux1_x86_64.whl (170.8 MB)
     |████████████████████████████████| 170.8 MB 2.9 MB/s 
Collecting python-magic
  Downloading python_magic-0.4.18-py2.py3-none-any.whl (8.6 kB)
Collecting pywavelets
  Downloading PyWavelets-1.1.1-cp36-cp36m-manylinux1_x86_64.whl (4.4 MB)
     |████████████████████████████████| 4.4 MB 5.5 MB/s 
Collecting sewar
  Downloading sewar-0.4.3.tar.gz (10 kB)
Collecting tabulate
  Downloading tabulate-0.8.7-py3-none-any.whl (24 kB)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1
  Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
     |████████████████████████████████| 67 kB 5.0 MB/s 
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.2.0-cp36-cp36m-manylinux1_x86_64.whl (88 kB)
     |████████████████████████████████| 88 kB 5.1 MB/s 
Collecting python-dateutil>=2.1
  Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
     |████████████████████████████████| 227 kB 4.3 MB/s 
Collecting numpy>=1.11
  Downloading numpy-1.19.0-cp36-cp36m-manylinux2010_x86_64.whl (14.6 MB)
     |████████████████████████████████| 14.6 MB 3.7 MB/s 
Collecting cycler>=0.10
  Downloading cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting shiboken2==5.15.0
  Downloading shiboken2-5.15.0-5.15.0-cp35.cp36.cp37.cp38-abi3-manylinux1_x86_64.whl (856 kB)
     |████████████████████████████████| 856 kB 4.1 MB/s 
Collecting scipy
  Downloading scipy-1.5.0-cp36-cp36m-manylinux1_x86_64.whl (25.9 MB)
     |████████████████████████████████| 25.9 MB 5.2 MB/s 
Collecting Pillow
  Downloading Pillow-7.1.2-cp36-cp36m-manylinux1_x86_64.whl (2.1 MB)
     |████████████████████████████████| 2.1 MB 3.2 MB/s 
Collecting six>=1.5
  Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Building wheels for collected packages: pip-chill, sewar
  Building wheel for pip-chill (setup.py) ... done
  Created wheel for pip-chill: filename=pip_chill-1.0.0-py2.py3-none-any.whl size=6555 sha256=2d28588cfc6afdb38d6f4d7ff8806063387c67f6b0f3acb431350d7906877b3b
  Stored in directory: /tmp/pip-ephem-wheel-cache-kjyp8nxe/wheels/dc/d0/65/30eef17b719d9a39f377c36ecca468686e953b32a2decdcde7
  Building wheel for sewar (setup.py) ... done
  Created wheel for sewar: filename=sewar-0.4.3-py3-none-any.whl size=10347 sha256=5b73b072a36de0d6b26cb90b86efa98518a651cd771a7b73bb70de7a9519f426
  Stored in directory: /tmp/pip-ephem-wheel-cache-kjyp8nxe/wheels/46/84/81/96dcbf3446a2e81039045cee5079cab614435959e61b9e99e4
Successfully built pip-chill sewar
Installing collected packages: lxml, pyparsing, kiwisolver, six, python-dateutil, numpy, cycler, matplotlib, opencv-contrib-python-nonfree, pip-chill, shiboken2, pyside2, python-magic, pywavelets, scipy, Pillow, sewar, tabulate
Successfully installed Pillow-7.1.2 cycler-0.10.0 kiwisolver-1.2.0 lxml-4.5.1 matplotlib-3.2.2 numpy-1.19.0 opencv-contrib-python-nonfree-4.1.1.1 pip-chill-1.0.0 pyparsing-2.4.7 pyside2-5.15.0 python-dateutil-2.8.1 python-magic-0.4.18 pywavelets-1.1.1 scipy-1.5.0 sewar-0.4.3 shiboken2-5.15.0 six-1.15.0 tabulate-0.8.7
$ python sherloq.py
under-score commented 4 years ago

Thanks for everything. The only thing I can add is that I recently switched machines, so I had a fresh Python install as well as a new environment before running pip install -r requirements.txt

GuidoBartoli commented 4 years ago

Great work! Looking forward to the cloning tool.

The Region Cloning tool was added in the latest commit (0507cdb603876697eebba2137a4b19a64f1cc4ac). There are a bunch of parameters to tweak, but it turns out to be an effective feature-based approach.

under-score commented 4 years ago

Genius! It worked perfectly on my first two examples. I had been experimenting with orb.detect() and cv2.FlannBasedMatcher(), which seemed a bit faster than SIFT: my third example (2398 × 874 px) took more than 1 min.

GuidoBartoli commented 4 years ago

In the latest commit the cloning tool should be more user-friendly, and the BRISK detector is chosen by default since it consumes much less memory than SIFT. However, I still have to run some tests with the FLANN matcher, as you suggested, to see whether its matching accuracy is comparable to brute force.

under-score commented 4 years ago

Very much improved: it now even recognizes some hard-to-see manipulations. BRISK is the best option, see https://ieeexplore.ieee.org/document/8346440. No false positives, but some false negatives. I need more experience with setting the parameters (threshold slightly higher?). No idea why I do not always get a progress bar.

GuidoBartoli commented 4 years ago

Need more experience with setting parameters (threshold slightly higher?)

I'm writing help pages for all the tools, but they are not finished yet, so here are the parameter descriptions:

Keypoints are drawn as circles with size proportional to detector response and lines are colored with this criterion:

In the next update, I will add some tooltips ;)

I'm also considering adding a Reduction option to Copy-Move Forgery so that it also works with big images containing lots of keypoints (SIFT and SURF use too much memory and can make the program crash).

No idea, why I do not always get a progress bar with cancel option...

The progress bar is displayed only during clustering (an explicit loop over matches). The detection and matching phases are each a single OpenCV function call that cannot be monitored from external code (or at least, I do not know how to do it...). The clustering progress bar appears only if the operation takes more than a few seconds; otherwise it remains hidden (when the process is finished, the status bar of the main window shows the total elapsed time).
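The behaviour described above can be sketched independently of the GUI toolkit: an explicit loop that reports progress and can be cancelled, with the bar shown only for slow runs. In Qt, QProgressDialog provides the delayed appearance through setMinimumDuration(). The plain-Python version below only mimics that logic, and run_with_progress, report and cancelled are hypothetical names.

```python
import time


def run_with_progress(items, work, report, cancelled, min_duration=2.0):
    """Apply work() to every item, reporting progress only for slow runs.

    report(done, total) is called only once min_duration seconds have
    elapsed; cancelled() is polled on every iteration. Returns the list of
    results, or None if the user cancelled. All names are hypothetical.
    """
    total = len(items)
    start = time.monotonic()
    results = []
    for done, item in enumerate(items, start=1):
        if cancelled():
            return None  # stop early, as a Cancel button would
        results.append(work(item))
        # stay silent for fast runs, like a dialog with a minimum duration
        if time.monotonic() - start >= min_duration:
            report(done, total)
    return results
```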

GuidoBartoli commented 4 years ago

There are more updates to Copy-Move Forgery in the latest commit (6945a3c1c828ef8ad8c5e910898b790a63d2c8c0). If you want to run some more tests, you're welcome :)

under-score commented 4 years ago

Cool. No more SIFT :-)

I like the text output, at least for debugging, to see what Python is doing right now.

The "Process" button stays greyed out here, and no results are displayed either (see screenshot).

[screenshot]

Unfortunately I have zero Python / multithreading experience ...

GuidoBartoli commented 4 years ago

Can you please send me the image you're working on?

under-score commented 4 years ago

I think there are some official training sets (see https://towardsdatascience.com/image-forgery-detection-2ee6f1a65442), while I am using it for a very special purpose: fraud detection in scientific papers. Some more recent examples are attached: Archiv.zip

GuidoBartoli commented 4 years ago

OK, thanks for linking the datasets, I will run some tests with them. I also tried some of your images and, with some parameter fiddling, the algorithm finds some duplications (screenshots: copy2, copy3).

The features in the first image you reported (the pale blue lines) seem too weak to be detected, so I have not found working parameters yet. Also, the program sometimes exhausted all available memory because BRISK found too many points, so I am thinking about adding another "Response" parameter to keep only sufficiently strong keypoints for faster matching.
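Here is a sketch of what such a "Response" filter could look like. It assumes OpenCV-style keypoints that expose a response attribute, plus a descriptor matrix with one row per keypoint (as returned by detectAndCompute()). Keeping only the strongest points bounds memory use and matching time. strongest_keypoints, max_points and min_response are hypothetical names, not Sherloq parameters.

```python
import numpy as np


def strongest_keypoints(kpts, desc, max_points=None, min_response=0.0):
    """Keep only keypoints with a strong detector response.

    kpts: sequence of objects with a .response attribute (e.g. cv2.KeyPoint)
    desc: NumPy array with one descriptor row per keypoint
    Hypothetical helper illustrating a "Response" filter, not Sherloq's code.
    """
    if len(kpts) == 0:
        return list(kpts), desc
    responses = np.array([k.response for k in kpts])
    order = np.argsort(-responses, kind="stable")    # strongest first
    order = order[responses[order] >= min_response]  # drop weak points
    if max_points is not None:
        order = order[:max_points]                   # cap memory/matching cost
    return [kpts[i] for i in order], desc[order]
```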

under-score commented 4 years ago

Correct. "Ea0r-AdUcAAIXIu" was more difficult, although the duplication is clearly visible.

under-score commented 4 years ago

Big interest on Twitter today, head over to https://twitter.com/science_surf/status/1278564353436987392

GuidoBartoli commented 4 years ago

Wow, so glad to hear that my tool has been used in this research, many thanks for sharing! :)

PS: are you Matthias Wjst in the Twitter post?

under-score commented 4 years ago

Yes, sir. This is a small but highly active community; read more in "The science and art of detecting data manipulation and fraud". Maybe they also need a simple drawing tool for rectangles/circles on the native images, since usually only the annotated pictures are discussed. And, of course, a good manual.

A bigger audience would certainly be journals, magazines and image outlets that need to be sure images have not been tampered with. Commercial offerings are all expensive, non-transparent and not easily accessible.

GuidoBartoli commented 4 years ago

Really interesting stuff! I worked in the biomedical field in the past, but I was not aware of such interest in digital image forensics for scientific papers, and manipulation seems more common than I thought.

Other than that, if you want, you can try out the new Composite Splicing tool (based on the excellent NoisePrint algorithm), added in the latest commit (3bf77b14e856387dee14f5e79178c281ab108432). You need to reinstall the package requirements, because TensorFlow is needed for the CNN.

Many thanks! :)

under-score commented 4 years ago

ERROR: Could not find a version that satisfies the requirement tensorflow==1.2.1 (from -r requirements.txt (line 11)) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc2, 1.15.0rc3, 1.15.0, 1.15.2, 1.15.3, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc0, 2.1.0rc1, 2.1.0rc2, 2.1.0, 2.1.1, 2.2.0rc0, 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0)
ERROR: No matching distribution found for tensorflow==1.2.1 (from -r requirements.txt (line 11))

under-score commented 4 years ago

You are really doing a great job here by bridging two worlds. In biomedicine we call it translational research when basic lab results are finally applied to patients in the hospital.

GuidoBartoli commented 4 years ago

Yes, I used an obsolete TensorFlow version for compatibility with NoisePrint; GitHub notified me about this just now. Tonight I will fix requirements.txt and check that everything still works after the update. Thanks!

under-score commented 4 years ago

NoisePrint is excellent. Did you also come across https://github.com/peterwang512/FALdetector ? Paper at https://arxiv.org/pdf/1906.05856.pdf

GuidoBartoli commented 4 years ago

This should be fixed in the latest commit (9c4ff27c0e0600e6b4c6517aaa047fd0d010893a). Please clean up your virtual environment, reinstall the package requirements and try again. Can you also tell me your current Python interpreter version?

GuidoBartoli commented 4 years ago

NoisePrint is excellent. Did you also come across https://github.com/peterwang512/FALdetector ? Paper at https://arxiv.org/pdf/1906.05856.pdf

Yes, thanks, I read that article some time ago, but for now I intend to include only algorithms for detecting manipulation in generic images. FALdetector is a nice tool, but it works only in a very specific scenario (i.e., faces warped with Photoshop).

under-score commented 4 years ago

Agree with all you said.

After a hard reset, I had a minor problem

  File "sherloq.py", line 19, in <module>
    from digest import DigestWidget
  File "/Applications/Sherloq/sherloq/gui/digest.py", line 5, in <module>
    import magic

which could be solved by

pip uninstall python-magic
pip install python-magic-bin==0.4.14
under-score commented 4 years ago

There seems to be another conflict that I could not solve:

(sherloq) (base) user@vivi13 gui % python sherloq.py                 
objc[17106]: Class QMacAutoReleasePoolTracker is implemented in both /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/PySide2/Qt/lib/QtCore.framework/Versions/5/QtCore (0x10a4fe0f8) and /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/cv2/.dylibs/QtCore (0x11d843700). One of the two will be used. Which one is undefined.
objc[17106]: Class QT_ROOT_LEVEL_POOL__THESE_OBJECTS_WILL_BE_RELEASED_WHEN_QAPP_GOES_OUT_OF_SCOPE is implemented in both /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/PySide2/Qt/lib/QtCore.framework/Versions/5/QtCore (0x10a4fe170) and /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/cv2/.dylibs/QtCore (0x11d843778). One of the two will be used. Which one is undefined.
objc[17106]: Class KeyValueObserver is implemented in both /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/PySide2/Qt/lib/QtCore.framework/Versions/5/QtCore (0x10a4fe198) and /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/cv2/.dylibs/QtCore (0x11d8437a0). One of the two will be used. Which one is undefined.
objc[17106]: Class RunLoopModeTracker is implemented in both /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/PySide2/Qt/lib/QtCore.framework/Versions/5/QtCore (0x10a4fe1e8) and /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/cv2/.dylibs/QtCore (0x11d8437f0). One of the two will be used. Which one is undefined.
objc[17106]: Class QCocoaPageLayoutDelegate is implemented in both /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/PySide2/Qt/lib/QtPrintSupport.framework/Versions/5/QtPrintSupport (0x12c78d540) and /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/cv2/.dylibs/QtPrintSupport (0x151ef3468). One of the two will be used. Which one is undefined.
objc[17106]: Class QCocoaPrintPanelDelegate is implemented in both /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/PySide2/Qt/lib/QtPrintSupport.framework/Versions/5/QtPrintSupport (0x12c78d5b8) and /Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/cv2/.dylibs/QtPrintSupport (0x151ef34e0). One of the two will be used. Which one is undefined.
QObject::moveToThread: Current thread (0x7fd975ed4df0) is not the object's thread (0x7fd976b52c70).
Cannot move to target thread (0x7fd975ed4df0)
You might be loading two sets of Qt binaries into the same process. Check that all plugins are compiled against the right Qt binaries. Export DYLD_PRINT_LIBRARIES=1 and check that only one set of binaries are being loaded.
qt.qpa.plugin: Could not load the Qt platform plugin "cocoa" in "/Users/user/.virtualenvs/sherloq/lib/python3.7/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: cocoa, minimal, offscreen, webgl.

zsh: abort      python sherloq.py
under-score commented 4 years ago

I looked into it in more detail: I could not detect the modifications in the example picture at https://github.com/peterwang512/FALdetector with any tool currently available in Sherloq.

GuidoBartoli commented 4 years ago

This is expected: that kind of manipulation will (hopefully) be detected once I have implemented the Image Resampling tool (you can find it in the Tampering group), which will be based on the pioneering work of Hany Farid.

GuidoBartoli commented 4 years ago

However, apart from dedicated resampling detection, standard ELA can be your friend here. If you take a closer look, you can see a darker central halo, meaning that her face has a lower "quality" than the forehead and hair, where the halo vanishes. Elements with similar textures on the same focal plane should have matching residuals, so this is a sign of local manipulation (see the ELA screenshot).

under-score commented 4 years ago

Probably another stupid question: I'm trying to recreate a stripped-down version of cloning.py for my command-line screening:

import os
import cv2

fn = "tmp.jpg"
dir = os.path.dirname(os.path.realpath(__file__))
img = cv2.imread(dir+"/"+fn)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
mask = cv2.threshold(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 0, 1, cv2.THRESH_BINARY)
brisk = cv2.BRISK_create()
kpts, desc = brisk.detectAndCompute(gray, mask)

This works in your project but not in mine, where it fails with TypeError: Expected Ptr<cv::UMat> for argument 'mask'

GuidoBartoli commented 4 years ago

The threshold() function returns a tuple (threshold, output) containing both the computed threshold (if needed) and the output image.

If you want to use Otsu's automatic threshold, your code should look like this:

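import cv2 as cv
img = cv.imread('img.jpg', cv.IMREAD_GRAYSCALE)
# first call: keep only the threshold value computed by Otsu's method
thr, _ = cv.threshold(img, 0, 255, cv.THRESH_OTSU)
# second call: build a 0/255 mask; keypoints are searched only where mask != 0
_, mask = cv.threshold(img, thr, 255, cv.THRESH_BINARY)
brisk = cv.BRISK_create()
kpts, desc = brisk.detectAndCompute(img, mask)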

Otherwise, you can set a fixed value for thr and avoid the first threshold() call.

under-score commented 4 years ago

A million thanks!

under-score commented 4 years ago

My last two questions in this thread:

Where is the colour cast introduced in the output of cloning.py? It must be somewhere after output = np.copy(image).

And since most of the time is consumed in the loop over keypoints, couldn't that be vectorized? https://towardsdatascience.com/data-science-with-python-turn-your-conditional-loops-to-numpy-vectors-9484ff9c622e

GuidoBartoli commented 4 years ago

Where is the colour cast introduced in the output of cloning.py? Must be somewhere after output = np.copy(image)

Are you talking about the colored lines connecting matching keypoints? A simple algorithm based on the line angle is used to assign the colors; it is implemented in lines 245 to 269 of cloning.py. Lines 271 to 282 contain another heuristic, K-Means applied to the line angles, which estimates how many regions have been cloned (it is a pessimistic estimate and is used only to fill the status label).
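Here is a hypothetical re-creation of the angle-based colouring idea, not the code in cloning.py. It computes each match line's angle with arctan2 and folds it into [0, pi), so a line gets the same colour whichever endpoint comes first. That angle is then used as the hue, so parallel lines, as produced by a single cloned region, share one colour. Colours are returned in BGR order for OpenCV drawing functions.

```python
import colorsys

import numpy as np


def angle_colors(p1, p2):
    """Map each match line p1[i] -> p2[i] to a BGR colour based on its angle.

    Lines with the same direction (as expected for one cloned region) get
    the same hue. Hypothetical sketch of the idea, not Sherloq's code.
    """
    d = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    # fold into [0, pi) so A->B and B->A give the same colour
    angles = np.mod(np.arctan2(d[:, 1], d[:, 0]), np.pi)
    colors = []
    for a in angles:
        hue = float(a / np.pi) % 1.0  # normalized hue in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors.append((int(b * 255), int(g * 255), int(r * 255)))  # BGR order
    return colors
```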

And as most of the time is being consumed in the loop of keypoints - couldn't that been vectorized? https://towardsdatascience.com/data-science-with-python-turn-your-conditional-loops-to-numpy-vectors-9484ff9c622e

np.vectorize is a good idea, I will look into it, thanks!
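One caveat: NumPy's documentation states that np.vectorize is provided primarily for convenience, not performance, and is essentially a for loop. Real speed-ups come from expressing the per-match test as whole-array operations. The sketch below compares a loop version and an array version of a typical step, discarding matched pairs that are closer than min_dist. The function names and this particular filter are illustrative assumptions, not taken from cloning.py.

```python
import numpy as np


def filter_pairs_loop(p1, p2, min_dist):
    """Reference version: explicit Python loop over matched keypoint pairs."""
    keep = []
    for i in range(len(p1)):
        dx, dy = p2[i][0] - p1[i][0], p2[i][1] - p1[i][1]
        if dx * dx + dy * dy >= min_dist * min_dist:
            keep.append(i)
    return np.array(keep, dtype=int)


def filter_pairs_vectorized(p1, p2, min_dist):
    """Same test on whole arrays at once: no Python-level loop."""
    d = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)  # shape (N, 2)
    dist2 = d[:, 0] * d[:, 0] + d[:, 1] * d[:, 1]
    return np.flatnonzero(dist2 >= min_dist * min_dist)
```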

under-score commented 4 years ago

I did not mean the circles or lines, just the original background colour, which changed between the INPUT (in) and OUTPUT (out) images.

GuidoBartoli commented 4 years ago

You have the red and blue channels swapped: OpenCV's imread() and imwrite() functions use BGR channel ordering by default, so you need to take care of that when you work with color images (for example, by using cvtColor() with the COLOR_BGR2RGB option). Did you notice this in Sherloq or in your own application?
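Here is a minimal illustration of the channel-order issue, assuming an H x W x 3 uint8 array as returned by imread(). Reversing the last axis swaps B and R, which gives the same result as cvtColor(image, COLOR_BGR2RGB) for 3-channel images. bgr_to_rgb is a hypothetical helper name.

```python
import numpy as np


def bgr_to_rgb(image):
    """Swap the first and third channels (BGR <-> RGB) of an H x W x 3 array.

    Applying it twice restores the original order. The result is copied to
    a contiguous array, since some OpenCV functions reject negative-stride views.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 color image")
    return np.ascontiguousarray(image[..., ::-1])
```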

under-score commented 4 years ago

Magic, the colors are back (only in my stripped-down version), but occasionally I get a new error:

   image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
cv2.error: OpenCV(4.3.0) /Users/travis/build/skvark/opencv-python/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

Maybe the color format needs to be checked before converting?