GuidoBartoli / sherloq

An open-source digital image forensic toolset
GNU General Public License v3.0

Export capability - proposal & motivation #31

Closed: jtlz2 closed this issue 3 years ago

jtlz2 commented 3 years ago

This builds on #28 but is a wider feature request that I would like to motivate for here. @GuidoBartoli Please shout where you agree and disagree below.

Sherloq provides excellent tools for visual analysis and it does this job well.

Numerical analysis is an important component of forensic analysis. This could be used, for example, to build a tampering detector that runs blind without human intervention. This could be incorporated but may be out of scope for Sherloq's intended purpose. Broadcasting to another Python instance could also be helpful but probably hard to implement.

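There are many good tools for such analysis. One example from my previous field of astronomy(*) is saoimage ds9 (https://sites.google.com/cfa.harvard.edu/saoimageds9). It is well established and contains a rich feature set (too many to mention but:) including,

  • Pan Zoom Rotate
  • Import/export
  • Python integration/broadcasting
  • Alignment of tiles
  • Contouring
  • Overlaying of regions
  • A full suite of colo(u)r schemes / rescaling / equalization
  • Histogram cuts
  • Continually updated box containing coordinates, values, ...
  • Support for a range of image formats (but mainly astronomy focused)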

(*) Probably every field has its favourite tools, however, I would be loath to lock Sherloq into any particular field's toolset, awesome as ds9 is.

I imagine some of these would be either (i) straightforward or (ii) useful to add to Sherloq. But for those that are neither why reinvent the wheel?

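A quick-kill route to accessing the above functionality - in addition to the existing export capability(+) - would be a comprehensive export button geared towards numerical analysis and development of blind, automated pipelines. This could simply be export to .npy (for all images) or .pkl for other variables and would afford flexibility by avoiding field lock-in. Examples of other possible variables might be

  • Algorithm settings (which would allow for reproducibility - since these could then in principle be loaded back in for reruns)
  • Keypoints (these can in fact be serialized into .pkl - I have functions for this; also for reload)
  • Contour sets (format?)
  • EXIF (to e.g. CSV/JSON)
  • Thumbnails
  • Sherloq hallmark to ensure and validate integrity of pipeline by, for example, commit versioning (though could this be hacked/spoofed?)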

(+) I take the point about sticking to PNG for export because it is lossless. Of course JPEG compression is lossy and a critical consequence of this - I think - is (repeatable) quantization noise. But unless the exported images are being used for downstream numerical analysis (see above) then does it matter? The original image is - of course and importantly - not being modified. Having the output as PNG when the input is JPEG - or vice versa - in fact I think hinders further analysis/comparison. For example, what tool/s would allow easy overlay of such images - GIMP (to be discouraged I imagine!)..?
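To make the "repeatable quantization noise" point concrete, here is a minimal sketch assuming NumPy and an idealized JPEG-like step: an 8x8 orthonormal DCT-II followed by a single uniform quantization step. Real JPEG uses per-frequency quantization tables, colour conversion, chroma subsampling, and integer rounding and clipping of pixels, none of which are modelled here. The first pass loses information; repeating the same quantization on its output changes nothing further, because the coefficients already sit on the quantization grid.

```python
import numpy as np

N = 8  # JPEG operates on 8x8 blocks

def dct_matrix(n: int = N) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (each row is one basis vector)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    d[0, :] = np.sqrt(1.0 / n)  # DC row gets a different normalization
    return d

D = dct_matrix()

def quantize_block(block: np.ndarray, step: float) -> np.ndarray:
    """Forward DCT, uniform quantization, inverse DCT.

    Assumption: one step for all frequencies and no rounding/clipping of
    the reconstructed pixels, so this is only an idealized JPEG model.
    """
    coeffs = D @ block @ D.T
    quantized = np.round(coeffs / step) * step
    return D.T @ quantized @ D

rng = np.random.default_rng(0)
# Random 8-bit block, level-shifted by -128 as JPEG does before the DCT.
block = rng.integers(0, 256, size=(N, N)).astype(float) - 128.0

once = quantize_block(block, step=16.0)
twice = quantize_block(once, step=16.0)

print("max error after one pass:", np.abs(once - block).max())
print("extra change on second pass:", np.abs(twice - once).max())
```

In this idealized model the second pass only differs from the first by floating-point round-off; with real encoders, pixel rounding and clipping mean recompression at the same settings is close to, but not always exactly, a no-op.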

@GuidoBartoli Let me know your thoughts - I'd be happy to help work on the above if you think any of it would be useful (see also #29).

And just to reiterate that Sherloq is unique, excellent and of great value!

GuidoBartoli commented 3 years ago

This builds on #28 but is a wider feature request that I would like to motivate for here. @GuidoBartoli Please shout where you agree and disagree below.

Ok, let's take a look at it... :wink:

Sherloq provides excellent tools for visual analysis and it does this job well.

Numerical analysis is an important component of forensic analysis. This could be used, for example, to build a tampering detector that runs blind without human intervention. This could be incorporated but may be out of scope for Sherloq's intended purpose. Broadcasting to another Python instance could also be helpful but probably hard to implement.

Creating a fully automated tool that can tell if an image has been modified in any way from the original has always been a goal of forensic researchers.

Unfortunately, many studies show that this is a very difficult task, because so many factors come into play: the decision that a human (expert) makes must take into account both the results of multiple tests (at the pixel and metadata level) and the experience gained by examining numerous images to understand the effect of each type of modification.

My former professor and mentor wrote a paper on this. Also, a long time ago, the famous forensic expert Hany Farid published an online service and a dedicated app that merged the results of various analyses to give an overall assessment of the image's authenticity, but this service (whose name I can't remember now) was soon shut down because it was of little use and unreliable.

Any experienced forensic researcher (like Neal Krawetz) can tell you that the best way is first to build a solid theoretical base on how "attacks" work and the algorithms that can detect them, then to experiment with as many techniques as possible to formulate hypotheses about something that is actually unknown a priori.

There are many good tools for such analysis. One example from my previous field of astronomy(*) is saoimage ds9 (https://sites.google.com/cfa.harvard.edu/saoimageds9). It is well established and contains a rich feature set (too many to mention but:) including,

  • Pan Zoom Rotate

I think the current image viewing and navigation system in Sherloq is efficient enough for visual investigation (have you noticed that double-clicking the mouse can quickly switch between 100% zoom and resized view?). Personally, I don't think image rotation will help for forensic analysis purposes.

  • Import/export
  • Python integration/broadcasting

That is useful and fits with your proposal; I also think that a more functional export system would be a really nice addition.

  • Alignment of tiles
  • Contouring
  • Overlaying of regions

Honestly, I don't know the purpose of these... what are they for?

  • A full suite of colo(u)r schemes / rescaling / equalization
  • Histogram cuts

Equalization and histogram cuts are already implemented in Sherloq. As for rescaling, I don't think it is useful in this scenario, because it introduces unwanted resampling artifacts.
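For readers unfamiliar with these two operations, here is a minimal NumPy sketch of a percentile "histogram cut" (contrast stretch) and global histogram equalization on an 8-bit grayscale image. It is a generic textbook version, not Sherloq's implementation; the function names and default percentiles are illustrative assumptions.

```python
import numpy as np

def histogram_cut(img: np.ndarray, low_pct: float = 1.0, high_pct: float = 99.0) -> np.ndarray:
    """Clip the darkest/brightest percentiles and stretch the rest to 0..255."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return img.copy()
    stretched = (np.clip(img.astype(float), lo, hi) - lo) / (hi - lo) * 255.0
    return np.round(stretched).astype(np.uint8)

def equalize(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit single-channel image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the first non-empty bin
    total = img.size
    if total == cdf_min:  # a single grey level: leave unchanged
        return img.copy()
    # Map each grey level through the normalized cumulative histogram.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]

img = np.array([[50, 60], [70, 80]], dtype=np.uint8)
print(equalize(img))
print(histogram_cut(img, 0, 100))
```

Both are point operations (a lookup per grey level), so unlike rescaling they never interpolate between neighbouring pixels.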

  • Continually updated box containing coordinates, values, ...

This is already planned (you can see it among the Projects items), but I haven't had time to implement it yet...
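A minimal, toolkit-independent sketch of the arithmetic behind such a readout box, assuming the viewer knows its current zoom factor and the image coordinates shown at the widget's top-left corner. The `Viewport` class and the function names are hypothetical; in a Qt-based GUI this would be called from a mouse-move handler, which is omitted here.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass
class Viewport:
    zoom: float      # displayed pixels per image pixel (1.0 = 100%)
    offset_x: float  # image x coordinate at the widget's left edge
    offset_y: float  # image y coordinate at the widget's top edge

def widget_to_image(vp: Viewport, wx: float, wy: float) -> Tuple[int, int]:
    """Convert a mouse position in widget coordinates to integer pixel indices."""
    # floor (not int) so positions left of / above the image stay negative
    x = math.floor(vp.offset_x + wx / vp.zoom)
    y = math.floor(vp.offset_y + wy / vp.zoom)
    return x, y

def pixel_readout(image: Sequence[Sequence[int]], vp: Viewport,
                  wx: float, wy: float) -> Optional[str]:
    """Text for the readout box, or None when the cursor is outside the image."""
    x, y = widget_to_image(vp, wx, wy)
    height, width = len(image), len(image[0])
    if not (0 <= x < width and 0 <= y < height):
        return None
    return f"x={x} y={y} value={image[y][x]}"

image = [[10, 20, 30], [40, 50, 60]]
vp = Viewport(zoom=2.0, offset_x=1.0, offset_y=0.0)
print(pixel_readout(image, vp, 2.5, 3.0))  # → x=2 y=1 value=60
```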

  • Support for a range of image formats (but mainly astronomy focused)

I have no experience with image formats for astrophotography, however if there are open libraries able to decode them we can think of adding them to the list.

(*) Probably every field has its favourite tools, however, I would be loath to lock Sherloq into any particular field's toolset, awesome as ds9 is.

I imagine some of these would be either (i) straightforward or (ii) useful to add to Sherloq. But for those that are neither why reinvent the wheel?

A quick-kill route to accessing the above functionality - in addition to the existing export capability(+) - would be a comprehensive export button geared towards numerical analysis and development of blind, automated pipelines. This could simply be export to .npy (for all images) or .pkl for other variables and would afford flexibility by avoiding field lock-in. Examples of other possible variables might be

  • Algorithm settings (which would allow for reproducibility - since these could then in principle be loaded back in for reruns)
  • Keypoints (these can in fact be serialized into .pkl - I have functions for this; also for reload)
  • Contour sets (format?)
  • EXIF (to e.g. CSV/JSON)
  • Thumbnails
  • Sherloq hallmark to ensure and validate integrity of pipeline by, for example, commit versioning (though could this be hacked/spoofed?)

You can already export EXIF information and thumbnail images from Sherloq. Keypoints can be exported as an image with overlaid points, but there could also be an option to export a textual version. I think saving algorithm settings would not be very useful: you can simply take a screenshot of the tool window, where all the controls show the actual parameters.
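A minimal sketch of such a textual keypoint export, using only the standard library: each keypoint becomes a CSV row whose columns mirror the public attributes of OpenCV's `cv2.KeyPoint` (`pt`, `size`, `angle`, `response`, `octave`, `class_id`), so the file can be re-read without OpenCV. The record type and function names are hypothetical; CSV is used here instead of the .pkl format mentioned above because a text file can be inspected and diffed directly.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List

@dataclass
class KeyPointRecord:
    # Mirrors cv2.KeyPoint's public attributes, with pt split into x and y.
    x: float
    y: float
    size: float
    angle: float
    response: float
    octave: int
    class_id: int

def from_cv2(kp) -> KeyPointRecord:
    """Convert an OpenCV cv2.KeyPoint (OpenCV itself is not imported here)."""
    return KeyPointRecord(kp.pt[0], kp.pt[1], kp.size, kp.angle,
                          kp.response, kp.octave, kp.class_id)

def keypoints_to_csv(records: List[KeyPointRecord]) -> str:
    """Serialize keypoints to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(KeyPointRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()

def keypoints_from_csv(text: str) -> List[KeyPointRecord]:
    """Parse CSV text produced by keypoints_to_csv back into records."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        out.append(KeyPointRecord(
            x=float(row["x"]), y=float(row["y"]), size=float(row["size"]),
            angle=float(row["angle"]), response=float(row["response"]),
            octave=int(row["octave"]), class_id=int(row["class_id"])))
    return out

kps = [KeyPointRecord(10.5, 20.25, 3.0, -1.0, 0.01, 0, -1)]
text = keypoints_to_csv(kps)
print(text)
print(keypoints_from_csv(text) == kps)  # → True
```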

(+) I take the point about sticking to PNG for export because it is lossless. Of course JPEG compression is lossy and a critical consequence of this - I think - is (repeatable) quantization noise. But unless the exported images are being used for downstream numerical analysis (see above) then does it matter? The original image is - of course and importantly - not being modified. Having the output as PNG when the input is JPEG - or vice versa - in fact I think hinders further analysis/comparison. For example, what tool/s would allow easy overlay of such images - GIMP (to be discouraged I imagine!)..?

@GuidoBartoli Let me know your thoughts - I'd be happy to help work on the above if you think any of it would be useful (see also #29).

Here are my thoughts: as I said before, the export system definitely needs to be improved, and I will start working on it next week. Thank you for your suggestions on this; a Python format is surely a nice idea!
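As one possible shape for a "Python format" export, here is a minimal sketch assuming NumPy: each result array is saved losslessly with `np.save`, and a JSON manifest records the settings, a tool version string and SHA-256 hashes of the input image and the output file, loosely following the "hallmark" idea in the list above. The function names, manifest layout and version string are hypothetical, not Sherloq's export code. On the spoofing question: a plain hash detects accidental changes, but anyone who edits the output can also recompute it, so tamper resistance would need the manifest to be signed or stored separately.

```python
import hashlib
import json
import tempfile
from pathlib import Path

import numpy as np

def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def export_result(out_dir: Path, name: str, result: np.ndarray,
                  source: Path, settings: dict, tool_version: str) -> Path:
    """Save a tool output as .npy plus a JSON manifest (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    npy_path = out_dir / f"{name}.npy"
    np.save(npy_path, result)  # keeps exact dtype and values, no PNG/JPEG step
    manifest = {
        "tool_version": tool_version,       # e.g. a git commit hash
        "settings": settings,                # parameters used for this run
        "source_sha256": sha256_of(source),  # which input image was analysed
        "output_file": npy_path.name,
        "output_sha256": sha256_of(npy_path),
    }
    manifest_path = out_dir / f"{name}.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path

def verify(manifest_path: Path) -> bool:
    """True if the exported .npy still matches the hash in its manifest."""
    manifest = json.loads(manifest_path.read_text())
    npy_path = manifest_path.parent / manifest["output_file"]
    return sha256_of(npy_path) == manifest["output_sha256"]

with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    src = tmp / "input.jpg"
    src.write_bytes(b"placeholder bytes standing in for an image file")
    output = np.random.default_rng(1).random((4, 4)).astype(np.float32)  # stand-in tool output
    manifest_file = export_result(tmp / "export", "result", output, src,
                                  {"example_param": 75}, "hypothetical-version")
    print(verify(manifest_file))  # → True
```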

And just to reiterate that Sherloq is unique, excellent and of great value!

Thank you again, sir! :smile:

GuidoBartoli commented 3 years ago

Added JPEG format to the Export dialog in commit ca726d04c8b927e85ed7d197808bc2ee92be1a60.