Hi @Mayur28,
Happy to hear that you are using our repository!
The fingerprint extraction process (the model.predict() part here) of comprint and noiseprint should typically be quite fast on a GPU, on the order of a few seconds per image, depending on the resolution. What takes the most time is the conversion from fingerprint to heatmap (the getSpamFromNoiseprint() and EMgu_img() parts), which runs entirely on the CPU.
You could optimize this code by doing the fingerprint extraction on the GPU and saving the result as a .npz file. Then, multiple heatmap extractions can be done in parallel, multi-threaded, by reading these files and running getSpamFromNoiseprint() and EMgu_img() on them. The code needs to be reorganized for this, though.
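A minimal sketch of that two-stage split could look as follows (the paths are illustrative, and splicebuster_heatmap() is only a placeholder that you would replace with the repo's getSpamFromNoiseprint() and EMgu_img() calls):

import glob
import numpy as np
from multiprocessing import Pool

def splicebuster_heatmap(fingerprint):
    # Placeholder: replace this body with the repo's getSpamFromNoiseprint() and
    # EMgu_img() calls; here it only rescales the fingerprint so the sketch runs.
    f = np.abs(fingerprint - fingerprint.mean())
    return f / (f.max() + 1e-8)

def heatmap_from_file(npz_path):
    # Stage 2 (CPU-only): load a precomputed fingerprint and turn it into a heatmap.
    fingerprint = np.load(npz_path)['fingerprint']
    heatmap = splicebuster_heatmap(fingerprint)
    np.save(npz_path.replace('.npz', '_heatmap.npy'), heatmap)

if __name__ == '__main__':
    # Stage 1 (GPU, done beforehand): run model.predict() per image and save the result with
    # np.savez_compressed('fingerprints/<name>.npz', fingerprint=res).
    # Stage 2 (CPU): process all saved fingerprints in parallel, one worker per core.
    with Pool() as pool:
        pool.map(heatmap_from_file, sorted(glob.glob('fingerprints/*.npz')))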
Loading a new noiseprint model for each image can indeed also slow down the process, although I believe the bulk of the time goes to the CPU-based heatmap extraction. In any case, you could group the image dataset per QF, load the Noiseprint model for each QF only once, and then run all the images with that QF through it. Again, the code will need to be reorganized for this, though.
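As a sketch of that grouping (estimate_qf() and load_model_for_qf() are hypothetical helpers standing in for the repo's QF estimation and per-QF model loading):

from collections import defaultdict

def group_by_qf(image_paths, estimate_qf):
    # Group image paths by their estimated JPEG quality factor.
    groups = defaultdict(list)
    for path in image_paths:
        groups[estimate_qf(path)].append(path)
    return groups

def extract_per_qf(groups, load_model_for_qf, extract_fingerprint):
    # Load the Noiseprint model for each QF only once, then run every image
    # with that QF through it before moving on to the next QF.
    for qf, paths in groups.items():
        model = load_model_for_qf(qf)
        for path in paths:
            extract_fingerprint(model, path)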
A last possible optimization is to read images into memory in a multi-threaded fashion, so that the next image is immediately available to the GPU as soon as it is done with the previous one. Again, the code needs to be reorganized for this, though.
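For example, with a small prefetching loop (read_image() stands for whatever loading and normalization you use, e.g. imread2f plus the reshaping shown later in this thread):

from concurrent.futures import ThreadPoolExecutor

def predict_with_prefetch(image_paths, read_image, model):
    # While the GPU runs model.predict() on the current image, a worker thread
    # already reads and prepares the next one.
    if not image_paths:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(read_image, image_paths[0])
        for next_path in list(image_paths[1:]) + [None]:
            img_in = future.result()
            if next_path is not None:
                future = executor.submit(read_image, next_path)
            results.append(model.predict(img_in))
    return results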
If you want to reorganize the code and there are any unclarities or problems, feel free to reach out (can also be by e-mail, see my profile). I may be able to give you some pointers.
Hi @hmareen,
Thanks for your detailed and insightful response.
Unfortunately, I had to resort to using the code as is and just waiting the process out. I agree with all your recommendations; should I have the opportunity to re-write the repo for improved efficiency, I will do so and keep in touch with you.
Thanks!
Hi @hmareen,
Hope you're well.
I just wanted to get some clarity on how to optimally use Comprint (and NoisePrint) for splicing localization, please. At present, I am using the Comprint and NoisePrint features (shown below, which I believe are referred to as fingerprints) along with other features as input to my splice detection solution.
As mentioned previously, I am experiencing major performance issues. Upon delving deeper into the code, I noticed that the bottlenecks that you mentioned in your previous post occur after the Comprint and NoisePrint fingerprints have been computed and saved. Considering that I am only using the Comprint and NoisePrint fingerprints (which I believe is the approach detailed in the video demo), I just wanted to find out what the heatmap represents, and when/why it should be used. If I understand correctly, is the heatmap simply your approach to using the 2 fingerprints to perform splice detection? I am struggling to understand the purpose of the heatmap, and wanted to find out whether I can do without it, or whether you think it is important for me to use the heatmap as well?
If you could please assist in this regard, it would be highly appreciated.
Thanks!
Hi @Mayur28,
Let me try to explain the difference between the fingerprint and the heatmap.
The fingerprint is the noise that you see in the images that you posted. The fingerprint/noise should be significantly different when different sources are used. For the comprint (=compression fingerprint), these are different if a different compression history was used in the splice. For the noiseprint (=camera model fingerprint), these are different when a different camera model was used in the splice. So, the fingerprints hold very valuable information for splice detection.
However, the fingerprints may be hard to interpret as a human: it is not always as clear as in the images you posted. Additionally, how should a computer interpret the fingerprint? If we want to evaluate the method, we need a heatmap that gives the probability of manipulation for each pixel. That's why we transform the fingerprint to a heatmap. We do this by clustering the fingerprint into 2 clusters, using the Splicebuster algorithm. In more detail, so-called "SPAM" features are extracted from the fingerprint, and then these features are clustered with the EM algorithm. Then, for each pixel, we get the probability that it belongs to one of these 2 clusters. In our method, we assume that one cluster contains the "real" pixels, and the other the "manipulated" pixels.
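To make the clustering idea concrete, here is a toy stand-in (not the actual Splicebuster implementation: it uses a trivial per-block variance feature instead of SPAM features, and scikit-learn's GaussianMixture instead of the repo's EM code):

import numpy as np
from sklearn.mixture import GaussianMixture

def toy_heatmap(fingerprint, block=8):
    # Compute one simple local feature (variance) per block of the fingerprint,
    # fit a 2-component mixture model, and read off the per-block probability
    # of belonging to the second cluster.
    h, w = fingerprint.shape[0] // block, fingerprint.shape[1] // block
    feats = np.array([[fingerprint[i*block:(i+1)*block, j*block:(j+1)*block].var()]
                      for i in range(h) for j in range(w)])
    gmm = GaussianMixture(n_components=2, random_state=0).fit(feats)
    return gmm.predict_proba(feats)[:, 1].reshape(h, w)

The real pipeline does the same thing conceptually, but with much richer SPAM co-occurrence features and its own EM implementation.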
The fingerprint is extracted using a CNN, which can run efficiently on the GPU. In contrast, the clustering to create the heatmap runs slowly on the CPU, and is indeed the main bottleneck.
If you are developing a new splice detection method, using only the fingerprints may be sufficient. Potentially, you could extract the SPAM-features from the fingerprint and use them as additional features as well. It depends a lot on how good the architecture of your method is. Can your method learn to do a better job than Splicebuster (extracting the SPAM features and clustering using EM)?
Now, how to do this in practice? How do you disable the heatmap extraction and only do the fingerprint extraction? You are interested only in the "res" variable, here for Noiseprint, and here for Comprint. Or, for Comprint, the normalized res-array that we eventually used may be better. After that, the Splicebuster algorithm is performed in the noiseprint_blind_post_concat function. First, the SPAM features are extracted (which, I believe, is relatively fast). Then, these are clustered using EM (which is the slowest part).
So, depending on your needs, you could only return the fingerprint (res variable), or also the SPAM features (spam variable).
Also note that the res-array is saved to a png file here. It's important to know that this visualization may be less accurate: the floating-point fingerprint is mapped using a certain colormap and saved as png, i.e. as 8-bit integers. In contrast, you could also save the fingerprint to a npz-file and use that as a feature, as it will be a more accurate representation. However, npz files take more storage and will probably load more slowly if you use them later to train a new algorithm. So there is a trade-off to be made here.
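For example, assuming res holds the floating-point fingerprint (the variable discussed above):

import numpy as np

# Lossless, full-precision storage (float32), instead of the 8-bit colormapped png:
np.savez_compressed('fingerprint.npz', fingerprint=res.astype(np.float32))

# Reading it back later, e.g. when training your own model:
fingerprint = np.load('fingerprint.npz')['fingerprint']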
I understand this may be a lot of information to grasp. I am open to explaining this further in a video call, if desired. Contact me by e-mail to arrange this (my e-mail address is on my GitHub profile).
Thanks @hmareen for your incredibly speedy response! This certainly clarifies all of my uncertainties.
My solution uses NoisePrint and Comprint fingerprints in conjunction with other (handcrafted) features. These features are then passed as input to my deep-learning solution, and I noticed that all features seem to complement each other really well. My solution then produces a localization map indicating which regions within the image have potentially been spliced (the splice map below is the result produced for the corresponding input image).
According to my understanding, the heatmap that is produced is an attempt to perform splice detection; but since my solution is also attempting to perform splice detection, I would not necessarily need to use the produced heatmap. Instead, it is the fingerprints that are important (which I am presently using).
Understood, thank you! I have amended the code to only compute the fingerprints and not the heatmap, and this has significantly reduced the runtime. I am now attempting to convert the Comprint model to ONNX to speed up inference (and to prevent the model from having to be reloaded for each request within a Flask API); unfortunately, TensorFlow/Keras models do not work well within Flask APIs. I am currently experiencing difficulty with this, but I would like to first spend some time figuring it out, and should I not succeed, I will contact you.
Thank you for all your assistance.
The issue that I'm having at the moment is that, when loading the Comprint model (which expects an input with shape (None, 48, 48, 1) according to the inputs specified in model.__dict__, specifically model.inputs and model.saved_model_inputs_spec), I see that we are still able to pass images with any shape as input (as long as eager execution is disabled). I am not sure how/why this is possible. Unfortunately, I am not able to examine the internals of the model to see if there is any resizing happening. Do you perhaps know why an image with any dimensions can be passed as input when the model specifically expects inputs with size (None, 48, 48, 1)?
To illustrate this, the following code snippet:
from tensorflow import keras as ks
import numpy as np
from splicebuster.noiseprint.utility.utilityRead import imread2f
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
model = ks.models.load_model('models/Comprint_Siamese_Full_jpg_ps_full')
print(model.inputs)
print(model._feed_input_shapes)
filename = "splicing-07.jpg"
img, mode = imread2f(filename, channel=1)
img_in = (np.reshape(img, (1,img.shape[0],img.shape[1],1))*256 - 127.5) * 1./255
print("Input Shape: {}".format(img_in.shape))
res = np.reshape(model.predict(img_in), (img.shape[0], img.shape[1]))
print("Output Shape: {}".format(res.shape))
returns:
[<tf.Tensor 'input_1:0' shape=(None, 48, 48, 1) dtype=float32>]
[(None, 48, 48, 1)]
Input Shape: (1, 1536, 2048, 1)
Output Shape: (1536, 2048)
Thanks.
I have also stumbled upon this issue before. The problem was that, during training, we used the following line:
model.build((BATCH, 48, 48, 1))
We overlooked this; it would have been better to use [None, None, None, 1] instead.
Anyway, using Tensorflow/Keras, you should be able to circumvent this issue by doing the following:
model = ks.models.load_model('models/Comprint_Siamese_Full_jpg_ps_full', compile=False)
model.build([None, None, None, 1])
Then, the model should not be called using model.predict(), but directly as model():
fingerprint = model(tf.constant(img_in))[0]
fingerprint = fingerprint.numpy()
Thank you for your response.
I have just tried the recommended solution, but I get the same error as before, i.e.:
ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
Positional arguments (2 total):
* <tf.Tensor 'inputs:0' shape=(1, 1536, 2048, 1) dtype=float32>
* True
Keyword arguments: {}
Expected these arguments to match one of the following 4 option(s):
Option 1:
Positional arguments (2 total):
* TensorSpec(shape=(None, 48, 48, 1), dtype=tf.float32, name='inputs')
Keyword arguments: {}
Option 3:
Positional arguments (2 total):
* TensorSpec(shape=(None, 48, 48, 1), dtype=tf.float32, name='input_1')
* False
Keyword arguments: {}
Option 4:
Positional arguments (2 total):
* TensorSpec(shape=(None, 48, 48, 1), dtype=tf.float32, name='input_1')
* True
Keyword arguments: {}
I understand your pain; I went through the same type of error messages before. Here is an alternative solution that worked in another scenario:
import network
model = network.Siamese_Network()
model.build([None, None, None, 1])
model.load_weights('models/Comprint_Siamese_Full_jpg_ps_full')
No worries, thank you for all your help. I will try to find a workaround and will let you know if/when I succeed.
Hi @hmareen,
I have been experimenting with Comprint quite extensively over the past few days; unfortunately, I have not yet been able to resolve my previous issue (regarding model loading with variable input size). I have now come across another question that I was hoping you could clarify, please.
I noticed that the Comprint and NoisePrint fingerprints are computed on the original image, i.e., at the original image size, and the output produced also matches the size of the input image. Oddly, however, I noticed that even though the result (with the same size as the input image) is passed to matplotlib for saving, the saved fingerprints are not the original size. I presume that this relates to dpi=250. I just wanted to find out why this is the case, and what issues would arise if the dpi is not explicitly specified and we let the fingerprints be saved at the original image size? Would this not produce a clearer fingerprint, since the image would not be implicitly shrunk or stretched?
Thanks.
Dear Mayur,
What version of tensorflow are you using?
In tensorflow 2.12.0, the code of my previous reply (also see below) works for me and is operational in the COM-PRESS dashboard:
import network
model = network.Siamese_Network()
model.build([None, None, None, 1])
model.load_weights('models/Comprint_Siamese_Full_jpg_ps_full')
Regarding the saving of fingerprints as png: I have only used these for visualization purposes, such that the resolution did not matter that much. If you want to use them in your pipeline, it would indeed be better to make sure they get written to the same resolution. You can do this using plt.imsave, instead of plt.imshow and plt.savefig:
plt.imsave(output_file_fingerprint_path, fingerprint.clip(vmin,vmax), vmin=vmin, vmax=vmax, cmap='tab20b', format='png')  # fingerprint, clipped to a fixed range
plt.imsave(output_file_fingerprint_path, fingerprint, cmap='tab20b', format='png')  # fingerprint, unclipped
plt.imsave(output_image_path, heatmap, vmin=vmin, vmax=vmax, cmap='jet', format='png')  # heatmap
Hi @hmareen,
Thank you for your response. With the recommended approach, I was indeed able to pass inputs of various sizes to the model. However, the difficulty I experienced was converting the provided Comprint model to ONNX: for some reason, presumably due to variables stored within the checkpoint (and regardless of overriding the input specification), it kept requiring an input of size (None, 48, 48, 1). Fortunately, I managed to resolve the issue this morning and have successfully converted the model to ONNX by modifying the first layer of the model and subsequently rebuilding it.
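For reference, running the converted model with onnxruntime then looks roughly like this (the file name is illustrative, and img_in is prepared the same way as in my earlier snippet):

import numpy as np
import onnxruntime as ort

# The model was converted beforehand, e.g. with:
#   python -m tf2onnx.convert --saved-model <rebuilt_model_dir> --output comprint.onnx
sess = ort.InferenceSession('comprint.onnx')
input_name = sess.get_inputs()[0].name
fingerprint = sess.run(None, {input_name: img_in.astype(np.float32)})[0]
fingerprint = np.squeeze(fingerprint)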
I am using Tensorflow 2.9.1.
Understood, I will do that. Considering that I am using the 2 fingerprints as inputs to my model, I'm hoping that this will improve the accuracy, since clearer/more descriptive information is being passed.
Hi @hmareen,
Hope you have been keeping well.
I have been experimenting with Comprint again recently. In my use case, I have to run on the CPU instead of a GPU, which, as expected, increases the inference latency considerably. Generally speaking (regardless of whether a CPU or GPU is used), my understanding is that Comprint needs to be computed on the original image, i.e., it cannot be computed on a resized version of the image, as this may distort the underlying noise and signal patterns. I have empirically tested this hypothesis as well. Unfortunately, as a consequence, inference takes very long for large images (3000x4000), and I wonder how else this could be reduced, considering that the image cannot be resized. I just wanted to check whether you have any other ideas for computing the Comprint fingerprint, please. Note that I have already removed all code pertaining to computing the splicing heatmap, thus saving on runtime.
Thanks!
Hi again! Happy to hear you are still experimenting with Comprint. I'm glad to help out where I can.
It is indeed true that Comprint should be computed on the original image. Resizing is expected to decrease the performance. It is also no surprise that inference on high-resolution images takes a long time, especially on CPU.
I have a suggestion, but I am not sure if it will significantly speed up the results. You could use an image slicing/tiling approach, running inference on smaller, overlapping tiles. The noiseprint code in this repository already does this, whereas the comprint code does not.
In another project, I have adapted the slicing/tiling code of noiseprint such that it works with comprint (or other tf2 models). Here is the code:
import numpy as np

def comprint_tiled(img, model, slide=1024):
    overlap = 34
    largeLimit = slide*slide + 1
    if img.shape[0] * img.shape[1] > largeLimit:
        #print('%dx%d large' % (img.shape[0], img.shape[1]))
        # Large image: run inference on overlapping tiles and stitch the results together.
        res = np.zeros((img.shape[0], img.shape[1]), np.float32)
        for index0 in range(0, img.shape[0], slide):
            index0start = index0 - overlap
            index0end = index0 + slide + overlap
            for index1 in range(0, img.shape[1], slide):
                index1start = index1 - overlap
                index1end = index1 + slide + overlap
                # Extract the tile, including the overlap border (clipped to the image boundaries).
                clip = img[max(index0start, 0): min(index0end, img.shape[0]),
                           max(index1start, 0): min(index1end, img.shape[1])]
                resB = model.predict(clip[None, ..., None])
                resB = np.squeeze(resB)
                # Crop away the overlap border and paste the tile into the full-size result.
                if index0 > 0:
                    resB = resB[overlap:, :]
                if index1 > 0:
                    resB = resB[:, overlap:]
                resB = resB[:min(slide, resB.shape[0]), :min(slide, resB.shape[1])]
                res[index0: min(index0+slide, res.shape[0]), index1: min(index1+slide, res.shape[1])] = resB
    else:
        #print('%dx%d small' % (img.shape[0], img.shape[1]))
        # Add batch dimension and channels dimension, and run the image in one go.
        img = img[None, ..., None]
        res = model.predict(img)
        res = np.squeeze(res)
    return res
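For example, with a model loaded as discussed above and a grayscale float image (e.g. from imread2f):

# img has shape (H, W); the returned fingerprint has the same height and width.
fingerprint = comprint_tiled(img, model, slide=1024)
print(fingerprint.shape)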
I am interested to hear if this has an effect on the speed, so it would be nice if you could later respond with your experience. :) Hope this helps!
Thank you for the clarification; I highly appreciate your suggestion (and code snippet). I will try it out as soon as possible and keep you updated!
Hi!
Thanks for publicly releasing your fantastic solution!
I am trying to compute the Comprint and Noiseprint maps for a large directory of images (300,000+), but I noticed that this process is extremely slow (about 8-10 seconds per image). I presume that this is primarily attributable to having to load a different NoisePrint model for each image with a different QF. I just wanted to find out whether you are aware of any way that could perhaps help speed up this process, please.
Any assistance would be highly appreciated.
Thanks!