cxy1997 / LISO

Learning Iterative Neural Optimizers for Image Steganography
https://arxiv.org/abs/2303.16206

Correct usage of model? #4

Open thomas-xin opened 5 days ago

thomas-xin commented 5 days ago

First of all, let me say that this is a really cool project!

I wanted to test inference on single files (to see whether it can be integrated into a couple of my projects). I was able to get an output that resembled both the cover and data inputs, but I think I'm doing something wrong, as the output is very colour-distorted. Depending on the content of the image (I made sure to keep the image size consistent), it also sometimes gives the following error, or a variant of it, in the encoder.forward -> conv2d step: RuntimeError: Given groups=1, weight of size [32, 33, 3, 3], expected input[1, 35, 512, 512] to have 33 channels, but got 35 channels instead

Here's the code I used to run inference with the model:

import numpy as np
from PIL import Image
import torch
import torchvision.transforms as transforms
import liso, liso.encoders, liso.decoders, liso.models

dtype = torch.float32
model = liso.models.LISO.load("checkpoints/div2k_jpeg/1_bits.steg")
model.encoder = model.encoder.to(dtype)
model.decoder = model.decoder.to(dtype)
if model.critic:
    model.critic = model.critic.to(dtype)
model.dtype = dtype
model.encoder.constraint = None

size = (512, 512)
im = Image.open(cover_image).resize(size, resample=Image.Resampling.LANCZOS)
da = Image.open(data_image).resize(size, resample=Image.Resampling.LANCZOS)
imt = transforms.ToTensor()(np.asanyarray(im)).unsqueeze(0).to(model.device).to(model.dtype)
dat = transforms.ToTensor()(np.asanyarray(da)).unsqueeze(0).to(model.device).to(model.dtype)

with torch.no_grad():
    resp = model.encoder(imt, dat)

im = transforms.ToPILImage()(resp[0][0].squeeze(0))
print(im)
im.save("test.png")

Let me know if I should be doing something different here. Thanks!

thomas-xin commented 5 days ago

Update: I have figured out how to run the model without errors. The input images need to have the same number of channels as bits per pixel, and they should be preprocessed using liso.loader.EVAL_TRANSFORM and postprocessed using liso.utils.to_np_img. That fixes the errors, but the colours still look quite strange.
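
To make the channel rule concrete: the earlier error expected 33 channels and got 35, and that difference of 2 matches the gap between a 3-channel RGB data image and the 1-channel payload that a 1-bit-per-pixel checkpoint would take. Below is a minimal sketch of building such a payload from a grayscale image with NumPy. The helper names image_to_payload and payload_to_image are hypothetical (they are not part of liso), and the (1, bits, H, W) layout with values in {0, 1} is an assumption based on the rule above, not something confirmed by the repository.

```python
import numpy as np


def image_to_payload(gray, bits=1):
    """Hypothetical helper: turn an 8-bit grayscale image (H, W) into a
    binary payload of shape (1, bits, H, W) with values in {0.0, 1.0}."""
    gray = np.asarray(gray, dtype=np.uint8)
    if gray.ndim != 2:
        raise ValueError("expected a single-channel (H, W) image")
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    # Keep the `bits` most significant bits of each pixel, one bit per channel.
    planes = [(gray >> (7 - k)) & 1 for k in range(bits)]
    return np.stack(planes, axis=0)[None].astype(np.float32)


def payload_to_image(payload):
    """Inverse of image_to_payload: rebuild a coarse 8-bit grayscale image."""
    planes = np.asarray(payload)[0] > 0.5  # (bits, H, W) booleans
    out = np.zeros(planes.shape[1:], dtype=np.uint8)
    for k, plane in enumerate(planes):
        out |= plane.astype(np.uint8) << (7 - k)
    return out
```

With bits=1 this is a plain threshold at 128, so only a black-and-white version of the data image survives. The resulting array could then be moved to the model with torch.from_numpy(payload).to(model.device); whether the encoder accepts it in exactly this form is an assumption.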

I also tried some of the sample eval arguments on the provided dataset and received the following error in the structural_similarity function: ValueError: win_size exceeds image extent. Either ensure that your images are at least 7x7; or pass win_size explicitly in the function call, with an odd value less than or equal to the smaller side of your images. If your images are multichannel (with color channels), set channel_axis to the axis number corresponding to the channels.
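
One possible cause of this error, as the message itself hints, is a colour image being passed without channel_axis: the 3-element channel dimension is then treated as a spatial side, which is smaller than the default 7-pixel window. The sketch below only mirrors that size check in plain Python so the failing case can be reasoned about. ssim_window_fits is a hypothetical helper, not a scikit-image or liso function, and it is a guess that this is what happens inside the eval script.

```python
def ssim_window_fits(shape, win_size=7, channel_axis=None):
    """Return True if every non-channel side of `shape` is at least win_size."""
    dims = list(shape)
    if channel_axis is not None:
        dims.pop(channel_axis)  # the channel axis is not windowed
    return all(side >= win_size for side in dims)


# A 512x512 RGB image fails the check unless the channel axis is declared.
print(ssim_window_fits((512, 512, 3)))                   # channel axis not declared
print(ssim_window_fits((512, 512, 3), channel_axis=-1))  # channel axis declared
```

If this is the cause, the corresponding fix would be passing channel_axis=-1 (or channel_axis=0 for channel-first arrays) to structural_similarity. Recent scikit-image releases use channel_axis in place of the older multichannel=True flag, so code written against the old flag may need updating.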

cxy1997 commented 4 days ago

Hi Thomas,

Thank you for your interest in our work. You can refer to this Colab notebook for running model inference. Please feel free to reach out if you have any further questions.