balakg / posewarp-cvpr2018

MIT License

How does one use the trained weights on a test image #8

Closed by SimRunBot 6 years ago

SimRunBot commented 6 years ago

Hello everyone,

Thanks for the amazing research, I already learned a lot by reading the paper and related work. As far as I understand the code here, it is all about training the network to posewarp. But how do you actually apply it to an image?

Best regards

balakg commented 6 years ago

Just take the model, and call model.predict on a new image. The first output of the model is the predicted image.
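
Roughly, something like this (hypothetical names: `model` is your trained network, `x` is whatever input list you have prepared; if the model has multiple outputs, `predict` returns a list and the first element is the predicted image batch):

```python
# Hypothetical names: `model` is the trained posewarp network,
# `x` is the prepared list of input tensors.
outputs = model.predict(x)
predicted = outputs[0]  # first model output: the predicted image(s)
```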

gursimarsingh commented 6 years ago

I believe the inference image, along with the source and target poses, should be passed as arguments to model.predict(). The exact input format can be generated by modifying the data_generation script for inference.

SimRunBot commented 6 years ago

> I believe the inference image, along with the source and target poses, should be passed as arguments to model.predict(). The exact input format can be generated by modifying the data_generation script for inference.

Right now I understand it like this: the network takes 5 inputs (`x_src`, `x_pose_src`, `x_pose_tgt`, `x_mask_src`, `x_trans`), which are generated by calling `test_feed = data_generation.create_feed(params, params['data_dir'], 'train')`. In the case of `warp_example_generator`, the function yields `(out, y)` as a generator (the Python term, not the GAN one). I can access that via `x, y = next(test_feed)` and then pass it to `results = model.predict(x)`.

`results` is of shape `(4, 256, 256, 3)`. I have yet to figure out how to convert the resulting images to RGB.
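
Putting that together, my inference path looks roughly like this (`params` is the repo's parameter dict as used in the training scripts, and the weight-loading line is a placeholder for wherever your trained weights live):

```python
import data_generation

# create_feed yields (x, y), where
# x = [x_src, x_pose_src, x_pose_tgt, x_mask_src, x_trans]
test_feed = data_generation.create_feed(params, params['data_dir'], 'train')
x, y = next(test_feed)

# `model`: the network with trained weights already loaded, e.g.
# model.load_weights('path/to/weights.h5')  # placeholder path
results = model.predict(x)
print(results.shape)  # (4, 256, 256, 3) in my case
```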

gursimarsingh commented 6 years ago
  1. The result is `(4, 256, 256, 3)`, which means the batch size is 4; each `256x256x3` slice is one output image, so you just need to index along the batch dimension.
  2. The range of the output must be [-1, 1], as the model has a tanh activation at the end for the target foreground. In the data_generation script, the input image and ground truth are converted from [0, 255] to [-1, 1] using the transformation `I0 = (I0 / 255.0 - 0.5) * 2.0`. So I believe if you simply invert this transformation, `I0 = (I0 / 2.0 + 0.5) * 255.0`, you will get output in the range [0, 255], i.e. a `uint8` RGB image (see the sketch below).
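
In code, the inverse transformation would look something like this, assuming `results` is the array returned by `model.predict` above:

```python
import numpy as np

# Take one image from the batch and map [-1, 1] back to [0, 255].
img = results[0]                             # (256, 256, 3), float in [-1, 1]
img = (img / 2.0 + 0.5) * 255.0              # invert the normalization
img = np.clip(img, 0, 255).astype(np.uint8)  # clip for safety, cast to uint8
```
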
SimRunBot commented 6 years ago

> 1. The result is `(4, 256, 256, 3)`, which means the batch size is 4; each `256x256x3` slice is one output image, so you just need to index along the batch dimension.
>
> 2. The range of the output must be [-1, 1], as the model has a tanh activation at the end for the target foreground. In the data_generation script, the input image and ground truth are converted from [0, 255] to [-1, 1] using the transformation `I0 = (I0 / 255.0 - 0.5) * 2.0`. So I believe if you simply invert this transformation, `I0 = (I0 / 2.0 + 0.5) * 255.0`, you will get output in the range [0, 255], i.e. a `uint8` RGB image.

Thank you, that color space transformation helped me!