Feynman1999 / MAI-VSR-Diggers

Diggers' solution to the Mobile AI 2021 Real-Time Video Super-Resolution Challenge

How do you use the tflite model for camera video frames? #2

Closed · uu95 closed this 2 years ago

uu95 commented 2 years ago

Hello,

First, I would like to say how much I appreciate your work.

I have been trying to use this model in my custom project. I tried to obtain a single-frame result from model_none.tflite; since it takes 10 frames as input, I provided 10 frames. However, the result does not look good. Can you tell me why? (Result image: https://user-images.githubusercontent.com/94444066/144837147-f82758f8-2d16-4f71-b6cd-6383dfe71884.png)

I have used the following:

  1. Stacking all frames along the color channel, which gives (1, 180, 320, 300):

```python
import glob
import numpy as np
import tensorflow as tf

all_frames = sorted(glob.glob('/content/drive/MyDrive/Mobile_communication/val_sharp_bicubic/X4/000/*.png'))

read_1 = tf.io.read_file(all_frames[0])
read_1 = tf.image.decode_jpeg(read_1, channels=3)
# Placeholder with 3 dummy channels; they are dropped after the loop
stacked = tf.Variable(np.empty((1, read_1.shape[0], read_1.shape[1], read_1.shape[2]), dtype=np.float32))

for test_img_path in all_frames:
    lr1 = tf.io.read_file(test_img_path)
    lr = tf.image.decode_jpeg(lr1, channels=3)
    lr = tf.expand_dims(lr, axis=0)
    lr = tf.cast(lr, tf.float32)
    stacked = tf.concat([stacked, lr], axis=-1)

stacked = stacked[:, :, :, 3:]  # drop the dummy channels
print(stacked.shape)  # (1, 180, 320, 300)
```

  2. Feeding 10 frames to your tflite model, which gives (720, 1280, 30):

```python
frames_10 = stacked[:, :, :, :30]
vsr_model_path = './MAI-VSR-Diggers/ckpt/model_none.tflite'
# vsr_model_path = './MAI-VSR-Diggers/ckpt/model.tflite'

# Load the TFLite model and allocate tensors
interpreter = tf.lite.Interpreter(model_path=vsr_model_path)

# Get input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details, '\n', output_details)

# Run the model
interpreter.resize_tensor_input(input_details[0]['index'], [1, 180, 320, 30], strict=False)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], frames_10)
interpreter.invoke()

# Extract the output and post-process it
output_data = interpreter.get_tensor(output_details[0]['index'])
vsr = tf.squeeze(output_data, axis=0)
print(vsr.shape)  # (720, 1280, 30)
```

  3. Displaying one SR frame:

```python
from PIL import Image
import matplotlib.pyplot as plt
from sklearn.preprocessing import minmax_scale

frame_1 = stacked[:, :, :, :3]

lr = tf.cast(tf.squeeze(frame_1, axis=0), tf.uint8)
print(lr.shape)

Image.fromarray(np.asarray(lr)).show()

plt.figure(figsize=(5, 6))
plt.title('LR')
plt.imshow(lr.numpy())

tensor = vsr[:, :, :3]
shape = tensor.shape
image_scaled = minmax_scale(tf.reshape(tensor, shape=[-1]), feature_range=(0, 255)).reshape(shape)
tensor = tensor / 255
print(tensor.shape)

plt.figure(figsize=(25, 15))
plt.subplot(1, 2, 1)
plt.title('VSR (x4)')
plt.imshow(tensor.numpy())

bicubic = tf.image.resize(lr, [720, 1280], tf.image.ResizeMethod.BICUBIC)
bicubic = tf.cast(bicubic, tf.uint8)
plt.subplot(1, 2, 2)
plt.title('Bicubic')
plt.imshow(bicubic.numpy())
```

Feynman1999 commented 2 years ago

Thank you for your attention! I will check your code soon and reply to you.


uu95 commented 2 years ago

Thank you so much. I have a deadline of 10 December, so if possible please update me as early as you can. Sorry for asking so much.

uu95 commented 2 years ago

I am sorry for asking again on such short notice, but is there any update?

Feynman1999 commented 2 years ago

Here is a demo for testing with just one image (broadcast to 10 frames for the tflite model); you can try it on your machine.

```python
# 180*320 img
import tensorflow as tf
import numpy as np
import cv2

read_1 = cv2.imread("test.png")
read_1 = tf.convert_to_tensor(read_1)
read_1 = tf.cast(read_1, tf.float32)
read_1 = read_1 / 255.  # normalize to 0~1

# broadcast to 10 frames by stacking on the channel axis
inputs = tf.concat([read_1 for i in range(10)], axis=-1)
inputs = tf.expand_dims(inputs, axis=0)
print(inputs.shape)

vsr_model_path = './model_none.tflite'

interpreter = tf.lite.Interpreter(model_path=vsr_model_path)

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details, '\n', output_details)

interpreter.resize_tensor_input(input_details[0]['index'], [1, 180, 320, 30], strict=False)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], inputs)
interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]['index'])
print(type(output_data))

# clip to 0~1, then scale back to 0~255 for an 8-bit image
output1 = np.clip(output_data[0, :, :, 0:3], a_min=0, a_max=1)
output1 = (output1 * 255).astype(np.uint8)
cv2.imwrite("res.png", output1)
```

Input: https://ibb.co/wKFtn18

Output: https://ibb.co/whvSdr4

Feynman1999 commented 2 years ago

I guess there is something wrong with the `tf.image.decode_jpeg` in your code (the input files are PNGs); it may lose some details.

By the way, you need to clip the output to the range [0, 1], which is not done inside the tflite model.
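
For illustration, a minimal post-processing sketch (my wording, assuming the float output layout from the demo above, shape `(1, 720, 1280, 30)`):

```python
import numpy as np

# output_data: float32 model output, nominally in 0~1, but the network can
# overshoot slightly, so clip before converting to 8-bit.
frame = np.clip(output_data[0, :, :, 0:3], 0.0, 1.0)  # first SR frame, 3 channels
frame_u8 = (frame * 255).astype(np.uint8)             # valid 0~255 image
```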

uu95 commented 2 years ago

Thank you so much, it works!!!

uu95 commented 2 years ago

Well, there are three things that were different in my code:

  1. Normalization before input to the model (see the sketch after this list).
  2. Stacking the image 10 times for 10 frames, but I think that is just for the demo.
  3. Clipping. OK, I got it. Again, thank you for being supportive.
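
For reference, a minimal sketch of the first fix applied to the original TF preprocessing (the file name is hypothetical; `decode_png` is used since the input files are PNGs):

```python
import tensorflow as tf

# Decode one low-resolution frame and normalize it to 0~1,
# which is what the tflite model expects (see the demo above).
lr = tf.io.read_file('frame_000.png')      # hypothetical file name
lr = tf.image.decode_png(lr, channels=3)   # PNG decoder instead of decode_jpeg
lr = tf.cast(lr, tf.float32) / 255.0       # normalization before input
```
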
Feynman1999 commented 2 years ago

> Well, there are three things that were different in my code:
>
> 1. Normalization before input to the model.
> 2. Stacking the image 10 times for 10 frames, but I think that is just for the demo.
> 3. Clipping. OK, I got it. Again, thank you for being supportive.

Yes, I repeat one frame 10 times just for a quick demonstration; in a real-world scene you can use 10 consecutive different frames.
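
For example, a sketch of a sliding 10-frame window over a real sequence (`frames` is a hypothetical list of 180x320x3 float arrays already normalized to 0~1; the channel-stacking layout is assumed from the demo above):

```python
import numpy as np

# Slide a window of 10 consecutive frames over the sequence.
for start in range(len(frames) - 9):
    window = np.concatenate(frames[start:start + 10], axis=-1)  # (180, 320, 30)
    window = window[np.newaxis].astype(np.float32)              # (1, 180, 320, 30)
    interpreter.set_tensor(input_details[0]['index'], window)
    interpreter.invoke()
    out = interpreter.get_tensor(output_details[0]['index'])    # (1, 720, 1280, 30)
    sr = np.clip(out[0], 0, 1)  # 10 super-resolved frames on the channel axis
```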

uu95 commented 2 years ago

OK, and one last thing: why is clipping required?

uu95 commented 2 years ago

Is it for the 0~255 range?

uu95 commented 2 years ago

Oh, I see. It's for that range. Sorry for my stupid question.
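
(A tiny made-up example of why the clip matters when converting float output to uint8:)

```python
import numpy as np

out = np.array([-0.02, 0.5, 1.03], dtype=np.float32)  # raw output can overshoot 0~1
bad = (out * 255).astype(np.uint8)                    # negatives/overshoots wrap around
good = (np.clip(out, 0, 1) * 255).astype(np.uint8)    # [0, 127, 255] as intended
```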

Feynman1999 commented 2 years ago

It's okay. Thank you for trying my code ☺