PeterL1n / RobustVideoMatting

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
https://peterl1n.github.io/RobustVideoMatting/
GNU General Public License v3.0
8.32k stars 1.11k forks

Library improperly pulling together #233

Open skyler14 opened 1 year ago

skyler14 commented 1 year ago

I had a several-month-old checkout where basic inference ran successfully. After updating to the latest version via git pull, I keep hitting a weird issue where the color channel dimension seems to be fed in where one of the other dimensions should be.

The torch hub version works fine; however, using the reference convert function leads to the error below. I've added a print of the frame shape to the write function of VideoWriter (shown after the command). The command I run is:

python .\inference.py --variant resnet50 --checkpoint .\rvm_resnet50.pth --input-source c:\Users\Skyler\Documents\gcp\vdio\w2lhq\videos\test1.mp4 --output-type video --device 'cuda' --output-alpha C:\Users\Skyler\Documents\vdio_test_vids\alpha_test.mp4 --output-composition C:\Users\Skyler\Documents\vdio_test_vids\com_test.mp4 --output-video-mbps 16

class VideoWriter:
    def __init__(self, path, frame_rate, bit_rate=1000000):
        self.container = av.open(path, mode='w')
        self.stream = self.container.add_stream('h264', rate=round(frame_rate))
        self.stream.pix_fmt = 'yuv420p'
        self.stream.bit_rate = bit_rate

    def write(self, frames):
        # frames: [T, C, H, W]
        self.stream.width = frames.size(3)
        self.stream.height = frames.size(2)
        print("Frames shape before permute:", frames.shape)
        if frames.size(1) == 1:
            frames = frames.repeat(1, 3, 1, 1) # convert grayscale to RGB
        frames = frames.mul(255).byte().cpu().permute(0, 2, 3, 1).numpy()
        for t in range(frames.shape[0]):
            frame = frames[t]
            frame = av.VideoFrame.from_ndarray(frame, format='rgb24')
            self.container.mux(self.stream.encode(frame))
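
The `# frames: [T, C, H, W]` comment is the only place the expected layout is stated, so a tensor with an extra dimension is only caught later by `repeat` or by the encoder. Below is a minimal sketch of a shape check that could be called at the top of `write()` to fail early with a readable message. The name `check_frames_shape` is hypothetical and not part of the repository; it works on plain shape tuples, and `frames.shape` (a `torch.Size`) can be passed directly since it behaves like a tuple.

```python
def check_frames_shape(shape):
    """Return the shape as a tuple if it is [T, C, H, W] with 1 or 3 channels.

    Hypothetical helper for illustration only; raises ValueError otherwise.
    """
    shape = tuple(shape)
    if len(shape) != 4:
        raise ValueError(
            f"expected 4 dims [T, C, H, W], got {len(shape)}: {shape}"
        )
    if shape[1] not in (1, 3):
        raise ValueError(
            f"expected 1 (grayscale) or 3 (RGB) channels, got {shape[1]}: {shape}"
        )
    return shape


if __name__ == "__main__":
    # A 4-D single-channel batch passes the check.
    print(check_frames_shape((1, 1, 352, 640)))
    # A 5-D shape is rejected before any encoder setup happens.
    try:
        check_frames_shape((1, 1, 3, 352, 640))
    except ValueError as err:
        print("rejected:", err)
```

Inside `write()` this would be used as `check_frames_shape(frames.shape)` before `self.stream.width` and `self.stream.height` are set, so a bad shape never reaches the codec.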

With that print in place, I see this error:


File Not Found
  0%|                                                                                      | 0/199 [00:00<?, ?it/s]do I get to right before writing
Frames shape before permute: torch.Size([1, 1, 352, 640])
Frames shape before permute: torch.Size([1, 1, 3, 352, 640])
height not divisible by 2 (352x3)
Traceback (most recent call last):
  File ".\inference.py", line 182, in convert_video
    writer_com.write(com[0])
  File "C:\Users\Skyler\Documents\robust-matting\inference_utils.py", line 45, in write
    frames = frames.repeat(1, 3, 1, 1) # convert grayscale to RGB
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor

During handling of the above exception, another exception occurred:
  File ".\inference.py", line 213, in convert
    convert_video(self.model, device=self.device, dtype=torch.float32, *args, **kwargs)
    writer_com.close()
  File "C:\Users\Skyler\Documents\robust-matting\inference_utils.py", line 106, in close
    self.remux_audio()
  File "C:\Users\Skyler\Documents\robust-matting\inference_utils.py", line 103, in remux_audio
    self.container.mux(packet)
  File "av\container\output.pyx", line 204, in av.container.output.OutputContainer.mux
  File "av\container\output.pyx", line 210, in av.container.output.OutputContainer.mux_one
  File "av\container\output.pyx", line 156, in av.container.output.OutputContainer.start_encoding
  File "av\codec\context.pyx", line 275, in av.codec.context.CodecContext.open
  File "av\error.pyx", line 336, in av.error.err_check
av.error.ExternalError: [Errno 542398533] Generic error in an external library; last error log: [libx264] height not divisible by 2 (352x3)
  0%|                                                                                      | 0/199 [00:01<?, ?it/s] 
(vdio_clone) PS C:\Users\Skyler\Documents\robust-matting> cd ..\RobustVideoMatting\
(vdio_clone) PS C:\Users\Skyler\Documents\RobustVideoMatting> git pull
Merge made by the 'ort' strategy.
 inference.py       | 91 ++++++++++++++++++++++++++++++++++++++++--------------
 inference_utils.py | 76 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 132 insertions(+), 35 deletions(-)
(vdio_clone) PS C:\Users\Skyler\Documents\RobustVideoMatting> python .\inference.py --variant resnet50 --checkpoint .\rvm_resnet50.pth --input-source c:\Users\Skyler\Documents\gcp\vdio\w2lhq\videos\test1.mp4 --output-type video --device 'cuda' --output-alpha C:\Users\Skyler\Documents\vdio_test_vids\alpha_test.mp4 --output-composition C:\Users\Skyler\Documents\vdio_test_vids\com_test.mp4 --output-video-mbps 16
File Not Found
  0%|                                                                                      | 0/199 [00:00<?, ?it/s]height not divisible by 2 (352x3)
Traceback (most recent call last):
  File ".\inference.py", line 182, in convert_video
    writer_com.write(com[0])
  File "C:\Users\Skyler\Documents\RobustVideoMatting\inference_utils.py", line 44, in write
    frames = frames.repeat(1, 3, 1, 1) # convert grayscale to RGB
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\inference.py", line 248, in <module>
    progress=not args.disable_progress
  File ".\inference.py", line 212, in convert
    convert_video(self.model, device=self.device, dtype=torch.float32, *args, **kwargs)
  File ".\inference.py", line 189, in convert_video
    writer_com.close()
  File "C:\Users\Skyler\Documents\RobustVideoMatting\inference_utils.py", line 105, in close
    self.remux_audio()
  File "C:\Users\Skyler\Documents\RobustVideoMatting\inference_utils.py", line 102, in remux_audio
    self.container.mux(packet)
  File "av\container\output.pyx", line 204, in av.container.output.OutputContainer.mux
  File "av\container\output.pyx", line 210, in av.container.output.OutputContainer.mux_one
  File "av\container\output.pyx", line 156, in av.container.output.OutputContainer.start_encoding
  File "av\codec\context.pyx", line 275, in av.codec.context.CodecContext.open
  File "av\error.pyx", line 336, in av.error.err_check
av.error.ExternalError: [Errno 542398533] Generic error in an external library; last error log: [libx264] height not divisible by 2 (352x3)
  0%|
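
Both error messages are consistent with the two shapes printed in the log, assuming `write()` indexes dimensions by position as in the posted code. The sketch below uses plain Python tuples rather than tensors; `stream_dims` and `repeat_is_valid` are hypothetical names that only mirror the indexing in `write()` and the dimension rule quoted in the `RuntimeError`.

```python
def stream_dims(shape):
    """Mirror write(): width comes from dim 3, height from dim 2."""
    return shape[3], shape[2]


def repeat_is_valid(shape, repeat_dims=(1, 3, 1, 1)):
    """repeat() needs at least as many repeat dims as the tensor has dims."""
    return len(repeat_dims) >= len(shape)


alpha_shape = (1, 1, 352, 640)    # first printed shape in the log
com_shape = (1, 1, 3, 352, 640)   # second printed shape in the log

# 4-D input: the encoder is told 640x352, both even, as yuv420p requires.
print(stream_dims(alpha_shape), repeat_is_valid(alpha_shape))

# 5-D input: the channel axis lands where height is read, giving 352x3,
# and the 4-element repeat() call no longer covers every dimension.
width, height = stream_dims(com_shape)
print((width, height), repeat_is_valid(com_shape), height % 2 == 0)
```

With the 5-D shape, dim 2 is the channel axis (3) and dim 3 is the real height (352), which matches the `352x3` reported by libx264 and suggests the composition tensor reaches `write()` with one leading dimension more than expected.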