Closed DanCorvesor closed 1 year ago
Hi @DanCorvesor Take a look here: https://github.com/NVIDIA/VideoProcessingFramework/blob/d8d5d1874c65ecfe6a82db2c282182e1b865452e/tests/test_PyNvDecoder.py#L201-L217
Hi @RomanArzumanyan . Thanks so much for getting back, it's really appreciated. This helped a lot - I was able to successfully match the timestamps that I get reading an mp4 video on CPU with PyAV which is what I want.
However, if I use the code from your PyTorch example to run the color conversion chain to RGB and then convert the result into a PyTorch tensor (and then back to a NumPy array), the frames I get are vastly different from those I get with PyAV on CPU. Am I missing something here? I presume I should be able to reproduce them. I can provide a code snippet and the video I am using if required.
Hi @DanCorvesor
Attaching a screenshot and color conversion code would be helpful.
VPF has pretty accurate YUV to RGB color conversion; we just need to make sure that the proper color space and range are used.
Hi Roman,
Sorry for the delay. I have attached the video I have been testing on (sample-5s.mp4), with some example outputs below and the code used to generate them. I think the timestamps are correct, but I have an issue with the colour conversion as mentioned above. Let me know if you need anything else to investigate.
Code:
import time
import os

# Starting from Python 3.8 DLL search policy has changed.
# We need to add path to CUDA DLLs explicitly.
import sys

if os.name == "nt":
    # Add CUDA_PATH env variable
    cuda_path = os.environ["CUDA_PATH"]
    if cuda_path:
        os.add_dll_directory(cuda_path)
    else:
        print("CUDA_PATH environment variable is not set.", file=sys.stderr)
        print("Can't set CUDA DLLs search path.", file=sys.stderr)
        exit(1)

    # Add PATH as well for minor CUDA releases
    sys_path = os.environ["PATH"]
    if sys_path:
        paths = sys_path.split(";")
        for path in paths:
            if os.path.isdir(path):
                os.add_dll_directory(path)
    else:
        print("PATH environment variable is not set.", file=sys.stderr)
        exit(1)

import av
import numpy as np
import matplotlib.pyplot as plt
import pycuda.driver as cuda
import PyNvCodec as nvc
import torch
try:
    import PytorchNvCodec as pnvc
except ImportError as err:
    # Raise a proper exception; raising a bare string is a TypeError.
    raise ImportError(
        f"""Could not import `PytorchNvCodec`: {err}.
Please make sure it is installed! Run
`pip install git+https://github.com/NVIDIA/VideoProcessingFramework#subdirectory=src/PytorchNvCodec` or
`pip install src/PytorchNvCodec` if using a local copy of the VideoProcessingFramework repository"""
    )  # noqa
import logging
import warnings
from math import floor, modf


class VideoReader(object):
    """
    A helper class to read video using PyAV.
    That library allows reading every frame from a video and its presentation timestamp (PTS) as described at
    https://en.wikipedia.org/wiki/Presentation_timestamp.
    This is highly useful to accurately read videos that are recorded with variable frame rate.
    Note: Video frames might not always be stored in chronological order.
    This function will skip frames where their PTS is earlier than the previous frame

    Parameter:
    ----------
    video_file: str
        video file path to be loaded
    fps: int, optional
        frame rate will be used to return video frames
    debug: bool, optional
        to run it in debug mode

    See Also
    --------
    frame_iterator: returns a generator to iterate over all frames
    """

    def __init__(self, video_file, fps: int = 1000, debug: bool = False):
        if not debug:
            av.logging.set_level(av.logging.CRITICAL)
        self.video_file = video_file
        self.debug = debug
        self.requested_fps = fps
        self._open_video_container()
        if self.debug:
            logging.info(self)

    def _open_video_container(self):
        self.return_ms = 1000.0 / self.requested_fps
        self.container = av.open(self.video_file, metadata_errors="ignore")
        self.last_returned = -1
        self.last_returned_ms = -1
        self.stream = self.container.streams.video[0]
        self.stream.thread_type = "AUTO"
        self.stream_itr = iter(self.container.decode(self.stream))
        self.stream_fps = float(self.stream.rate) if self.stream.rate else None
        self.display_aspect_ratio = self.stream.codec_context.display_aspect_ratio
        if self.display_aspect_ratio:
            self.frame_height = self.stream.codec_context.height
            self.frame_width = int(self.frame_height * self.display_aspect_ratio)
        if self.debug:
            self.duration_second = float(self.stream.duration * self.stream.time_base)
            self.num_frames = int(floor(self.duration_second) * min(self.requested_fps, self.stream_fps))
            self.num_frames += floor(modf(self.duration_second)[0] * 1000 / self.return_ms) + 1

    def seek(self, target_sec: float, any_frame: bool = True, backward: bool = True):
        try:
            target_time = int(target_sec / self.stream.time_base) + self.stream.start_time
        except TypeError:
            target_time = int(target_sec / self.stream.time_base)
        self.stream.seek(target_time, any_frame=any_frame, backward=backward)

    def reload(self):
        self._open_video_container()

    def close(self):
        self.container = None
        self.stream = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

    def __str__(self):
        return f"Loaded {self.video_file}\nRecorded at {self.stream_fps} FPS read at {1000. / self.return_ms} FPS"

    def __iter__(self):
        return self.frame_iterator()

    def frame_iterator(self):
        """
        Create a generator to go through video frames

        Note
        ----
        This function skips frames with invalid pts or invalid data

        Returns
        -------
        3D Numpy array
            3D numpy array of shape (height, width, 3) in rgb order
        float
            Presentation timestamp that is computed by using video and frame metadata
        """
        while True:
            try:
                frame = next(self.stream_itr)
                print(type(frame))
            except av.InvalidDataError as err:
                logging.info(err)
                continue
            except StopIteration:
                if self.debug:
                    logging.info("End of the stream")
                break
            if frame.pts is None:
                if self.debug:
                    warnings.warn("NO PTS: Frame skipped")
                continue
            time_ms = float(frame.pts * self.stream.time_base) * 1000
            if self.should_return(time_ms):
                if self.display_aspect_ratio:
                    yield frame.to_ndarray(width=self.frame_width, height=self.frame_height, format="rgb24"), time_ms
                else:
                    yield np.array(frame.to_image()), time_ms

    def should_return(self, ms):
        if ms // self.return_ms > self.last_returned:
            self.last_returned = ms // self.return_ms
            self.last_returned_ms = ms
            return True
        elif self.debug:
            logging.log(10, f"Skipping: {ms} as {self.last_returned_ms}")
        return False


class cconverter:
    """
    Colorspace conversion chain.
    """

    def __init__(self, width: int, height: int, gpu_id: int):
        self.gpu_id = gpu_id
        self.w = width
        self.h = height
        self.chain = []

    def add(self, src_fmt: nvc.PixelFormat, dst_fmt: nvc.PixelFormat) -> None:
        self.chain.append(nvc.PySurfaceConverter(self.w, self.h, src_fmt, dst_fmt, self.gpu_id))

    def run(self, src_surface: nvc.Surface) -> nvc.Surface:
        surf = src_surface
        cc = nvc.ColorspaceConversionContext(nvc.ColorSpace.BT_601, nvc.ColorRange.MPEG)
        for cvt in self.chain:
            surf = cvt.Execute(surf, cc)
            if surf.Empty():
                raise RuntimeError("Failed to perform color conversion")
        return surf.Clone(self.gpu_id)


class VideoReaderVPF(object):
    def __init__(self, gpu_id, enc_path):
        self.gpu = gpu_id
        self.cuda_ctx, self.cuda_str = self.initialise_cuda_ctx()
        self.nvDmx = nvc.PyFFmpegDemuxer(enc_path, {})
        self.width = self.nvDmx.Width()
        self.height = self.nvDmx.Height()
        self.nvDec = nvc.PyNvDecoder(
            self.width, self.height, self.nvDmx.Format(), self.nvDmx.Codec(), self.cuda_ctx.handle,
            self.cuda_str.handle,
        )
        self.to_rgb = self.initialise_colour_chain_converter()
        self.decoded_frames = 0

    def __iter__(self):
        return self.frame_iterator()

    def initialise_cuda_ctx(self):
        cuda.init()
        cuda_ctx = cuda.Device(self.gpu).retain_primary_context()
        cuda_ctx.push()
        cuda_str = cuda.Stream()
        cuda_ctx.pop()
        return cuda_ctx, cuda_str

    @staticmethod
    def surface_to_tensor(surface: nvc.Surface) -> torch.Tensor:
        """
        Converts planar rgb surface to cuda float tensor.
        """
        if surface.Format() != nvc.PixelFormat.RGB_PLANAR:
            raise RuntimeError("Surface shall be of RGB_PLANAR pixel format")
        surf_plane = surface.PlanePtr()
        img_tensor = pnvc.DptrToTensor(
            surf_plane.GpuMem(),
            surf_plane.Width(),
            surf_plane.Height(),
            surf_plane.Pitch(),
            surf_plane.ElemSize(),
        )
        if img_tensor is None:
            raise RuntimeError("Can not export to tensor.")
        img_tensor.resize_(3, int(surf_plane.Height() / 3), surf_plane.Width())
        img_tensor = img_tensor.type(dtype=torch.cuda.FloatTensor)
        img_tensor = torch.divide(img_tensor, 255.0)
        img_tensor = torch.clamp(img_tensor, 0.0, 1.0)
        return img_tensor

    def initialise_colour_chain_converter(self) -> cconverter:
        to_rgb = cconverter(self.width, self.height, self.gpu)
        to_rgb.add(nvc.PixelFormat.NV12, nvc.PixelFormat.YUV420)
        to_rgb.add(nvc.PixelFormat.YUV420, nvc.PixelFormat.RGB)
        to_rgb.add(nvc.PixelFormat.RGB, nvc.PixelFormat.RGB_PLANAR)
        return to_rgb

    def convert_pts_to_ms(self, pts):
        return float(pts * self.nvDmx.Timebase()) * 1000

    def frame_iterator(self):
        dec_frames = 0
        packet = np.ndarray(shape=(0), dtype=np.uint8)
        out_bst_size = 0
        while self.nvDmx.DemuxSinglePacket(packet):
            in_pdata = nvc.PacketData()
            self.nvDmx.LastPacketData(in_pdata)
            out_pdata = nvc.PacketData()
            surf = self.nvDec.DecodeSurfaceFromPacket(in_pdata, packet, out_pdata)
            if not surf.Empty():
                dec_frames += 1
                out_bst_size += out_pdata.bsl
                timestamp = self.convert_pts_to_ms(out_pdata.pts)
                # Convert to planar RGB
                rgb_pln = self.to_rgb.run(surf)
                src_tensor = self.surface_to_tensor(rgb_pln)
                self.decoded_frames += 1
                yield src_tensor, timestamp
        while True:
            out_pdata = nvc.PacketData()
            surf = self.nvDec.FlushSingleSurface(out_pdata)
            # print(out_pdata)
            if not surf.Empty():
                out_bst_size += out_pdata.bsl
                timestamp = self.convert_pts_to_ms(out_pdata.pts)
                rgb_pln = self.to_rgb.run(surf)
                src_tensor = self.surface_to_tensor(rgb_pln)
                self.decoded_frames += 1
                yield src_tensor, timestamp
            else:
                break


def main(gpu, enc_path):
    # Access gpu for first time outside of loop for fair comparison
    torch.zeros(10, 10).cuda()
    video_reader_timestamps = []
    video_reader_frames = {}
    start_video_reader = time.time()
    video_reader = VideoReader(enc_path)
    for frame, ms in video_reader:
        video_reader_timestamps.append(round(ms, 5))
        video_reader_frames[round(ms, 5)] = torch.from_numpy(frame).cuda()
    print(f'Time taken for video reader streaming {time.time() - start_video_reader} seconds')

    vpf_timestampsi = []
    vpf_timestampso = []
    vpf_frames = {}
    start_vpf = time.time()
    video_reader_vpf = VideoReaderVPF(gpu, enc_path)
    for tensor, ms in video_reader_vpf:
        round_ms = round(ms, 5)
        vpf_frames[round_ms] = tensor
        vpf_timestampso.append(round_ms)
    print(len(vpf_timestampso))
    print(f'Time taken for vpf streaming {time.time() - start_vpf} seconds')
    print(video_reader_vpf.decoded_frames)
    print(
        f'Number of vpf timestamps: {len(vpf_timestampso)}, '
        f'number of video reader timestamps: {len(video_reader_timestamps)}'
    )
    print(vpf_timestampso == video_reader_timestamps)

    diff = []
    for element in vpf_timestampso:
        if element not in video_reader_timestamps:
            diff.append(element)
    print(f'Timestamps that are different between vpf and the video reader (up to 5 decimal places): {diff}')

    for vid_reader_timestamp, vid_reader_frame in video_reader_frames.items():
        f, axarr = plt.subplots(1, 3)
        # permuted_frame = vid_reader_frame.permute(2, 0, 1).cpu()
        axarr[0].imshow(vid_reader_frame.cpu())
        axarr[0].set_title('PyAV', fontstyle='italic')
        matching_vpf_frame = vpf_frames[vid_reader_timestamp].permute(1, 2, 0).cpu()
        vid_reader_frame = vid_reader_frame.cpu()
        axarr[1].imshow(matching_vpf_frame)
        axarr[1].set_title('VPF', fontstyle='italic')
        axarr[2].imshow(abs(vid_reader_frame - matching_vpf_frame))
        axarr[2].set_title('Diff', fontstyle='italic')
        print(np.histogram(abs(vid_reader_frame - matching_vpf_frame)))
        f.savefig(f'vpf/outputs/{vid_reader_timestamp}.png')
        plt.close()


if __name__ == "__main__":
    print("This sample decodes input video to raw YUV420 file on given GPU.")
    print("Usage: SampleDecode.py $gpu_id $input_file.")
    if len(sys.argv) < 3:
        print("Provide gpu ID, path to input file")
        exit(1)
    gpuID = int(sys.argv[1])
    encFilePath = sys.argv[2]
    main(gpuID, encFilePath)

![700 0](https://github.com/NVIDIA/VideoProcessingFramework/assets/44499515/9eca98ac-3b35-4abc-8599-7082cb8b2cca)
Hi @DanCorvesor
You're converting YUV to RGB with hard-coded parameters:

    def run(self, src_surface: nvc.Surface) -> nvc.Surface:
        surf = src_surface
        cc = nvc.ColorspaceConversionContext(nvc.ColorSpace.BT_601, nvc.ColorRange.MPEG)

The actual color space and color range may be different, hence the difference between the VPF and PyAV results.
You can get the color conversion parameters using the PyFFmpegDemuxer class, as shown here:
https://github.com/NVIDIA/VideoProcessingFramework/blob/d8d5d1874c65ecfe6a82db2c282182e1b865452e/tests/test_PyFfmpegDemuxer.py#L73-L77
Also, please note that color space and color range information is sometimes missing from the video file; in that case you can only guess the actual values.
Hi @RomanArzumanyan
So in this case for this test video, I'm getting: ColorSpace.UNSPEC ColorRange.UDEF
But it says this is unsupported, what can I do in this case?
Also, related, in the case where you mention you need to guess the colour space/range info, is there any sensible algorithmic way to do that?
Hi @RomanArzumanyan, an update: I actually messed up. I was comparing integer values to float values; with that fixed, the two outputs are very, very close, which is great. Sorry for the messing around on this.
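For readers hitting the same mismatch: the script above compares a PyAV frame holding 8-bit values in [0, 255] with a VPF tensor that `surface_to_tensor` has already divided by 255, so the raw difference mostly reflects the scale mismatch. Below is a minimal sketch of a like-for-like comparison. It uses plain Python lists as stand-ins for the frames, and the helper names are hypothetical (not part of VPF or PyAV).

```python
def to_unit_range(values_u8):
    """Scale 8-bit samples (0..255) to floats in [0, 1]."""
    return [v / 255.0 for v in values_u8]


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized sample lists."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of samples")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Stand-in data: the same three samples in both representations.
pyav_samples = [0, 128, 255]            # 8-bit style, as returned by PyAV
vpf_samples = [0.0, 128 / 255.0, 1.0]   # float style, already divided by 255

naive = mean_abs_diff(pyav_samples, vpf_samples)                # scales differ
fair = mean_abs_diff(to_unit_range(pyav_samples), vpf_samples)  # scales match

print(f"naive diff: {naive:.3f}, like-for-like diff: {fair:.3f}")
```

With tensors, the equivalent step is to convert the PyAV frame with `.float() / 255.0` (or to bring the VPF tensor back to the 0..255 scale) before subtracting.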
However, going back to the point you made: given that the colour space and range are reported as unsupported, are the ones I specified originally good defaults? (They are working in this case, so they seem to be.) Is there a way I could check in code whether a colour space is supported when I initialise the colour converter class to check if the colour space/range inferred from the demuxer is supported and use these defaults if not?
Hi @DanCorvesor
> Is there a way I could check in code whether a colour space is supported when I initialise the colour converter class to check if the colour space/range inferred from the demuxer is supported and use these defaults if not?
You can get the values with nvDmx.ColorSpace() and nvDmx.ColorRange(). If they return ColorSpace.UNSPEC and / or ColorRange.UDEF, then you can only guess or hard-code values.
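A minimal sketch of that fallback logic follows. The enums are stand-ins so the snippet runs without VPF installed; in a real script the inputs would come from `nvDmx.ColorSpace()` and `nvDmx.ColorRange()`, and the result would be passed to `nvc.ColorspaceConversionContext`. The helper name is hypothetical, and the BT.601 / MPEG defaults are simply the ones hard-coded in the script above, not a recommendation.

```python
from enum import Enum


class ColorSpace(Enum):   # stand-in for nvc.ColorSpace
    BT_601 = 0
    BT_709 = 1
    UNSPEC = 2


class ColorRange(Enum):   # stand-in for nvc.ColorRange
    MPEG = 0
    JPEG = 1
    UDEF = 2


def resolve_color_params(space, color_range,
                         default_space=ColorSpace.BT_601,
                         default_range=ColorRange.MPEG):
    """Return (space, range, guessed), falling back to defaults when unknown."""
    guessed = False
    if space == ColorSpace.UNSPEC:
        space, guessed = default_space, True
    if color_range == ColorRange.UDEF:
        color_range, guessed = default_range, True
    return space, color_range, guessed


# Values as reported for the test video in this thread.
space, color_range, guessed = resolve_color_params(ColorSpace.UNSPEC, ColorRange.UDEF)
print(space.name, color_range.name, "(guessed)" if guessed else "(from stream)")
```

Returning the `guessed` flag lets the caller log a warning whenever the colors rest on an assumption instead of stream metadata.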
Choosing different color conversion options won't crash your program, it will only affect the colors.
Basically that's what is happening in your PyAV vs. VPF comparison test: PyAV just chooses different default options.
Since there are just 4 possible combinations (2 color space and 2 color range options), you can play around and see how similar the PyAV and VPF results are.
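One way to do this systematically is to score each of the four combinations against the PyAV frame and keep the closest one. The sketch below rests on stated assumptions: `convert` is a hypothetical callable that you would implement by running the VPF conversion chain with the given color space and range and returning the frame as a flat list of floats. A fake converter is used here so the snippet runs on its own.

```python
from itertools import product

COLOR_SPACES = ("BT_601", "BT_709")
COLOR_RANGES = ("MPEG", "JPEG")


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rank_combinations(convert, reference):
    """Score every (space, range) pair against a reference frame, best first."""
    scores = []
    for space, color_range in product(COLOR_SPACES, COLOR_RANGES):
        candidate = convert(space, color_range)
        scores.append((mean_abs_diff(candidate, reference), space, color_range))
    return sorted(scores)


# Fake converter for illustration only: it pretends BT_709 / MPEG matches best.
def fake_convert(space, color_range):
    offset = (0.0 if space == "BT_709" else 0.02) + (0.0 if color_range == "MPEG" else 0.05)
    return [v + offset for v in (0.1, 0.5, 0.9)]


reference_frame = [0.1, 0.5, 0.9]
ranking = rank_combinations(fake_convert, reference_frame)
best_score, best_space, best_range = ranking[0]
print(best_space, best_range, round(best_score, 4))
```

Scoring several frames from different parts of the video, not just one, would make the choice less sensitive to scene content.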
Thanks again @RomanArzumanyan last question (I hope) - you mentioned there are two default options for both. What are these, how can I see them?
Hi @DanCorvesor
Honestly, I don't know what the default values for PyAV are. I assume the decision is made somewhere deep within FFmpeg's guts.
For VPF there are no default values. That's done on purpose: inaccurate color space conversion can impose a penalty on inference accuracy. There were a couple of issues of this nature; you can find them in the list of closed issues if you like.
> you mentioned there are two default options for both. What are these, how can I see them?
If you want to see possible values for color space and color range, here they are: https://github.com/NVIDIA/VideoProcessingFramework/blob/d8d5d1874c65ecfe6a82db2c282182e1b865452e/src/PyNvCodec/src/PyNvCodec.cpp#L240-L250
The 2 most common SDR color spaces are supported: BT.601 and BT.709. They define the coefficients of the YUV to RGB color space conversion.
Also, the 2 most common SDR color ranges are supported: narrow (MPEG), where the pixel range is [16;235], and wide (JPEG), where it is [0;255].
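To make the effect of these two settings concrete, here is a per-pixel sketch of the textbook YUV to RGB math. It is not VPF's actual kernel. The luma coefficients are the standard BT.601 and BT.709 values, and the narrow range is assumed to use [16;235] for luma and [16;240] for chroma.

```python
# Luma coefficients (Kr, Kb) from the respective standards.
COEFFS = {"BT_601": (0.299, 0.114), "BT_709": (0.2126, 0.0722)}


def yuv_to_rgb(y, u, v, space="BT_601", narrow=True):
    """Convert one 8-bit YUV pixel to an 8-bit (R, G, B) tuple."""
    kr, kb = COEFFS[space]
    kg = 1.0 - kr - kb
    if narrow:   # MPEG range: luma 16..235, chroma 16..240
        yf = (y - 16) / 219.0
        uf = (u - 128) / 224.0
        vf = (v - 128) / 224.0
    else:        # JPEG range: 0..255 for all planes
        yf = y / 255.0
        uf = (u - 128) / 255.0
        vf = (v - 128) / 255.0
    r = yf + 2.0 * (1.0 - kr) * vf
    b = yf + 2.0 * (1.0 - kb) * uf
    g = (yf - kr * r - kb * b) / kg
    # Scale back to 8 bits and clamp out-of-gamut values.
    return tuple(max(0, min(255, round(c * 255.0))) for c in (r, g, b))


# The same YUV triple decoded under all four combinations.
pixel = (81, 90, 240)
for space in COEFFS:
    for narrow in (True, False):
        label = "MPEG" if narrow else "JPEG"
        print(space, label, yuv_to_rgb(*pixel, space=space, narrow=narrow))
```

The same input bytes produce different RGB values under each combination, which is why a wrong guess shifts the colors without causing any error.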
Hi @DanCorvesor Please LMK if your issue is resolved
Hi @RomanArzumanyan . Yes thanks for explaining, appreciate your help and support.
Describe the bug
There seems to be no obvious way (or example of how) to match the decoded frames with their timestamps, specifically when it's required to flush the decoding queue.
Specifically, in SampleDecode (with standalone demuxer) and SampleDemux, the packets are consumed sequentially and the examples show how to also get the timestamps (which is what I want). How can I match these timestamps with the frames, in particular if some of the frames are added to the queue and not returned immediately, so that they have to be flushed at the end?
It would be great to have some advice/guidance on how to go about this, or to know whether it is not possible. When I print the timestamps in the above examples, they are not in order. I'm not a GPU expert, but it makes intuitive sense that this is because of the parallelism under the hood; still, knowing how to relate this information to the processed frames would be very helpful.
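A likely explanation, offered as an assumption and not something confirmed in this thread: the demuxer hands out packets in decode order, and with B-frames the presentation timestamps of consecutive packets are not monotonic, while the decoder returns frames in presentation order. This would explain why the timestamp has to be taken from the packet data that the decoder fills in for each returned surface, not from the input packet. A toy model of the reordering (no VPF involved, all names hypothetical):

```python
import heapq


def toy_decoder(packets_pts, reorder_depth=2):
    """Yield PTS values in presentation order from packets given in decode order.

    Holds up to `reorder_depth` frames back, then drains the rest at the end,
    which mirrors the flush step of a real decoder.
    """
    pending = []
    for pts in packets_pts:
        heapq.heappush(pending, pts)
        if len(pending) > reorder_depth:
            yield heapq.heappop(pending)
    while pending:   # flush: frames still queued inside the decoder
        yield heapq.heappop(pending)


# Decode order for a stream with the pattern I P B B P B B (PTS in ticks).
decode_order_pts = [0, 3, 1, 2, 6, 4, 5]
presentation_order = list(toy_decoder(decode_order_pts))
print("packets:", decode_order_pts)
print("frames: ", presentation_order)
```

In this model the last frames only appear during the flush loop, so any per-frame bookkeeping has to run there as well.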
In addition, it would be great to know whether this is possible for the PyTorch examples: as you loop through the frames, is there a way to get the correct timestamps and hence select only those frames that match an input fps?
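Once each frame carries its own timestamp, selecting frames at a target rate can reuse the bucketing idea from `should_return` in the PyAV reader posted in this thread. A standalone sketch follows; the function name is hypothetical, and timestamps are assumed to arrive in ascending presentation order.

```python
def select_by_fps(timestamps_ms, target_fps):
    """Keep the first frame that falls into each 1000/target_fps ms bucket."""
    interval_ms = 1000.0 / target_fps
    last_bucket = -1
    kept = []
    for ts in timestamps_ms:
        bucket = ts // interval_ms
        if bucket > last_bucket:
            last_bucket = bucket
            kept.append(ts)
    return kept


# A 30 fps stream sampled down to 10 fps: ten timestamps, one third of 100 ms apart.
stream_ms = [i * 1000.0 / 30 for i in range(10)]
print(select_by_fps(stream_ms, target_fps=10))
```

Because the decision depends only on the timestamp, the same check can sit inside a decode loop before the color conversion, so skipped frames never need to be converted.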
Thanks in advance,
Daniel