Increasing the smoothness of the video progress bar

H-Dempsey commented 1 year ago

Hi napari-deeplabcut!

I am a big fan of both napari and deeplabcut, and I was so excited when I first found out about this collaboration. This is my first pull request, so I apologise in advance for any mistakes.

I noticed that when I scroll through a video during manual frame extraction, the interface lags.

https://github.com/DeepLabCut/napari-deeplabcut/assets/101311642/117bcceb-b611-44bd-8e79-bd7fab3b3b95

If I replace the current video reader class and its dask lazy loading with the napari-video reader class, scrolling is much smoother.

https://github.com/DeepLabCut/napari-deeplabcut/assets/101311642/cf3864fb-2027-44bb-b40c-e97a24d40882

For the non-OpenCV option, I tried writing my own version of the napari-video class with all cv2 functions replaced by imageio functions. Unfortunately, it did not look noticeably faster than the current approach, so I kept the PyAV reader and dask lazy loading for that case. I have included my version below for interest.

Thank you for making this really useful tool.

Harry

```python
import os

import imageio
import numpy as np


class VideoReaderNPIO:
    """imageio-based port of napari-video's VideoReaderNP (a video posing as a numpy array)."""

    def __init__(self, filename: str, remove_leading_singleton: bool = True):
        """Open video in filename."""
        if not os.path.exists(filename):
            raise FileNotFoundError(f'{filename} not found.')
        self._filename = filename
        self._vr = imageio.get_reader(filename)
        self.remove_leading_singleton = remove_leading_singleton
        # count_frames() can be slow on large files and `shape` is queried often, so cache it
        self._number_of_frames = int(self._vr.count_frames())
        self._seek(0)  # reset to first frame
        frame = self.read()  # read frame to get number of channels
        self.frame_channels = int(frame.shape[-1])

    def __del__(self):
        try:
            self._vr.close()
        except AttributeError:  # _vr does not exist if the file was not found
            pass

    def __len__(self):
        """Length is number of frames."""
        return self.number_of_frames

    def __getitem__(self, index):
        # numpy-like slice indexing into arbitrary dims of the video
        # ugly/hacky but works
        frames = None
        if isinstance(index, (int, np.integer)):  # single frame
            frames = self.read(int(index))
        elif isinstance(index, slice):  # slice of frames
            frames = np.stack([self[ii] for ii in range(*index.indices(len(self)))])
        elif isinstance(index, range):  # range of frames
            frames = np.stack([self[ii] for ii in index])
        elif isinstance(index, tuple):  # unpack tuple of indices
            if isinstance(index[0], slice):
                indices = range(*index[0].indices(len(self)))
            elif isinstance(index[0], (np.integer, int)):
                indices = int(index[0])
            else:
                indices = None

            if indices is not None:
                frames = self[indices]

                # index into pixels and channels
                for cnt, idx in enumerate(index[1:]):
                    if isinstance(idx, slice):
                        ix = range(*idx.indices(self.shape[cnt + 1]))
                    elif isinstance(idx, (int, np.integer)):
                        ix = range(idx, idx + 1)  # np.take wraps negative indices (-1, -2, ...)
                    else:
                        continue

                    if frames.ndim == 4:  # leading frame axis present: shift axis by one
                        cnt = cnt + 1
                    frames = np.take(frames, ix, axis=cnt)

        if self.remove_leading_singleton and frames is not None:
            if frames.shape[0] == 1:
                frames = frames[0]
        return frames

    def __repr__(self):
        return f"{self._filename} with {len(self)} frames of size {self.frame_shape} at {self.frame_rate:1.2f} fps"

    def __iter__(self):
        # yield one frame at a time instead of decoding the whole video into memory
        for ii in range(len(self)):
            yield self[ii]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        """Release video file."""
        self.close()

    def read(self, frame_number=None):
        """Read next frame or frame specified by `frame_number`."""
        # no need to seek if we are at the right position - greatly speeds up reading subsequent frames
        if frame_number is not None and frame_number != self.current_frame_pos:
            self._seek(frame_number)
        frame = self._vr.get_next_data()  # read
        self.current_frame_pos += 1  # the decoder now points at the following frame
        return frame

    def close(self):
        self._vr.close()

    def _reset(self):
        """Re-initialize object."""
        self.close()
        self.__init__(self._filename, self.remove_leading_singleton)

    def _seek(self, frame_number):
        """Go to frame: the next get_next_data() call returns `frame_number`."""
        self._vr.set_image_index(frame_number)
        self.current_frame_pos = frame_number

    @property
    def number_of_frames(self):
        return self._number_of_frames

    @property
    def frame_rate(self):
        return self._vr.get_meta_data()['fps']

    @property
    def frame_height(self):
        return int(self._vr.get_meta_data()['size'][1])

    @property
    def frame_width(self):
        return int(self._vr.get_meta_data()['size'][0])

    # @property # I didn't know how to implement this in imageio.
    # def fourcc(self):
    #     return int(self._vr.get(cv2.CAP_PROP_FOURCC))

    # @property # I didn't know how to implement this either.
    # def frame_format(self):
    #     return int(self._vr.get(cv2.CAP_PROP_FORMAT))

    @property
    def frame_shape(self):
        return (self.frame_height, self.frame_width, self.frame_channels)

    @property
    def dtype(self):
        return np.uint8

    @property
    def shape(self):
        return (self.number_of_frames, *self.frame_shape)

    @property
    def ndim(self):
        return len(self.shape)  # frames x height x width x channels

    @property
    def size(self):
        return np.prod(self.shape)  # np.product was removed in NumPy 2.0

    def min(self):
        return 0

    def max(self):
        return 255
```

jeylau commented 1 year ago

Hi @H-Dempsey, thank you so much for contributing! Just today I had to deal with a video that was very slow to read, so I was excited to try your PR. Unfortunately, on that 100MB video the difference is barely visible, and after reading the code behind napari-video, I realized that the implementation is pretty similar to what we had tried here earlier. It was faster, but cv2.VideoCapture().read() is not index-safe..., so on long videos, we'd often see a mismatch between the video frame and the corresponding annotations 😕 Is that something you observed too?
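One way to check for this kind of mismatch is to decode a clip once from the start (treated as ground truth) and compare it with frames fetched by random access. The sketch below is only an illustration: `check_random_access` and `faulty_read_at` are hypothetical helpers, and the reader under test is passed in as a plain callable, for example a wrapper around `cap.set(cv2.CAP_PROP_POS_FRAMES, i)` followed by `cap.read()`. Keeping every reference frame in memory is only practical for short clips; for long videos one could store per-frame hashes instead.

```python
import random
from typing import Callable, List, Sequence

import numpy as np


def check_random_access(
    reference: Sequence[np.ndarray],
    read_at: Callable[[int], np.ndarray],
    n_samples: int = 100,
    seed: int = 0,
) -> List[int]:
    """Return sorted frame indices where random access disagrees with sequential decoding.

    reference: frames decoded sequentially from the start (assumed to be correct).
    read_at:   the random-access path under test, e.g. seek-then-read.
    """
    rng = random.Random(seed)
    # shuffled order so the reader really has to seek instead of reading sequentially
    indices = rng.sample(range(len(reference)), k=min(n_samples, len(reference)))
    mismatches = [i for i in indices if not np.array_equal(read_at(i), reference[i])]
    return sorted(mismatches)


# With OpenCV (not executed here), read_at could look like:
#     def read_at(i):
#         cap.set(cv2.CAP_PROP_POS_FRAMES, i)
#         ok, frame = cap.read()
#         return frame

# Toy demo: frame i is filled with the value i.
frames = [np.full((2, 2), i, dtype=np.uint8) for i in range(100)]


def faulty_read_at(i):
    # hypothetical buggy reader: from frame 50 on it returns the next frame (clamped at the last one)
    return frames[min(i + 1, 99)] if i >= 50 else frames[i]


print(check_random_access(frames, faulty_read_at))
```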

H-Dempsey commented 1 year ago

Hi @jeylau,

Thank you for your response!

> Unfortunately, on that 100MB video the difference is barely visible

I can show that it works on my end with larger videos. The previous video I used was 25 MB, and here is another test with a 6.7 GB video (1920x1080, 49 mins, 30 fps).

Before:

https://github.com/DeepLabCut/napari-deeplabcut/assets/101311642/df0491eb-4c33-4ec1-bc10-29ef0d8b628c

After:

https://github.com/DeepLabCut/napari-deeplabcut/assets/101311642/db045bd1-7c85-4718-acd8-c302eed29dc6
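To put numbers on a before/after comparison like this one, one could time random frame access, which is roughly what dragging the slider does. This is a minimal sketch, assuming the reader supports `len()` and integer indexing (as `VideoReaderNPIO` above does). `time_random_access` and `DummyReader` are hypothetical; the dummy only stands in for a real video. Wrapping the result in `np.asarray` forces a dask array to actually compute, so the dask-based reader and the napari-video-style reader are timed on equal terms.

```python
import random
import statistics
import time

import numpy as np


def time_random_access(reader, n_samples=50, seed=0):
    """Return (median, max) seconds to fetch randomly chosen frames via reader[i]."""
    rng = random.Random(seed)
    durations = []
    for _ in range(n_samples):
        i = rng.randrange(len(reader))
        start = time.perf_counter()
        np.asarray(reader[i])  # force decoding, even for lazy (e.g. dask) arrays
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), max(durations)


class DummyReader:
    """Stand-in for a real video reader: 'decoding' a frame just sleeps for 1 ms."""

    def __len__(self):
        return 1000

    def __getitem__(self, i):
        time.sleep(0.001)
        return i


median_s, worst_s = time_random_access(DummyReader())
print(f"median {median_s * 1e3:.1f} ms, worst {worst_s * 1e3:.1f} ms")
```

The worst case matters as much as the median here, since a single slow seek is what shows up as a visible stall while scrubbing.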

> cv2.VideoCapture().read() is not index-safe..., so on long videos, we'd often see a mismatch between the video frame and the corresponding annotations 😕 Is that something you observed too?

I just played around with an analysed 12-hour (20 fps) video, and I haven't observed it so far. Could you explain why cap.read() is not index-safe, and why that could lead to a mismatch between the frame and annotations in long videos? Could the problem be cap.set(cv2.CAP_PROP_POS_FRAMES, frame_no) rather than cap.read()? I ran into this thread. Setting the frame number this way seems to be unreliable in some cases (e.g. when the frame rate is variable).
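A toy model of why variable frame rates can break index-based seeking, under one assumption: that the backend turns a frame number into a timestamp using the video's *average* frame rate and then seeks to that time. (This is an assumption for illustration, not something confirmed in this thread.) With a constant frame rate, `index / fps` is exact; with a variable one, the real timestamps drift away from that estimate. The timestamps below are made up, and `frame_at_time` / `seek_by_average_fps` are hypothetical helpers.

```python
import bisect


def frame_at_time(timestamps, t):
    """Index of the frame on screen at time t (last frame whose timestamp <= t)."""
    return bisect.bisect_right(timestamps, t) - 1


def seek_by_average_fps(timestamps, target_index):
    """Mimic an index-based seek that converts the index to a time with the average fps."""
    duration = timestamps[-1] - timestamps[0]
    avg_fps = (len(timestamps) - 1) / duration
    t = timestamps[0] + target_index / avg_fps
    return frame_at_time(timestamps, t)


# Made-up variable-frame-rate clip: 60 frames at 30 fps, followed by 60 frames at 10 fps.
timestamps = [i / 30 for i in range(60)] + [2 + i / 10 for i in range(60)]
for target in (0, 30, 59, 90):
    print(target, "->", seek_by_average_fps(timestamps, target))
```

In this model, asking for frame 30 lands on frame 59, so any annotation stored for frame 30 would be drawn on the wrong image, which is the kind of mismatch described above.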

My pull request seems to improve speed, but if it leads to inaccurate frame seeking, I think we should not merge it.

Harry

H-Dempsey commented 10 months ago

Hi again,

I spent some more time looking into why seeking with CAP_PROP_POS_FRAMES is sometimes inaccurate. The developers of this other project ran into the same issue as us.

They solved it by using a video reader library called decord. It offers separate fast and accurate ways of seeking. By default, decord uses the accurate seek, and it is still faster than the OpenCV and PyAV readers.
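Why the split exists: in compressed video only keyframes can be decoded on their own, and the frames in between depend on earlier frames. A fast seek jumps to a nearby keyframe, while an accurate seek goes to the keyframe before the target and then decodes forward until it reaches the requested frame. (decord's `VideoReader` exposes this as `seek` and `seek_accurate`.) The toy model below does not call decord; the function names and the keyframe interval of 250 frames are assumptions chosen for illustration, and in this model the fast seek always lands on the preceding keyframe.

```python
def keyframe_before(index, keyframe_interval):
    """Index of the last keyframe at or before `index` (keyframes every N frames)."""
    return (index // keyframe_interval) * keyframe_interval


def fast_seek(index, keyframe_interval):
    """Jump straight to the preceding keyframe. Returns (frame_shown, frames_decoded)."""
    return keyframe_before(index, keyframe_interval), 1


def accurate_seek(index, keyframe_interval):
    """Jump to the preceding keyframe, then decode forward up to `index`."""
    start = keyframe_before(index, keyframe_interval)
    frames_decoded = index - start + 1
    return index, frames_decoded


for target in (0, 45, 249):
    print(target, fast_seek(target, 250), accurate_seek(target, 250))
```

The fast seek is cheap but can show the wrong frame; the accurate seek always shows the right one but may have to decode up to a full keyframe interval. Sequential playback avoids both costs, because after decoding frame i the decoder can continue to frame i + 1 without seeking.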

I was keen to see whether a decord video reader in napari would give fast and accurate seeking. I adapted the __getitem__ function from the [napari-video](https://github.com/janclemenslab/napari-video) reader for decord, and this is the result:

https://github.com/DeepLabCut/napari-deeplabcut/assets/101311642/6caaa0ed-77e1-4528-8f80-df1564342a57

It is significantly faster than the "Before" video above, and it uses accurate seeking. On videos smaller than the 6.7 GB test video (1920x1080, 49 mins, 30 fps), it is even better.

I have added the changes to my pull request. What do you think about this?

Harry
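The adapted `__getitem__` itself is not reproduced in this thread. As a rough sketch of the idea only, assuming decord's `VideoReader` (integer indexing returns one frame, `get_batch` returns several, and results are decord NDArrays with `.asnumpy()`), a numpy-like wrapper for napari could look like the following. `DecordArray`, `FakeReader` and `_to_numpy` are hypothetical names; to keep the sketch runnable without decord, the wrapper accepts any object with `__len__`, `__getitem__` and `get_batch`, and the demo uses a fake reader.

```python
import numpy as np


def _to_numpy(data):
    """decord returns its own NDArray type (with .asnumpy()); otherwise use np.asarray."""
    return data.asnumpy() if hasattr(data, "asnumpy") else np.asarray(data)


class DecordArray:
    """Hypothetical numpy-like wrapper that decodes frames only when napari indexes them."""

    def __init__(self, reader):
        self._reader = reader  # e.g. decord.VideoReader(path); any object with len/[]/get_batch
        first = _to_numpy(reader[0])
        self.shape = (len(reader), *first.shape)
        self.dtype = first.dtype
        self.ndim = len(self.shape)

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            index = (index,)
        frame_index, rest = index[0], index[1:]
        if isinstance(frame_index, (int, np.integer)):
            i = range(len(self))[int(frame_index)]  # normalizes negatives, raises IndexError
            frame = _to_numpy(self._reader[i])
            return frame[rest]  # numpy handles height/width/channel indexing
        # slices or index lists: fetch all requested frames in one batch
        indices = np.arange(len(self))[frame_index]
        frames = _to_numpy(self._reader.get_batch(indices.tolist()))
        return frames[(slice(None), *rest)]  # keep the frame axis, index the rest


class FakeReader:
    """Stand-in for decord.VideoReader: frame i is a 4x6 RGB image filled with i."""

    def __init__(self, n_frames=10):
        self._frames = np.stack([np.full((4, 6, 3), i, dtype=np.uint8) for i in range(n_frames)])

    def __len__(self):
        return len(self._frames)

    def __getitem__(self, i):
        return self._frames[i]

    def get_batch(self, indices):
        return self._frames[indices]


video = DecordArray(FakeReader())
print(video.shape, video[3, :2, :2, 0].tolist())
```

With decord installed, the real reader would presumably be wrapped as `DecordArray(decord.VideoReader(path))` and passed to `viewer.add_image`. Delegating pixel and channel indexing to numpy keeps the wrapper much shorter than the hand-written axis loop in napari-video.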

DeepLabCut / napari-deeplabcut #89