aiporre / multidataloader

Dataloader for TensorFlow using its multithreading features
Apache License 2.0
8 stars 0 forks

Not possible to parallelize video dataset on Windows #11

Open aiporre opened 3 years ago

aiporre commented 3 years ago

Error in vread() when reading videos in parallel on Windows:

  File "C:\Users\asdf\Anaconda3\envs\bc\lib\site-packages\dpipe\utils.py", line 162, in read_video
    return vread(path)

  File "C:\Users\asdf\Anaconda3\envs\bc\lib\site-packages\skvideo\io\io.py", line 144, in vread
    reader = FFmpegReader(fname, inputdict=inputdict, outputdict=outputdict, verbosity=verbosity)

  File "C:\Users\asdf\Anaconda3\envs\bc\lib\site-packages\skvideo\io\ffmpeg.py", line 44, in __init__
    super(FFmpegReader,self).__init__(*args, **kwargs)

  File "C:\Users\asdf\Anaconda3\envs\bc\lib\site-packages\skvideo\io\abstract.py", line 116, in __init__
    "No way to determine width or height from video. Need `-s` in `inputdict`. Consult documentation on I/O.")

aiporre commented 3 years ago

A possible workaround is to pass the frame size (height, width) taken from the x_shape argument of make_dataset(). Another option is to try ffprobe, which could obtain these values without requiring the user to supply them.
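A minimal sketch of the first workaround, following the hint in the error message ("Need `-s` in `inputdict`"). It assumes x_shape is ordered (frames, height, width, channels) with frames possibly None; size_inputdict() and read_video_with_size() are hypothetical helpers, not part of dpipe. FFmpeg's -s option takes "WIDTHxHEIGHT", so the order is swapped relative to x_shape.

```python
def size_inputdict(x_shape):
    """Build an FFmpeg input dict whose -s entry comes from x_shape.

    Assumes x_shape = (frames, height, width, channels); frames may be None.
    """
    _, height, width, _ = x_shape
    # FFmpeg expects width first: "WIDTHxHEIGHT"
    return {"-s": f"{width}x{height}"}


def read_video_with_size(path, x_shape):
    """Hypothetical variant of read_video() that tells vread() the frame size."""
    # Imported here so size_inputdict() can be used without scikit-video installed
    from skvideo.io import vread

    if isinstance(path, bytes):
        path = path.decode()
    return vread(path, inputdict=size_inputdict(x_shape))
```

The size must match the real video resolution; if it does not, scikit-video will split the raw FFmpeg output into frames of the wrong size.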

aiporre commented 3 years ago

Is this a problem related to the parallelization of FFmpeg? Maybe we want to use another library to read video files. We may need to benchmark the well-known libraries. scikit-video integrates many analysis pipelines that are not useful at the current stage of this library, so maybe we will switch. For now, parallelizing the video dataset with TF is not possible.
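For the benchmark mentioned above, a small timing harness could look like the sketch below. It assumes each candidate library is wrapped in a callable that takes a path and returns the decoded video (for example skvideo.io.vread); benchmark_readers() is a hypothetical helper, and the median over several passes is used to reduce noise from disk caching.

```python
import time
from statistics import median


def benchmark_readers(readers, paths, repeats=3):
    """Time each candidate video reader on the same list of files.

    readers: dict mapping a label to a callable(path) -> decoded video.
    Returns a dict mapping each label to the median seconds per full pass.
    """
    results = {}
    for name, read in readers.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            for path in paths:
                read(path)  # result discarded; only decoding time matters
            timings.append(time.perf_counter() - start)
        results[name] = median(timings)
    return results
```

Usage would be something like benchmark_readers({"skvideo": vread, "other": other_reader}, video_paths), where other_reader and video_paths are placeholders.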

aiporre commented 3 years ago

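Possible solution: change read_video() in utils.py so that it decodes byte-string paths before calling vread():

    from skvideo.io import vread

    def read_video(path):
        # Paths coming out of a TensorFlow string tensor arrive as bytes;
        # decode them to str before handing them to vread()
        if isinstance(path, bytes):
            path = path.decode()
        return vread(path)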

aiporre commented 3 years ago

A problem occurs when videos have an undefined length: the generator output has shape (None, height, width, channels), so the first input to from_generator() is not sufficient, and the shape validation in TF 2.3 does not match the shape spec. The following solves the problem:

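        # Generator that yields one whole video per path; from_generator()
        # converts args to NumPy, so x arrives here as bytes
        def gen(x):
            yield self.gen_object.read_fcn(x)
        # One small dataset per path, interleaved in parallel; shapes should
        # use None for the frame dimension, e.g. (None, height, width, channels)
        self.dataset = items_dataset.interleave(
                lambda x: tf.data.Dataset.from_generator(gen,
                                                     types,
                                                     shapes,
                                                     args=(x,)),
                num_parallel_calls=num_parallel_calls,
                cycle_length=cycle_length,
                block_length=block_length)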
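A self-contained sketch of the same interleave/from_generator pattern, assuming TensorFlow 2.x. The reader fake_read_video() and the frame size constants are hypothetical stand-ins for read_fcn and x_shape: the frame count is encoded in the file name so that videos have different lengths, which the None in the first dimension of the output shape allows.

```python
import numpy as np
import tensorflow as tf

HEIGHT, WIDTH, CHANNELS = 4, 6, 3  # hypothetical frame size


def fake_read_video(path):
    """Hypothetical reader: returns a (frames, H, W, C) uint8 array.

    The frame count comes from the file name, e.g. b"clip_3.mp4" -> 3 frames.
    """
    if isinstance(path, bytes):
        path = path.decode()
    n_frames = int(path.split("_")[1].split(".")[0])
    return np.zeros((n_frames, HEIGHT, WIDTH, CHANNELS), dtype=np.uint8)


def gen(path):
    # One dataset element per file: the whole video
    yield fake_read_video(path)


def make_video_dataset(paths, num_parallel_calls=2, cycle_length=2, block_length=1):
    items_dataset = tf.data.Dataset.from_tensor_slices(paths)
    # None in the first dimension lets each video have its own length
    shapes = tf.TensorShape([None, HEIGHT, WIDTH, CHANNELS])
    return items_dataset.interleave(
        lambda x: tf.data.Dataset.from_generator(
            gen, output_types=tf.uint8, output_shapes=shapes, args=(x,)),
        num_parallel_calls=num_parallel_calls,
        cycle_length=cycle_length,
        block_length=block_length)


ds = make_video_dataset(["clip_3.mp4", "clip_5.mp4", "clip_1.mp4"])
frame_counts = sorted(int(video.shape[0]) for video in ds)
```

Here frame_counts collects the length of each decoded video; the element spec keeps None as its first dimension, so videos of any length pass validation.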