Open johndpope opened 3 weeks ago
Hi @johndpope
From your description I got some overall understanding of what you are trying to achieve. What VALI features are you interested in WRT your project ?
I’m looking at this framework from a performance perspective.
It does natively support:
I had a play with multithreading on CPU, but to get the images produced it looks like I can't get away from the CPU. https://github.com/johndpope/MegaPortrait-hack/issues/38
I think the problem is that PIL Image has never had GPU support...
I used the StreamReader from torchvision, somewhat successfully, to cycle through frames:
```python
streamer = StreamReader(src=video_path)
# Supported output formats:
# - "rgb24":   8 bits * 3 channels (R, G, B)
# - "bgr24":   8 bits * 3 channels (B, G, R)
# - "yuv420p": 8 bits * 3 channels (Y, U, V)
# - "gray":    8 bits * 1 channel
streamer.add_basic_video_stream(
    frames_per_chunk=16000,
    frame_rate=25,
    width=self.width,
    height=self.height,
    format="rgb24",
)
```
I appreciate your timely response. I guess I'll continue down this streamer path for now, but you're saying I can do GPU + images? Maybe I can extend your work. Off to karate training now; I'll take another look later.
@johndpope
If you just want to get decoded video frames and run your inference on them as a first step, you may follow the code from the torch segmentation test:
It keeps everything on the GPU. There's still room for performance optimization which wasn't done because it's a unit test, not a perf test — e.g., decoding video frames in batches instead of one by one. Anyway, it may be a good starting point.
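The batching suggestion above can be sketched framework-agnostically. The helper below is hypothetical (not part of VALI or the segmentation test); it groups frames from any decoder iterator into fixed-size batches so inference can run once per batch rather than once per frame:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group decoded frames into fixed-size batches; the last batch may be short."""
    batch: List[T] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With a real decoder, each batch of GPU-resident frames would then be stacked into one tensor and passed to the model in a single call.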
@johndpope
Hi. Please LMK if your issue is resolved.
Hi @RomanArzumanyan - thanks. I'll circle back when I have more bandwidth; closing for now. For this kind of wrapper, I had a crack at CuPy - https://github.com/johndpope/MegaPortrait-hack/blob/feat/38-multicore/EmoDataset.py#L251 - it wasn't successful, but I'll take another look with this.
Processing a video on the GPU and then augmenting it etc. would be very valuable.
Please consider adding some helper wrappers to make it easier to work with the frames. I'll take a stab when I have bandwidth.
```python
tensor_frame, image_frame = self.augmentation(frame, self.pixel_transform, state)
if self.apply_crop_warping:
    transform = transforms.Compose([
        transforms.Resize((self.width, self.height)),
        transforms.ToTensor(),
    ])
```
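A helper wrapper along the lines requested above could hide the per-frame transform plumbing behind one callable. The sketch below is my own (the class name and API are hypothetical, not part of VALI); it mimics the shape of torchvision's `transforms.Compose` using plain callables, so the same pipeline can be mapped over a stream of decoded frames:

```python
from typing import Callable, Iterable, Iterator, List

Frame = object  # stand-in for a decoded frame / GPU tensor

class FramePipeline:
    """Compose per-frame transforms and apply them across a frame stream."""

    def __init__(self, transforms: List[Callable[[Frame], Frame]]):
        self.transforms = transforms

    def __call__(self, frame: Frame) -> Frame:
        # Apply each transform in order to a single frame.
        for t in self.transforms:
            frame = t(frame)
        return frame

    def map(self, frames: Iterable[Frame]) -> Iterator[Frame]:
        # Lazily transform every frame yielded by a decoder.
        for frame in frames:
            yield self(frame)
```

In a GPU pipeline the callables would be device-side ops (resize, normalize, crop); here any callables work, which keeps the wrapper decoder-agnostic.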
I’m looking at this framework from a performance perspective - I want to use it to quickly preprocess videos en masse.
I'm happy to help build a wrapper - I would need it to match decord to make it easier to consume. I really want to achieve this, though:
https://github.com/johndpope/MegaPortrait-hack/issues/38
To get the best performance I'm prepared to write C++ code for this. Or are there other things that come to mind?
Related - https://github.com/dmlc/decord/issues/283