amanchadha / iSeeBetter

iSeeBetter: Spatio-Temporal Video Super Resolution using Recurrent-Generative Back-Projection Networks | Python3 | PyTorch | GANs | CNNs | ResNets | RNNs | Published in Springer Journal of Computational Visual Media, September 2020, Tsinghua University Press
https://arxiv.org/abs/2006.11161
MIT License

Runtime performance #13

Closed pepinu closed 4 years ago

pepinu commented 4 years ago

Hello,

In the paper, I haven't seen any performance numbers regarding runtime/FPS for different input sizes.

Do you have any ballpark numbers for how fast the network is, for example, on 360p and 720p videos? Would it differ by a large margin if the x4 upscale were changed to x2?

Best regards

rishftw commented 4 years ago

In my preliminary testing, when running iSeeBetterTest.py with the following parameters

Namespace(chop_forward=False, data_dir='./Vid4', debug=False, file_list='ggbb_png.txt', future_frame=True, gpu_mode=True, gpus=1, model='weights/netG_epoch_4_1.pth', model_type='RBPN', nFrames=7, other_dataset=True, output='Results/', residual=False, seed=123, testBatchSize=1, threads=8, upscale_factor=4)

on an NVIDIA V100 machine, a grayscale PNG sequence of 100 frames, where each frame measures 96x72 (note: the original input is 384x288, but iSeeBetter downscales it by the scale factor for re-upscaling to the original size during testing, I think; not 100% sure) and uses around ~20 kB/frame (80 kB/frame for the originals, see the previous note), takes a ballpark 0.2-0.3 s per frame (see the full output attached). This extrapolates to around 6-9 s of processing time for one second of 96x72, 30 fps, ~0.6 MB/s "video". Those are the numbers I have as of now.

output.txt
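The extrapolation above can be sanity-checked with some quick arithmetic (the 0.2-0.3 s/frame figure is the measured ballpark from the comment; the 30 fps target is the one stated there):

```python
# Back-of-the-envelope check of the extrapolation above, using the
# measured ballpark of 0.2-0.3 s of processing per frame.
per_frame_low, per_frame_high = 0.2, 0.3  # seconds per frame (measured)
fps = 30                                  # target video frame rate

# Processing time needed per one second of 30 fps video.
low, high = per_frame_low * fps, per_frame_high * fps
print(f"{low:.0f}-{high:.0f} s of processing per 1 s of {fps} fps video")
```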

pepinu commented 4 years ago

Hey, thanks for such a rapid response, very insightful.

The provided parameters show testBatchSize=1; could this be increased to speed up the network? What is the memory usage during inference?

rishftw commented 4 years ago

Yeah no worries, I was messing around with it anyway.

Yo lol I didn't even notice that, but hell yeah, that speeds things up. In the attached file, I've listed the output times per batch (so divide the times by the testBatchSize, I guess) and regular RAM usage at idle and during the process. Unfortunately, I'm using Colab and I can't figure out how to check VRAM usage during execution of the loop, which is what you actually wanted to see, I'm guessing. What I can tell you is that the GPU was a V100 with ~16 GB of memory, and it fails (CUDA out of memory) somewhere between a testBatchSize of 20 and 24, so I guess it uses around 16 GB somewhere around that batch size. But yeah, it seems like it came down to around 0.05 s/frame at testBatchSize=20.

testBatchOut.txt
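For what it's worth, PyTorch does expose torch.cuda.max_memory_allocated() (and torch.cuda.reset_peak_memory_stats()) for reading peak VRAM inside the loop, even on Colab. And the OOM observation itself bounds the per-sample VRAM cost with some quick arithmetic; a sketch, using the ~16 GB and batch-size 20/24 figures from the comment above and assuming memory scales roughly linearly with batch size (my assumption, not measured):

```python
# Rough per-sample VRAM bound from the OOM observation above:
# a ~16 GB V100 handles testBatchSize=20 but OOMs by 24.
# Assumes memory scales roughly linearly with batch size (unverified).
gpu_mem_gb = 16
ok_batch, oom_batch = 20, 24

per_sample_low = gpu_mem_gb / oom_batch   # lower bound, GB per batch item
per_sample_high = gpu_mem_gb / ok_batch   # upper bound, GB per batch item
print(f"~{per_sample_low:.2f}-{per_sample_high:.2f} GB of VRAM per batch item")
```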

pepinu commented 4 years ago

Alright, thanks, this was a great help.