cuiziteng / Illumination-Adaptive-Transformer

[BMVC 2022] You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction. SOTA for low-light enhancement, ~0.004 seconds per image; try this for pre-processing.
Apache License 2.0

How was the ~0.004 s single-image inference time measured? #44

Closed PikachuRX78 closed 1 year ago

PikachuRX78 commented 1 year ago

Hi, I've recently been reproducing this model. I run inference on a single 3090, time it with img_demo.py, and load the best_Epoch_lol_v1.pth weights.

With the code below, inference on a single 600×400 image measures at about 100 ms, and the FLOPs come out to 5.27941248 GFLOPs.

from fvcore.nn import FlopCountAnalysis, parameter_count_table
import time

...
start_time = time.time_ns()
_, _, enhanced_img = model(input)
end_time = time.time_ns()
total_time = end_time - start_time
print("Time taken by network is : %f ms" % (total_time / 1000000))
...
flops = FlopCountAnalysis(model, input)
print("GFLOPs: ", flops.total() / 1e9)
...

I'd like to ask how the 0.004 s inference speed and the 1.44 GFLOPs reported in the paper were measured and achieved. Thanks, author.
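
For reference, a minimal sketch of how the ~90K parameter count from the paper title could be checked with fvcore's parameter_count_table (reusing the already-loaded `model` from img_demo.py is an assumption here, not code from the repo):

from fvcore.nn import parameter_count_table

# Assumption: `model` is the IAT network already loaded in img_demo.py.
# parameter_count_table prints per-module parameter counts; the manual sum
# below gives the overall total for comparison with the ~90K in the title.
print(parameter_count_table(model))
total_params = sum(p.numel() for p in model.parameters())
print("Total parameters: %.1fK" % (total_params / 1e3))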

cuiziteng commented 1 year ago

Please try it on multiple images and take the average: because of how GPUs behave, the first image always takes longer. At the time I tested on the 15 test images of the LOL dataset, and the 0.004 figure is the average over those 15 images. You can run the LOL-V1 test set yourself to verify.

As for the FLOPs, they were measured at an input size of 256×256, not 400×600, following the same computation method as the MAXIM paper (CVPR 2022). If computed at 400×600, it is indeed the 5.27 GFLOPs you report. This was our mistake in not making it clear in the paper, and we are very sorry; see the attached screenshot.

If you have any more questions, feel free to ask. Many thanks for pointing this out~
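
For reference, a minimal sketch of the 256×256 FLOPs measurement described above, using the same fvcore FlopCountAnalysis as in the first snippet (the random dummy input and the reuse of the loaded `model` are assumptions, not code from the repo):

import torch
from fvcore.nn import FlopCountAnalysis

# Assumption: `model` is the loaded IAT network on the GPU.
# Measure FLOPs at the 256x256 input size used in the paper
# (same convention as MAXIM, CVPR 2022), not at 400x600.
dummy = torch.randn(1, 3, 256, 256).cuda()
with torch.no_grad():
    flops = FlopCountAnalysis(model, dummy)
    print("GFLOPs @ 256x256: %.3f" % (flops.total() / 1e9))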

cuiziteng commented 1 year ago

This has now been noted in readme.md.

PikachuRX78 commented 1 year ago

Thanks for the reply. I just tried it: with evaluation_lol_v1.py, the 15 images take about 143 ms, roughly 9.53 ms per image. That is indeed a lot faster, though there still seems to be a bit of a gap. My test code is below.

total_time = 0
with torch.no_grad():
    for i, imgs in tqdm(enumerate(val_loader)):
        low_img, high_img, name = imgs[0].cuda(), imgs[1].cuda(), str(imgs[2][0])
        start_time = time.time_ns()
        mul, add, enhanced_img = model(low_img)
        end_time = time.time_ns()
        temp = end_time - start_time
        total_time += temp
        if config.save:
            torchvision.utils.save_image(enhanced_img, result_path + str(name) + '.png')

        ssim_value = ssim(enhanced_img, high_img, as_loss=False).item()
        psnr_value = psnr(enhanced_img, high_img).item()

        ssim_list.append(ssim_value)
        psnr_list.append(psnr_value)

print("Average time taken by network is : %f ms" % (total_time / 1000000 / 15))

Output:

Total examples: 15
15it [00:00, 45.14it/s]
Average time taken by network is : 9.364251 ms
The SSIM Value is: 0.8089913686116537
The PSNR Value is: 23.382731374104818
cuiziteng commented 1 year ago

Please also try measuring the inference speed on the LOL-V2 dataset; it has 100 images, which should give a more reasonable number. With only 15 images, the early ones can be slowed down by the machine and drag down the overall average. The 0.004 figure was computed on LOL-V2.

Also, if someone else is running code on the same machine, that will affect the inference speed too.

I don't have a spare 3090 at hand right now; if I get one later I'll upload a screenshot. Anyway, thanks a lot for your interest~
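
For a more stable per-image number, a minimal timing sketch that discards a few warm-up iterations and synchronizes the GPU around the forward pass (the warm-up count and the reuse of `model` and `val_loader` are assumptions, not code from the repo):

import time
import torch

WARMUP = 5  # assumed number of warm-up iterations to discard

times = []
with torch.no_grad():
    for i, imgs in enumerate(val_loader):
        low_img = imgs[0].cuda()
        torch.cuda.synchronize()      # finish any pending GPU work before timing
        start = time.time_ns()
        _, _, enhanced_img = model(low_img)
        torch.cuda.synchronize()      # wait for the forward pass to complete
        if i >= WARMUP:               # skip the slow warm-up iterations
            times.append(time.time_ns() - start)

print("Average time per image: %.3f ms" % (sum(times) / len(times) / 1e6))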

PikachuRX78 commented 1 year ago

Thanks, it does indeed look like a GPU warm-up issue.

With evaluation_lol_v2.py the average is 2.74 ms per image. Test code:

total_time = 0
with torch.no_grad():
    for i, imgs in tqdm(enumerate(val_loader)):
        low_img, high_img = imgs[0].cuda(), imgs[1].cuda()
        start_time = time.time_ns()
        mul, add, enhanced_img = model(low_img)
        end_time = time.time_ns()
        temp = end_time - start_time
        total_time += temp

        ssim_value = ssim(enhanced_img, high_img, as_loss=False).item()
        psnr_value = psnr(enhanced_img, high_img).item()

        ssim_list.append(ssim_value)
        psnr_list.append(psnr_value)
print("Average time taken by network is : %f ms" % (total_time / 1000000 / 100))

Output:

Total examples: 100
100it [00:01, 71.37it/s]
Average time taken by network is : 2.745986 ms
The SSIM Value is: 0.8237025141716003
The PSNR Value is: 23.499295816421508
cuiziteng commented 1 year ago

OK~ Looks like this is even faster than 4 ms.