
ITMLUT

Official PyTorch implementation of "Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display" (paper (arXiv), paper) in CVMP2023 (website, proceedings).

1. A quick glance at all AI-3D-LUT algorithms

Here are all the AI-3D-LUT (look-up table) algorithms we know of (last updated 07/03/2024); please jump to them if interested.

You can cite our paper if you find this overview helpful.

@InProceedings{Guo_2023_CVMP,
    author    = {Guo, Cheng and Fan, Leidong and Zhang, Qian and Liu, Hanyuan and Liu, Kanglin and Jiang, Xiuhua},
    title     = {Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display},
    booktitle = {Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production (CVMP)},
    month     = {November},
    year      = {2023},
    pages     = {1-10},
    doi       = {10.1145/3626495.3626503}
}
In the table below, the #BasicLUT, LUT size (each), and (#) Extra dimension columns together describe the expressiveness of the trained LUT.

| Idea | Task | Name | Publication | Paper | Code | Institution | #BasicLUT | LUT size (each) | (#) Extra dimension | Output of neural network(s) | Nodes (packing) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| First AI-LUT | Image enhancement / retouching | A3DLUT | 20-TPAMI | paper | code | HK_PolyU & DJI Innovation | 3×1 | 3×33³ | - | weights (of basic LUTs) | uniform |
| C | Image enhancement / retouching | SA-LUT-Nets | ICCV'21 | paper | - | Huawei Noah's Ark Lab | 10 | 3×33³ | (10) category | weights & category map | uniform |
| E | Image enhancement / retouching | CLUT-Net | MM'22 | paper | code | CN_TongjiU & OPPO Research | 20×1 | 3×5×20 (compressed LUT representation) | - | weights | uniform |
| E | Image enhancement / retouching | F2D-LUT | - | paper | code | CN_TsinghuaU | 3 | 2×33² (3D LUT decoupled to 2D LUTs) | (3) R-G/R-B/G-B | channel order | uniform |
| N | Image enhancement / retouching | AdaInt | CVPR'22 | paper | code | CN_SJTU & Alibaba Group | 3×1 | 3×33³ | - | weights & nodes | learned non-uniform |
| N | Image enhancement / retouching | SepLUT | ECCV'22 | paper | code | CN_SJTU & Alibaba Group | 1 (no self-adaptability) | 3×9³ or 3×17³ | - | directly 1D & 3D LUTs | learned non-linear by 1D LUT |
| C | Image enhancement / retouching | DualBLN | ACCV'22 | paper | code | CN_NorthwesternPolyU | 5×1 | 3×36³ | - | LUT fusion map | uniform |
| C | Image enhancement / retouching | 4D-LUT | 23-TIP | paper | - | CN_XianJiaotongU & Microsoft Research Asia | 3×1 | 3×33⁴ | (33) context | weights & context map | uniform |
| C & E | Image enhancement / retouching | AttentionLUT | 24-arXiv | paper | - | CN_SJTU John Hopcroft Center | no (does not rely on basic LUTs for self-adaptability) | 9×15×33 (represented by canonical polyadic decomposition) | - | feature (to encode Q, K, V tensors) | uniform |
| E | Photorealistic style transfer | NLUT | 23-arXiv | paper | code | Sobey Digital Technology & Peng Cheng Lab | 2048×1 | 3×32×32 (compressed LUT representation) | - | weights | uniform |
| C | Video low-light enhancement | IA-LUT | MM'23 | paper | code | CN_SJTU & Alibaba Damo Academy | 3×1 | 3×33⁴ | (33) intensity | weights & intensity map | uniform |
| No | Underwater image enhancement | INAM-LUT | 23-Sensors | paper | - | CN_XidianU | 3×1 | 3×33³ (?) | - | weights | uniform |
| C | Tone-mapping | LapLUT | NeurIPS'23 | paper | - | CN_HUST & DJI Innovation | 3×1 | 3×33³ | - | weight map (of each interpolated image) | uniform |
| Ours | HDR/WCG inverse tone-mapping | ITM-LUT | CVMP'23 | paper | see below | CN_CUC & Peng Cheng Lab | 3 | 3×17³ | (3) luminance probability (contribution) | weights | explicitly defined non-uniform |

In the Idea column:

C stands for improving the expressiveness of the LUT content (by a new way of generating image-adaptive LUTs, or by introducing a new dimension);

E stands for making the LUT more efficient (by a special representation of the LUT's elements);

N stands for setting non-uniform nodes (to reduce the LUT's interpolation error on images with a specific numerical distribution); see the sketch below.
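To make the N idea concrete, here is a minimal, hypothetical 1D illustration (not code from this repo or any listed paper) of LUT lookup with non-uniform nodes; a trilinear 3D LUT applies the same logic per axis:

```python
# Piecewise-linear 1D LUT with (possibly non-uniform) sampling nodes.
import numpy as np

def lut_1d(x, nodes, values):
    """`nodes` are the sampling positions, `values` the stored outputs.
    Non-uniform nodes let a LUT spend its precision where the curve is steep."""
    i = np.clip(np.searchsorted(nodes, x) - 1, 0, len(nodes) - 2)  # enclosing interval
    t = (x - nodes[i]) / (nodes[i + 1] - nodes[i])                 # local interpolation weight
    return (1 - t) * values[i] + t * values[i + 1]

# Denser nodes near 0 reduce interpolation error on dark, shadow-heavy content.
nodes = np.array([0.0, 0.05, 0.1, 0.2, 0.4, 0.7, 1.0])
values = nodes ** 2.2                                              # toy EOTF-like curve
print(lut_1d(np.array([0.03, 0.5]), nodes, values))
```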

Note that we only list AI-3D-LUTs for image-to-image low-level vision tasks; other AI-LUTs are not included.

2. Our algorithm ITM-LUT

Our AI-3D-LUT algorithm, named ITM-LUT, conducts inverse tone-mapping (ITM) from a standard dynamic range (SDR) image/frame to its high dynamic range and wide color gamut (HDR/WCG) version.

2.1 Key features

2.2 Prerequisites

2.3 Usage (how to test)

First, install the CUDA & C++ implementation of trilinear interpolation with non-uniform vertices (requires GCC/G++):

python3 ./ailut/setup.py install

After that, the ailut package will be available in your Python environment.
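As a quick sanity check that the extension built correctly, you can run something like the sketch below. It assumes ailut keeps the interface of AdaInt's extension, ailut_transform(img, lut, vertices); the exact signature may differ:

```python
# Sanity-check sketch: apply a random non-uniform 3D LUT to a dummy image.
# Assumption: ailut exposes ailut_transform(img, lut, vertices) as in AdaInt.
import torch
from ailut import ailut_transform

B, D = 1, 17                                          # batch size, LUT lattice size
img = torch.rand(B, 3, 64, 64).cuda()                 # dummy SDR image in [0, 1]
lut = torch.rand(B, 3, D, D, D).cuda()                # a 3D LUT with 3 output channels
vtx = torch.linspace(0, 1, D).repeat(B, 3, 1).cuda()  # sampling nodes (uniform here)

out = ailut_transform(img, lut, vtx)                  # trilinear interpolation on GPU
print(out.shape)                                      # expected: torch.Size([1, 3, 64, 64])
```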

Run test.py with the configuration(s) below:

python3 test.py frameName.jpg

For batch processing, use the wildcard *:

python3 test.py framesPath/*.png

or, for example:

python3 test.py framesPath/footageName_*.png

Add the configuration(s) below for a specific purpose:

| Purpose | Configuration |
|---|---|
| Specifying the output path | -out resultDir/ (default is the input dir) |
| Resizing the image before inference | -resize True -height newH -width newW |
| Adding a filename tag | -tag yourTag |
| Forcing CPU processing | -use_gpu False |
| Using input SDR with bit depth ≠ 8 | e.g. -in_bitdepth 16 |
| Saving the result HDR in another format (default is an uncompressed 16-bit .tif of a single frame) | -out_format suffix: png for 16-bit .png; exr requires the extra package OpenEXR |
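For example, several of these options can be combined in one call:

python3 test.py framesPath/*.png -out resultDir/ -tag yourTag -out_format png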

To use other parameters or another checkpoint, change line 104 in test.py.
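For illustration only (the actual content and variable name on that line may differ), this is where the checkpoint path is set, e.g. switching to the HDRTV1K checkpoint mentioned in the changelog:

```python
# Hypothetical illustration of test.py line 104; the variable name is assumed.
ckpt_path = './params_TV1K.pth'   # checkpoint trained on HDRTV1K (see changelog)
```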

2.4 Training code

First, download the training code from BaiduNetDisk (code: qgs2) or GoogleDrive. This package contains the 5 essential real ITM LUTs used in our own LUT initialization, plus 13 other real ITM LUTs (each at N = 17/33/65); you can use any combination of them to try a new LUT initialization.

Then:

cd ITMLUT_train/codes
python3 train.py -opt options/test/test_Net.yml
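As context for the LUT-initialization option above, here is a minimal sketch of loading several real ITM LUTs as an initial, trainable basis-LUT bank. The filenames, file format, and shapes here are assumptions for illustration, not the training package's actual layout:

```python
# Sketch only: initialize a learnable basis-LUT bank from real ITM LUTs.
# Assumption: LUTs stored as .npy arrays of shape (3, N, N, N); the training
# package's real file format and loader may differ.
import numpy as np
import torch

N, K = 17, 5                      # lattice size; the package ships 5 essential LUTs
luts = [np.load(f'real_itm_lut_{k}_N{N}.npy') for k in range(K)]      # hypothetical filenames
bank = torch.nn.Parameter(torch.from_numpy(np.stack(luts)).float())   # (K, 3, N, N, N), trainable
```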

2.5 Changelog

| Date | Log |
|---|---|
| 29 Feb 2024 | Since most SoTA methods are still trained and tested on the HDRTV1K dataset, we added a checkpoint params_TV1K.pth trained on it, so results will have a similar look to theirs. |
| 3 Mar 2024 | Training code (along with 18 real ITM LUTs at N = 17/33/65) is now released. |

Contact

Guo Cheng (Andre Guo) guocheng@cuc.edu.cn