cvlab-kaist / GaussianTalker

Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn, and Seungryong Kim.

Why is the maximum duration of the rendered video only 29 seconds and always 702 frames #22

Closed: KKProject closed this issue 5 months ago

KKProject commented 5 months ago

```
Looking for config file in model/cfg_args
Config file found: model/cfg_args
Rendering model Namespace(add_point=False, add_points=False, apply_rotation=False, batch=1, batch_size=32, bounds=1.6, canonical_tri_plane_factor_list=['opacity', 'shs'], checkpoint_iterations=[], coarse_iterations=7999, compute_cov3D_python=False, configs='arguments/64_dim_1_transformer.py', convert_SHs_python=False, custom_aud='/User/GaussianTalker/custom_audio/1/audio.npy', custom_sampler=None, custom_wav='/User/GaussianTalker/custom_audio/1/audio.wav', d_model=64, data_device='cuda', dataloader=True, debug=True, debug_from=-1, defor_depth=2, deformation_lr_delay_mult=0.01, deformation_lr_final=1e-05, deformation_lr_init=0.0001, densification_interval=100, densify_from_iter=1000, densify_grad_threshold_after=0.0002, densify_grad_threshold_coarse=0.001, densify_grad_threshold_fine_init=0.0002, densify_until_iter=7000, depth_fine_tuning=True, detect_anomaly=False, drop_prob=0.2, empty_voxel=False, eval=True, expname='', extension='.png', feature_lr=0.0025, ffn_hidden=128, grid_lr_final=0.00016, grid_lr_init=0.0016, grid_pe=0, images='images', ip='127.0.0.1', iteration=10000, iterations=10000, kplanes_config={'grid_dimensions': 2, 'input_coordinate_dim': 3, 'output_coordinate_dim': 32, 'resolution': [64, 64, 64]}, l1_time_planes=0.0001, lambda_dssim=0, lambda_lpips=0, lip_fine_tuning=True, llffhold=8, model_path='model', multires=[1, 2], n_head=2, n_layer=1, net_width=128, no_do=False, no_dr=False, no_ds=False, no_dshs=False, no_dx=False, no_grid=False, only_infer=True, opacity_lr=0.05, opacity_pe=2, opacity_reset_interval=3000, opacity_threshold_coarse=0.005, opacity_threshold_fine_after=0.005, opacity_threshold_fine_init=0.005, percent_dense=0.01, plane_tv_weight=0.0002, port=6009, pos_emb=True, posebase_pe=10, position_lr_delay_mult=0.01, position_lr_final=1.6e-06, position_lr_init=0.00016, position_lr_max_steps=20000, pruning_from_iter=500, pruning_interval=100, quiet=False, render_process=False, resolution=-1, rotation_lr=0.001, save_iterations=[1000, 3000, 4000, 5000, 6000, 7000, 9000, 10000, 12000, 14000, 20000, 30000, 45000, 60000, 30000], scale_rotation_pe=2, scaling_lr=0.005, sh_degree=3, skip_test=True, skip_train=True, skip_video=False, source_path='data_set/wangxueyin', split_gs_in_fine_stage=False, start_checkpoint=None, static_mlp=False, test_iterations=[0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000, 13500, 14000, 14500, 15000, 15500, 16000, 16500, 17000, 17500, 18000, 18500, 19000, 19500, 20000, 20500, 21000, 21500, 22000, 22500, 23000, 23500, 24000, 24500, 25000, 25500, 26000, 26500, 27000, 27500, 28000, 28500, 29000, 29500, 30000, 30500, 31000, 31500, 32000, 32500, 33000, 33500, 34000, 34500, 35000, 35500, 36000, 36500, 37000, 37500, 38000, 38500, 39000, 39500, 40000, 40500, 41000, 41500, 42000, 42500, 43000, 43500, 44000, 44500, 45000, 45500, 46000, 46500, 47000, 47500, 48000, 48500, 49000, 49500], time_smoothness_weight=0.001, timebase_pe=4, timenet_output=32, timenet_width=64, train_l=['xyz', 'deformation', 'grid', 'f_dc', 'f_rest', 'opacity', 'scaling', 'rotation'], train_tri_plane=True, use_wandb=False, visualize_attention=False, weight_constraint_after=0.2, weight_constraint_init=1, weight_decay_iteration=5000, white_background=True, zerostamp_init=False) [05/06 00:37:09]
feature_dim: 64 [05/06 00:37:09]
Loading trained model at iteration 10000 [05/06 00:37:09]
[INFO] load aud_features: torch.Size([7713, 29, 16]) [05/06 00:37:09]
Reading Training Transforms [05/06 00:37:10]
Reading Test Transforms [05/06 00:37:17]
Generating Video Transforms [05/06 00:37:20]
Reading Custom Transforms [05/06 00:37:20]
Loading Training Cameras [05/06 00:37:26]
Loading Test Cameras [05/06 00:37:26]
Loading Video Cameras [05/06 00:37:26]
Loading Custom Cameras [05/06 00:37:26]
Deformation Net Set aabb [0.75849324 0.90093476 0.45244923] [-0.78521377 -0.8166129 -0.566415 ] [05/06 00:37:26]
Voxel Plane: set aabb= Parameter containing: tensor([[ 0.7585, 0.9009, 0.4524], [-0.7852, -0.8166, -0.5664]]) [05/06 00:37:26]
loading model from existsmodel/point_cloud/iteration_10000 [05/06 00:37:28]
============== <scene.Scene object at 0x7800e0d15d50> [05/06 00:37:28]
------------------------------------------------- [05/06 00:37:28]
test set rendering : 702 frames [05/06 00:37:28]
------------------------------------------------- [05/06 00:37:28]
point nums: 43459 [05/06 00:37:28]
Rendering progress: 100%|██████████| 702/702 [00:34<00:00, 20.28it/s]
total frame: 702 [05/06 00:38:03]
FPS: 23.27712135731388 [05/06 00:38:03]

iterations done: begin wirte [05/06 00:39:17]
```

joungbinlee commented 5 months ago

Thank you for using our model!

We use 10/11 of the frames in the dataset as training data and hold out the remaining 1/11 for testing, so rendering the test split always covers the same portion of the video. With the 7713 audio frames shown in your log, the test split works out to roughly 7713 / 11 ≈ 702 frames, which at the training video's frame rate is about 29 seconds.
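As a minimal sketch of the split arithmetic: assuming the test frames are taken as every 11th frame of the sequence (an illustrative guess at the indexing, not necessarily the repository's exact code), the counts reproduce the 702 test frames from the log:

```python
# From the log: aud_features has shape [7713, 29, 16], i.e. 7713 frames.
n_frames = 7713

# Hypothetical every-11th-frame hold-out; reproduces the logged 702 count.
test_idx = list(range(0, n_frames, 11))
train_idx = [i for i in range(n_frames) if i % 11 != 0]

print(len(test_idx), len(train_idx))  # 702 7011
```

Because the hold-out indices are fixed, re-running inference always renders the same 702-frame segment regardless of audio length.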

Additionally, when the batch size is larger than 1, the final batch may contain fewer frames than the batch size and is skipped, so those trailing frames are never rendered. If you wish to render the entire test set, set the batch size to 1; this guarantees that every frame in the test set is rendered.
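A minimal sketch of the effect, assuming the renderer's data loader discards the incomplete final batch (drop_last-style behavior; the helper below is illustrative, not the repository's actual loader):

```python
def batch_sizes(n_frames, batch_size, drop_last=True):
    """Return the size of each batch the loader would yield."""
    full = n_frames // batch_size
    sizes = [batch_size] * full
    remainder = n_frames - full * batch_size
    if remainder and not drop_last:
        sizes.append(remainder)  # partial final batch, kept only if not dropping
    return sizes

# 702 test frames with batch_size=32: 21 full batches, last 30 frames dropped.
print(sum(batch_sizes(702, 32)))  # 672
# batch_size=1 never produces a partial batch, so all 702 frames render.
print(sum(batch_sizes(702, 1)))   # 702
```

This is why a batch size that does not divide 702 evenly leaves the tail of the test set unrendered, while batch size 1 always covers every frame.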

Thank you.