Thanks for your great repo on dynamic rendering. I tried to train the 'cut_roasted_beef' scene with your code but got the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)
The full training log is:
Using /dellnas/home/4dg/.cache/torch_extensions/py37_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /dellnas/home/4dg/.cache/torch_extensions/py37_cu116/diff_gaussian_rasterization/build.ninja...
Building extension module diff_gaussian_rasterization...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module diff_gaussian_rasterization...
Optimizing output/N3V/cut_roasted_beef
Output folder: output/N3V/cut_roasted_beef [21/02 15:49:42]
Tensorboard not available: not logging progress [21/02 15:49:42]
Found transforms_train.json file, assuming Blender data set! [21/02 15:49:42]
Reading Training Transforms [21/02 15:49:42]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5700/5700 [00:01<00:00, 3104.34it/s]
Reading Test Transforms [21/02 15:49:44]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:00<00:00, 2471.07it/s]
Loading Training Cameras [21/02 15:49:44]
Loading Test Cameras [21/02 15:49:45]
Number of points at initialisation : 300000 [21/02 15:49:46]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00, 3.85it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00, 3.98it/s]
[ITER 500] Evaluating train: L1 0.027832254767417908 PSNR 27.3373046875 [21/02 15:53:44]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [01:07<00:00, 4.43it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [01:07<00:00, 4.75it/s]
[ITER 500] Evaluating test: L1 0.02017407544578115 PSNR 29.758896795908612 [21/02 15:54:52]
[ITER 500] Saving best checkpoint [21/02 15:54:52]
Training progress: 2%|█ | 600/30000 [05:27<3:50:45, 2.12it/s, Loss=0.0107820, PSNR=26.70, Ll1=0.0282, Lssim=0.1027]Traceback (most recent call last):
File "train.py", line 403, in <module>
args.gaussian_dim, args.time_duration, args.num_pts, args.num_pts_ratio, args.rot_4d, args.force_sh_3d, args.batch_size)
File "train.py", line 240, in training
gaussians.densify_and_prune(opt.densify_grad_threshold, opt.thresh_opa_prune, scene.cameras_extent, size_threshold, opt.densify_grad_t_threshold)
File "/dellnas/home/4dg/project/4d-gaussian-splatting/scene/gaussian_model.py", line 563, in densify_and_prune
self.densify_and_split(grads, max_grad, extent, grads_t, max_grad_t)
File "/dellnas/home/4dg/project/4d-gaussian-splatting/scene/gaussian_model.py", line 516, in densify_and_split
rots = build_rotation_4d(self._rotation[selected_pts_mask], self._rotation_r[selected_pts_mask]).repeat(N,1,1)
File "/dellnas/home/4dg/project/4d-gaussian-splatting/utils/general_utils.py", line 131, in build_rotation_4d
A = M_l @ M_r
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)
Training progress: 2%|█ | 600/30000 [05:29<4:29:03, 1.82it/s, Loss=0.0107820, PSNR=26.70, Ll1=0.0282, Lssim=0.1027]
It seems the first epoch runs fine, but something goes wrong in the second: the traceback shows the crash at iteration 600, inside densify_and_split. Could you please help me solve this problem?
@fishfishson Hi, may I ask how you solved this problem, and what was wrong with your server? I ran into the same error after the first epoch.
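For anyone else hitting this, here is a minimal diagnostic sketch, not a confirmed fix. The traceback points at the batched product `A = M_l @ M_r` in `build_rotation_4d`, which goes through cuBLAS on the GPU. The two helpers below are hypothetical (they are not part of the repo): `describe` prints the properties of a tensor that are worth checking right before the failing line, and `batched_matmul_no_cublas` computes the same batched product with a broadcasted multiply and a sum, so no cuBLAS GEMM call is involved. It assumes `M_l` and `M_r` are float tensors of shape (N, 4, 4), which the log suggests but does not state.

```python
import torch


def describe(name: str, t: torch.Tensor) -> str:
    """Summarise the tensor properties that matter for a batched GEMM call."""
    return (
        f"{name}: shape={tuple(t.shape)} dtype={t.dtype} device={t.device} "
        f"contiguous={t.is_contiguous()} "
        f"finite={bool(torch.isfinite(t).all())}"
    )


def batched_matmul_no_cublas(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Batched matrix product (B, n, k) x (B, k, m) -> (B, n, m).

    Uses an elementwise multiply followed by a sum over the shared
    dimension instead of torch.bmm / the @ operator.
    """
    # a: (B, n, k) -> (B, n, k, 1); b: (B, k, m) -> (B, 1, k, m)
    prod = a.unsqueeze(-1) * b.unsqueeze(-3)  # (B, n, k, m)
    return prod.sum(dim=-2)                   # sum over k -> (B, n, m)


if __name__ == "__main__":
    # Stand-ins for M_l and M_r (assumed shape: N x 4 x 4).
    torch.manual_seed(0)
    M_l = torch.randn(5, 4, 4)
    M_r = torch.randn(5, 4, 4)
    print(describe("M_l", M_l))
    print(describe("M_r", M_r))
    A = batched_matmul_no_cublas(M_l, M_r)
    print(torch.allclose(A, M_l @ M_r, atol=1e-5))
```

As an experiment, one could print `describe("M_l", M_l)` just before line 131 of utils/general_utils.py and temporarily replace `M_l @ M_r` with `batched_matmul_no_cublas(M_l, M_r)`. If the error goes away, the cuBLAS call itself is the likely culprit (for example a PyTorch/CUDA version issue on that machine); if a different CUDA error appears, the real failure probably happened earlier. CUDA errors are reported asynchronously, so rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1` may also help pin down the line that actually fails.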