Closed YanMinMacMaster closed 3 months ago
The problem seems to be at this line:
4417it [01:09, 1.65s/it]scripts/train_xx.sh: line 8: 70450 Killed python train_mouth.py -s $dataset -m $workspace --audio_extractor $audio_extractor
The training process was killed.
I guess it's because your computer does not have enough memory (RAM) to pre-load all the training data. If I remember correctly, training on Macron may take 40 GB or more of memory just for pre-loading the data. You could change the code so that the data is loaded into memory only when it is actually used.
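A minimal sketch of that "load only when used" idea, in case it helps. This is not the repository's actual loader: `LazyFrames`, `read_bytes`, and `cache_size` are hypothetical names, and the default loader just returns raw file bytes. In a real training script you would pass your own image-decoding function as `loader`.

```python
from collections import OrderedDict
from typing import Any, Callable, Iterable


def read_bytes(path: str) -> bytes:
    """Placeholder loader: returns the raw file contents (no image decoding)."""
    with open(path, "rb") as f:
        return f.read()


class LazyFrames:
    """Keeps only file paths in memory and loads a frame when it is indexed.

    At most `cache_size` loaded frames are kept (least recently used is
    dropped first), so RAM usage no longer grows with the dataset size.
    """

    def __init__(
        self,
        paths: Iterable[str],
        loader: Callable[[str], Any] = read_bytes,
        cache_size: int = 8,
    ) -> None:
        self.paths = list(paths)
        self.loader = loader
        self.cache_size = cache_size
        self._cache: "OrderedDict[int, Any]" = OrderedDict()

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> Any:
        if idx in self._cache:
            # Cache hit: mark as most recently used and return it.
            self._cache.move_to_end(idx)
            return self._cache[idx]
        # Cache miss: read from disk now, not at start-up.
        item = self.loader(self.paths[idx])
        if self.cache_size > 0:
            self._cache[idx] = item
            if len(self._cache) > self.cache_size:
                self._cache.popitem(last=False)  # evict least recently used
        return item
```

The trade-off is extra disk reads during training in exchange for a small, bounded memory footprint. A larger `cache_size` recovers some speed if RAM allows.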
Thanks, I will give it a try.
Same problem here. Is there a solution? I have two 3090s, 48 GB in total, and it still won't run.
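It's a RAM problem and has nothing to do with GPU memory (VRAM). To reduce the RAM requirement, the images and backgrounds that are currently pre-loaded into memory need to be read from disk at the time they are used instead.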
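To see why adding GPUs does not help, it can be useful to estimate how much host RAM pre-loading needs and compare it with what the machine has. The sketch below is only back-of-the-envelope arithmetic. The function names are made up for illustration, and the resolution, dtype, and number of arrays per frame are assumptions you would have to check against the actual data and loader.

```python
import os
from typing import Optional


def estimate_preload_bytes(
    num_frames: int,
    height: int,
    width: int,
    channels: int = 3,
    bytes_per_value: int = 4,   # assumption: float32; uint8 would be 1
    arrays_per_frame: int = 1,  # assumption: e.g. 2 for image + background
) -> int:
    """Rough size of all frames held in RAM at once."""
    per_array = height * width * channels * bytes_per_value
    return num_frames * per_array * arrays_per_frame


def total_ram_bytes() -> Optional[int]:
    """Physical host RAM in bytes, or None if the platform does not report it.

    Uses os.sysconf, which is available on Linux and most Unix systems.
    This is system memory, not GPU memory.
    """
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None
```

For example, taking the 7938 training frames reported in the log and assuming (not confirmed anywhere in this thread) 512x512 RGB frames stored as float32, `estimate_preload_bytes(7938, 512, 512)` comes to roughly 23 GiB per pre-loaded array. If that is larger than `total_ram_bytes()`, the process is likely to be killed no matter how much VRAM is installed.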
During the audio pre-processing, I used DeepSpeech. I found only one file ending with '.wav' in the data folder (I was using the Macron video). It is called aud.wav, and I preprocessed it. During the first training attempt, the terminal displayed "aud_ds.npy not found". So, I renamed aud.wav and aud.npy to aud_ds. Then it displayed errors as stated in the title. The output is like this:
(talking_gaussian) min@min-US-Desktop-Aegis-RS:~/Documents/TalkingGaussian$ bash scripts/train_xx.sh data/macron output/marcron 0
Optimizing output/marcron
Output folder: output/marcron [06/08 23:43:47]
Found transforms_train.json file, assuming Blender data set! [06/08 23:43:47]
Reading Training Transforms [06/08 23:43:47]
7938it [00:01, 4091.06it/s]
4417it [01:09, 1.65s/it]scripts/train_xx.sh: line 8: 70450 Killed python train_mouth.py -s $dataset -m $workspace --audio_extractor $audio_extractor
Optimizing output/marcron
Output folder: output/marcron [06/08 23:45:04]
Found transforms_train.json file, assuming Blender data set! [06/08 23:45:05]
Reading Training Transforms [06/08 23:45:05]
7938it [00:01, 4123.37it/s]
5159it [01:35, 2.41s/it]scripts/train_xx.sh: line 9: 70902 Killed python train_face.py -s $dataset -m $workspace --init_num 2000 --densify_grad_threshold 0.0005 --audio_extractor $audio_extractor
Optimizing output/marcron
Output folder: output/marcron [06/08 23:47:08]
Found transforms_train.json file, assuming Blender data set! [06/08 23:47:09]
Reading Training Transforms [06/08 23:47:09]
7938it [00:01, 4124.96it/s]
5198it [01:32, 1.05it/s]scripts/train_xx.sh: line 10: 71034 Killed python train_fuse.py -s $dataset -m $workspace --opacity_lr 0.001 --audio_extractor $audio_extractor
Looking for config file in output/marcron/cfg_args
Config file found: output/marcron/cfg_args
Rendering output/marcron
Found transforms_train.json file, assuming Blender data set! [06/08 23:48:45]
Reading Test Transforms [06/08 23:48:45]
794it [00:00, 3958.49it/s]
794it [00:09, 83.91it/s]
Generating random point cloud (10000)... [06/08 23:48:55]
Loading Training Cameras [06/08 23:48:55]
Loading Test Cameras [06/08 23:48:56]
Number of points at initialisation : 10000 [06/08 23:48:57]
Traceback (most recent call last):
File "synthesize_fuse.py", line 125, in
render_sets(model.extract(args), args.iteration, pipeline.extract(args), args.use_train, args.fast, args.dilate)
File "synthesize_fuse.py", line 93, in render_sets
(model_params, motion_params, model_mouth_params, motion_mouth_params) = torch.load(os.path.join(dataset.model_path, "chkpnt_fuse_latest.pth"))
File "/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torch/serialization.py", line 699, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'output/marcron/chkpnt_fuse_latest.pth'
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/lpips/weights/v0.1/alex.pth
Traceback (most recent call last):
File "metrics.py", line 215, in
print(lmd_meter.report())
File "metrics.py", line 102, in report
return f'LMD ({self.backend}) = {self.measure():.6f}'
File "metrics.py", line 96, in measure
return self.V / self.N
ZeroDivisionError: division by zero
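In the log above, every stage after the first "Killed" still ran, so the later FileNotFoundError (no chkpnt_fuse_latest.pth) and the ZeroDivisionError in metrics.py look like follow-on failures of the killed training steps rather than separate bugs. One way to make this easier to spot is to run the stages one by one and stop at the first failure. The sketch below is a hypothetical helper, not part of the repository: `run_stages` and `describe_exit` are invented names, and the negative return code convention for signals applies to POSIX systems.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable message."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was ended by a signal.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        sigkill = getattr(signal, "SIGKILL", None)
        hint = " (possibly the kernel out-of-memory killer)" if signum == sigkill else ""
        return f"killed by {name}{hint}"
    return f"exit code {returncode}"


def run_stages(stages: Sequence[Sequence[str]]) -> Optional[int]:
    """Run commands in order and stop at the first one that fails.

    Returns the index of the failing stage, or None if all succeeded.
    """
    for i, cmd in enumerate(stages):
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            print(
                f"stage {i} ({cmd[0]}) failed: {describe_exit(result.returncode)}",
                file=sys.stderr,
            )
            return i
    return None
```

Each entry in `stages` would be one command from the training script as an argument list, for example the train_mouth.py, train_face.py, and train_fuse.py invocations shown in the log with your own dataset and workspace paths filled in. With this, a killed train_mouth.py stops the run immediately instead of producing a chain of unrelated-looking errors.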