VIPL-Audio-Visual-Speech-Understanding / Lipreading-DenseNet3D

DenseNet3D model from "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild", https://arxiv.org/abs/1810.06990

CUDA out of memory && ValueError: pic should be 2/3 dimensional. Got 0 dimensions. #4

Closed. PingYufeng closed this issue 5 years ago.

PingYufeng commented 5 years ago

1. During training I get CUDA out of memory. After reducing the batch size, training starts, but then a ValueError: pic should be 2/3 dimensional. Got 0 dimensions appears; see part 2 below.

Loading options...
Running cudnn benchmark...
options: {'title': 'LipReading PyTorch', 'general': {'usecudnn': True, 'usecudnnbenchmark': True, 'gpuid': '0', 'loadpretrainedmodel': True, 'random_seed': 55, 'pretrainedmodelpath': 'weights/lrw1000_34.pt'}, 'input': {'batchsize': 16, 'numworkers': 8, 'shuffle': True}, 'model': {'type': 'Finetune-label', 'inputdim': 256, 'hiddendim': 256, 'numclasses': 1000, 'numlstms': 2}, 'training': {'train': True, 'epochs': 1, 'startepoch': 0, 'statsfrequency': 100, 'data_root': 'LRW1000_Public/images', 'index_root': 'LRW1000_Public/info/trn_1000.txt', 'padding': 30, 'learningrate': 0.001, 'momentum': 0.9, 'weightdecay': 0.003, 'save_prefix': 'weights/tv_word_2000'}, 'validation': {'validate': True, 'data_root': 'LRW1000_Public/images', 'index_root': 'LRW1000_Public/info/val_1000.txt', 'padding': 60, 'saveaccuracy': True}, 'test': {'test': False, 'data_root': 'LRW1000_Public/images', 'index_root': 'LRW1000_Public/info/tst_1000.txt', 'padding': 60, 'saveaccuracy': True}}
matched keys: 431
index file: LRW1000_Public/info/trn_1000.txt
num of pinyins: 1000
num of data: 603097
max video length 30
index file: LRW1000_Public/info/val_1000.txt
num of pinyins: 1000
num of data: 63237
max video length 51
options["training"]["startepoch"], options["training"]["epochs"]: 0 1
Starting training...
Process Process-1:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\py36\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "D:\Anaconda3\envs\py36\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Git_Lip\Lipreading-DenseNet3D\main.py", line 119, in run
    trainer(model, epoch)
  File "E:\Git_Lip\Lipreading-DenseNet3D\training.py", line 92, in __call__
    outputs = net(input)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Git_Lip\Lipreading-DenseNet3D\models\Dense3D.py", line 148, in forward
    f2 = self.features(x)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Git_Lip\Lipreading-DenseNet3D\models\Dense3D.py", line 79, in forward
    new_features = super(_DenseLayer, self).forward(x)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\conv.py", line 478, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 1.32 GiB (GPU 0; 11.00 GiB total capacity; 6.81 GiB already allocated; 1.60 GiB free; 25.92 MiB cached)
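
(For reference, the numbers in that message can be cross-checked with the standard torch.cuda utilities; the snippet below is only an illustration, not code from this repo.)

import torch

# Rough picture of GPU memory usage (PyTorch 1.x API; memory_cached was later
# renamed memory_reserved in newer releases).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("total     : %.2f GiB" % (props.total_memory / 2 ** 30))
    print("allocated : %.2f GiB" % (torch.cuda.memory_allocated(0) / 2 ** 30))
    print("cached    : %.2f GiB" % (torch.cuda.memory_cached(0) / 2 ** 30))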

2. After changing batch_size to 8, training runs, but then fails with ValueError: pic should be 2/3 dimensional. Got 0 dimensions.

(py36) E:\Git_Lip\Lipreading-DenseNet3D>python main.py options_lip.toml
Loading options...
Running cudnn benchmark...
options: {'title': 'LipReading PyTorch', 'general': {'usecudnn': True, 'usecudnnbenchmark': True, 'gpuid': '0', 'loadpretrainedmodel': True, 'random_seed': 55, 'pretrainedmodelpath': 'weights/lrw1000_34.pt'}, 'input': {'batchsize': 8, 'numworkers': 8, 'shuffle': True}, 'model': {'type': 'Finetune-label', 'inputdim': 256, 'hiddendim': 256, 'numclasses': 1000, 'numlstms': 2}, 'training': {'train': True, 'epochs': 1, 'startepoch': 0, 'statsfrequency': 100, 'data_root': 'LRW1000_Public/images', 'index_root': 'LRW1000_Public/info/trn_1000.txt', 'padding': 30, 'learningrate': 0.001, 'momentum': 0.9, 'weightdecay': 0.003, 'save_prefix': 'weights/tv_word_2000'}, 'validation': {'validate': True, 'data_root': 'LRW1000_Public/images', 'index_root': 'LRW1000_Public/info/val_1000.txt', 'padding': 60, 'saveaccuracy': True}, 'test': {'test': False, 'data_root': 'LRW1000_Public/images', 'index_root': 'LRW1000_Public/info/tst_1000.txt', 'padding': 60, 'saveaccuracy': True}}
matched keys: 431
index file: LRW1000_Public/info/trn_1000.txt
num of pinyins: 1000
num of data: 603097
max video length 30
index file: LRW1000_Public/info/val_1000.txt
num of pinyins: 1000
num of data: 63237
max video length 51
options["training"]["startepoch"], options["training"]["epochs"]: 0 1
Starting training...
Iteration: 00000000,Elapsed Time: 00 hrs, 00 mins, 13 secs,Estimated Time Remaining: 2277 hrs, 40 mins, 42 secs,Loss:3.082460403442383
Iteration: 00000200,Elapsed Time: 00 hrs, 00 mins, 22 secs,Estimated Time Remaining: 18 hrs, 47 mins, 20 secs,Loss:4.493927001953125
Iteration: 00000400,Elapsed Time: 00 hrs, 00 mins, 31 secs,Estimated Time Remaining: 13 hrs, 07 mins, 18 secs,Loss:4.264379501342773
Iteration: 00000600,Elapsed Time: 00 hrs, 00 mins, 40 secs,Estimated Time Remaining: 11 hrs, 16 mins, 48 secs,Loss:5.106860160827637
Iteration: 00000800,Elapsed Time: 00 hrs, 00 mins, 49 secs,Estimated Time Remaining: 10 hrs, 22 mins, 03 secs,Loss:6.046342849731445
Process Process-1:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\py36\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "D:\Anaconda3\envs\py36\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Git_Lip\Lipreading-DenseNet3D\main.py", line 119, in run
    trainer(model, epoch)
  File "E:\Git_Lip\Lipreading-DenseNet3D\training.py", line 83, in __call__
    for i_batch, sample_batched in enumerate(self.trainingdataloader):
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\utils\data\dataloader.py", line 846, in _process_data
    data.reraise()
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\_utils.py", line 369, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\Anaconda3\envs\py36\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\Git_Lip\Lipreading-DenseNet3D\data\dataset.py", line 42, in __getitem__
    temporalvolume = bbc(vidframes, self.padding, self.augment)
  File "E:\Git_Lip\Lipreading-DenseNet3D\data\preprocess.py", line 41, in bbc
    ])(vidframes[i])
  File "D:\Anaconda3\envs\py36\lib\site-packages\torchvision\transforms\transforms.py", line 61, in __call__
    img = t(img)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torchvision\transforms\transforms.py", line 127, in __call__
    return F.to_pil_image(pic, self.mode)
  File "D:\Anaconda3\envs\py36\lib\site-packages\torchvision\transforms\functional.py", line 131, in to_pil_image
    raise ValueError('pic should be 2/3 dimensional. Got {} dimensions.'.format(pic.ndim))
ValueError: pic should be 2/3 dimensional. Got 0 dimensions.
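
My guess is that one of the frames in that batch fails to decode, so an empty (0-dimensional) array reaches ToPILImage. To locate such frames I would use a small helper like the sketch below; it is only an illustration, not code from this repo, with vidframes being the per-sample frame list that bbc() in data/preprocess.py receives.

def find_bad_frames(vidframes):
    """Return indices of frames that ToPILImage would reject:
    None, or arrays that are not 2-D (grayscale) / 3-D (color)."""
    bad = []
    for i, frame in enumerate(vidframes):
        if frame is None or getattr(frame, "ndim", 0) not in (2, 3):
            bad.append(i)
    return bad

# Usage inside the data pipeline, before the transforms are applied:
#     print(find_bad_frames(vidframes))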

Fengdalu commented 5 years ago

Please check the downloaded dataset; if there are any problems, please email me.

PingYufeng commented 5 years ago

Hello, I still have a few open questions I would like to ask you about:

1. When extracting the LRW-1000 dataset provided by Dr. Shuang Yang (CAS), an error occurred: part of the data in part.tar.01 is damaged. I also found 5 corrupted images; see the attachment.
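
For anyone who wants to run the same check, a scan along these lines works; it is only a sketch and assumes the extracted frames are ordinary image files under LRW1000_Public/images (adjust the path and extensions to your local layout):

import os
import cv2

def scan_for_corrupt_images(data_root):
    """List image files that cv2.imread cannot decode (returns None or an empty array)."""
    bad = []
    for dirpath, _, filenames in os.walk(data_root):
        for name in filenames:
            if not name.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
                continue
            path = os.path.join(dirpath, name)
            img = cv2.imread(path)
            if img is None or img.size == 0:
                bad.append(path)
    return bad

if __name__ == "__main__":
    for p in scan_for_corrupt_images("LRW1000_Public/images"):
        print("unreadable:", p)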

2. I took a subset of the training/validation/test sets (the labels from '对' to '展开') and trained on it successfully, but the results fall short of what I expected. The configuration is as follows:

title = "LipReading PyTorch"

[general]
usecudnn = true
usecudnnbenchmark = true
gpuid = "0"
loadpretrainedmodel = true
random_seed = 55
pretrainedmodelpath = 'weights/lrw1000_34.pt'

[input]
batchsize = 16
numworkers = 8
shuffle = true

[model]
type = "Finetune-label"
inputdim = 256
hiddendim = 256
numclasses = 1000
numlstms = 2

[training]
train = true
epochs = 10
startepoch = 0
statsfrequency = 100
data_root = 'LRW1000_Public/images'
index_root = 'LRW1000_Public/lables/trn/[dui_to_zhankai)Trn.txt'
padding = 30
learningrate = 1e-3
momentum = 0.9
weightdecay = 0.003
save_prefix = "weights/tv_word_100"

[validation]
validate = false
data_root = 'LRW1000_Public/images'
index_root = 'LRW1000_Public/lables/val/[dui_to_zhankai)Val.txt'
padding = 60
saveaccuracy = true

[test]
test = false
data_root = 'LRW1000_Public/images'
index_root = 'LRW1000_Public/lables/tst/[dui_to_zhankai)Tst.txt'
padding = 60
saveaccuracy = true

Training result: the (smoothed) loss has already flattened out at about 1, but something still seems off.

Test result:

count[0:]: [ 669. 1289. 1650. 1918. 2094. 2243. 2374. 2493. 2629. 2760. 2899. 3047. ... 4080. ... 4680. ... 5032. ... 5390. ... 5678. ... 5984. ... 6481. ... 6957. ... 7258. ... 7562. ... 7843. ... 7977. ... 8234. ... 8379. ... 8594. ... 8742. ... 8887. ... 9068. ... 9201. ... 9371. ... 9445.]
input.size(0): 13, num_samples: 9453
i_batch/tot_batch: 590/591, corret/tot: 669.0/9453, current_acc: 0.0707711837511901

Top # | Accuracy
    0 | 0.0707711837511901
    1 | 0.1363588278853274
    2 | 0.17454776261504285
    3 | 0.2028985507246377
    4 | 0.22151697873690893

3. When computing accuracy I do not quite follow what count[p:] is doing. Could you briefly explain how the model's accuracy is evaluated at test time? The relevant lines in validation.py are:

for i in range(input.size(0)):
    p = list(argmax[i]).index(labels[i])
    count[p:] += 1
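
My current reading of those lines is the sketch below (my own re-statement, assuming argmax[i] holds the class indices sorted from highest to lowest score): count[p:] += 1 credits the sample to top-(p+1) and every larger k, so after the loop count[k] / num_samples is the top-(k+1) accuracy, which would match the "Top # | Accuracy" table above. Please correct me if I have misread it.

import numpy as np

def update_topk_counts(count, scores, labels):
    """count[k] ends up as the number of samples whose true label is within the
    top (k+1) predictions, so count[k] / num_samples is the top-(k+1) accuracy."""
    argmax = np.argsort(-scores, axis=1)        # class ids, best score first
    for i in range(scores.shape[0]):
        p = list(argmax[i]).index(labels[i])    # rank of the correct class (0 = top-1 hit)
        count[p:] += 1                          # credited to top-(p+1), top-(p+2), ...
    return count

# Tiny example: 2 samples, 4 classes.
scores = np.array([[0.10, 0.70, 0.10, 0.10],    # true class 1 is ranked 1st
                   [0.50, 0.30, 0.15, 0.05]])   # true class 2 is ranked 3rd
labels = np.array([1, 2])
print(update_topk_counts(np.zeros(4), scores, labels))   # -> [1. 1. 2. 2.]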

Looking forward to your reply. Thank you.
