Closed: cheniison closed this issue 4 years ago
After changing the `device` property in the config file to `"cpu"`, I run:

```
python train.py conf/train.json
```

but it still fails with the following error:
```
Use dataset to generate dict.
Size of doc_label dict is 3
Size of doc_token dict is 2629
Size of doc_char dict is 2629
Size of doc_token_ngram dict is 0
Size of doc_keyword dict is 0
Size of doc_topic dict is 0
Shrink dict over.
Size of doc_label dict is 3
Size of doc_token dict is 2396
Size of doc_char dict is 2396
Size of doc_token_ngram dict is 0
Size of doc_keyword dict is 0
Size of doc_topic dict is 0
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 245, in <module>
    train(config)
  File "train.py", line 212, in train
    trainer.train(train_data_loader, model, optimizer, "Train", epoch)
  File "train.py", line 101, in train
    ModeType.TRAIN)
  File "train.py", line 117, in run
    for batch in data_loader:
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 178, in _pin_memory_loop
    batch = pin_memory_batch(batch)
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 243, in pin_memory_batch
    return {k: pin_memory_batch(sample) for k, sample in batch.items()}
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 243, in <dictcomp>
    return {k: pin_memory_batch(sample) for k, sample in batch.items()}
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 239, in pin_memory_batch
    return batch.pin_memory()
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCCachingHostAllocator.cpp:265
```
I am using a Chinese multi-label dataset that I generated myself, where each character is one token. No other settings were changed; part of the config is shown below:

```json
{
  "task_info": {
    "label_type": "multi_label",
    "hierarchical": false,
    "hierar_taxonomy": "data/fdqb.taxonomy",
    "hierar_penalty": 0.000001
  },
  "device": "cpu",
  "model_name": "TextCNN",
  "checkpoint_dir": "checkpoint_dir_rcv1",
  "model_dir": "trained_model_rcv1",
  "data": {
    "train_json_files": ["data/fdqb_train.json"],
    "validate_json_files": ["data/fdqb_dev.json"],
    "test_json_files": ["data/fdqb_test.json"]
  },
  ...
```
Please set `visible_device_list` to empty; by default the first GPU is read.
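Concretely, that would make the relevant top-level fields of conf/train.json look like the snippet below (a sketch assuming `visible_device_list` sits next to `device`, as in the sample config shipped with the project):

```json
{
  "device": "cpu",
  "visible_device_list": "",
  ...
}
```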