Open alanOO7 opened 1 month ago
We have received your issue. How much GPU memory do you have, and which PaddleX branch are you using?
A2000, 12 GB. The version is 3.0-beta1.
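Received. This issue has been confirmed and fixed, so you can use the latest Paddle build. For example, on CUDA 11.8 the install command can be:
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu118/
For other installation methods, see the official Paddle documentation. The Paddle release that contains the fix will be published soon.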
Has it not been released yet? I am still running out of GPU memory.
Hello, Paddle 3.0 beta2 has now been released. You can install it and try again:
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
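After reinstalling, it can help to confirm which Paddle wheel the environment actually picked up before retraining. The following is a minimal sketch using only the Python standard library; the helper name `installed_paddle_version` is hypothetical, and the distribution names `paddlepaddle-gpu` and `paddlepaddle` are assumed to match the pip package names used in the commands in this thread.

```python
from importlib import metadata


def installed_paddle_version(dist_names=("paddlepaddle-gpu", "paddlepaddle")):
    """Return (distribution name, version) for the first installed Paddle wheel.

    Returns None if none of the given distributions is installed.
    The helper name and default names are illustrative assumptions.
    """
    for name in dist_names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed; try the next candidate.
            continue
    return None


if __name__ == "__main__":
    found = installed_paddle_version()
    if found is None:
        print("No Paddle distribution found in this environment.")
    else:
        print(f"{found[0]} {found[1]}")
```

If the GPU wheel is reported with the expected version, `paddle.utils.run_check()` can then be used to confirm that the install works on the GPU.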
Describe the problem
Reproduction
eval model:: 0%| | 0/61 [00:00<?, ?it/s]
eval model:: 2%|▏ | 1/61 [00:00<00:06, 9.31it/s]
eval model:: 15%|█▍ | 9/61 [00:00<00:01, 48.09it/s]
eval model:: 28%|██▊ | 17/61 [00:00<00:00, 60.59it/s]
eval model:: 41%|████ | 25/61 [00:00<00:00, 66.46it/s]
eval model:: 54%|█████▍ | 33/61 [00:00<00:00, 69.66it/s]
eval model:: 67%|██████▋ | 41/61 [00:00<00:00, 71.53it/s]
eval model:: 80%|████████ | 49/61 [00:00<00:00, 72.67it/s]
eval model:: 93%|█████████▎| 57/61 [00:00<00:00, 73.25it/s]
eval model:: 100%|██████████| 61/61 [00:01<00:00, 49.46it/s]
[2024/10/10 23:55:13] ppocr INFO: cur metric, acc: 0.8647540629199154, norm_edit_dis: 0.9783078684382168, fps: 327.1838605393461
[2024/10/10 23:55:13] ppocr INFO: best metric, acc: 0.9016393073098644, is_float16: False, norm_edit_dis: 0.9640742923909182, fps: 330.00967898380986, best_epoch: 1
[2024/10/10 23:55:15] ppocr INFO: inference model is saved to /home/mcn/PaddleX/output/cme/latest/inference/inference
[2024/10/10 23:55:15] ppocr INFO: Export inference config file to /home/mcn/PaddleX/output/cme/latest/inference/inference.yml
[2024/10/10 23:55:16] ppocr INFO: Already save model info in /home/mcn/PaddleX/output/cme/latest
[2024/10/10 23:55:16] ppocr INFO: save model in /home/mcn/PaddleX/output/cme/latest/latest
[2024/10/10 23:55:17] ppocr INFO: inference model is saved to /home/mcn/PaddleX/output/cme/iter_epoch_34/inference/inference
[2024/10/10 23:55:17] ppocr INFO: Export inference config file to /home/mcn/PaddleX/output/cme/iter_epoch_34/inference/inference.yml
[2024/10/10 23:55:18] ppocr INFO: Already save model info in /home/mcn/PaddleX/output/cme/iter_epoch_34
[2024/10/10 23:55:18] ppocr INFO: save model in /home/mcn/PaddleX/output/cme/iter_epoch_34/iter_epoch_34
Traceback (most recent call last):
  File "/home/mcn/PaddleX/paddlex/repo_manager/repos/PaddleOCR/tools/train.py", line 264, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "/home/mcn/PaddleX/paddlex/repo_manager/repos/PaddleOCR/tools/train.py", line 217, in main
    program.train(
  File "/home/mcn/PaddleX/paddlex/repo_manager/repos/PaddleOCR/tools/program.py", line 344, in train
    preds = model(images, data=batch[1:])
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/mcn/PaddleX/paddlex/repo_manager/repos/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 85, in forward
    x = self.backbone(x)
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/mcn/PaddleX/paddlex/repo_manager/repos/PaddleOCR/ppocr/modeling/backbones/rec_hgnet.py", line 287, in forward
    x = stage(x)
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/mcn/PaddleX/paddlex/repo_manager/repos/PaddleOCR/ppocr/modeling/backbones/rec_hgnet.py", line 191, in forward
    x = self.blocks(x)
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/container.py", line 615, in forward
    input = layer(input)
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/mcn/PaddleX/paddlex/repo_manager/repos/PaddleOCR/ppocr/modeling/backbones/rec_hgnet.py", line 147, in forward
    x = self.att(x)
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/mcn/PaddleX/paddlex/repo_manager/repos/PaddleOCR/ppocr/modeling/backbones/rec_hgnet.py", line 92, in forward
    x = self.conv(x)
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/layer/conv.py", line 711, in forward
    out = F.conv._conv_nd(
  File "/root/anaconda3/envs/pdx/lib/python3.10/site-packages/paddle/nn/functional/conv.py", line 127, in _conv_nd
    pre_bias = _C_ops.conv2d(
MemoryError:
C++ Traceback (most recent call last):
0 paddle::pybind::eager_api_conv2d(_object*, _object*, _object*)
1 conv2d_ad_func(paddle::Tensor const&, paddle::Tensor const&, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, std::string, std::vector<int, std::allocator<int> >, int, std::string)
2 paddle::experimental::conv2d(paddle::Tensor const&, paddle::Tensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&)
3 void phi::ConvCudnnKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, phi::DenseTensor*)
4 void phi::ConvCudnnKernelImplV7<float, phi::GPUContext>(phi::DenseTensor const*, phi::DenseTensor const*, phi::GPUContext const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, phi::backends::gpu::DataLayout, phi::backends::gpu::DataLayout, bool, bool, int, phi::DenseTensor*)
5 phi::DnnWorkspaceHandle::ReallocWorkspace(unsigned long)
6 paddle::memory::allocation::Allocator::Allocate(unsigned long)
7 paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
8 paddle::memory::allocation::Allocator::Allocate(unsigned long)
9 paddle::memory::allocation::Allocator::Allocate(unsigned long)
10 paddle::memory::allocation::Allocator::Allocate(unsigned long)
11 paddle::memory::allocation::Allocator::Allocate(unsigned long)
12 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
13 std::string phi::enforce::GetCompleteTraceBackString(std::string&&, char const*, int)
14 common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
Error Message Summary:
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 128.000000MB memory on GPU 0, 11.664124GB memory has been allocated and available memory is only 88.187500MB.
Please check whether there is any other process using GPU 0.
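For anyone comparing runs across Paddle versions, the numbers in the summary above can be pulled out of a log programmatically. This is only a rough sketch: the function name `parse_oom_summary` is hypothetical, and the regular expression assumes the exact wording of the message shown here, which may differ in other Paddle releases.

```python
import re

# Conversion factors to megabytes for the units seen in the message.
_TO_MB = {"KB": 1.0 / 1024.0, "MB": 1.0, "GB": 1024.0}

# Pattern modeled on the "Out of memory error on GPU" summary quoted above.
_OOM_PATTERN = re.compile(
    r"Cannot allocate (?P<req>[\d.]+)(?P<req_u>KB|MB|GB) memory on GPU (?P<gpu>\d+), "
    r"(?P<used>[\d.]+)(?P<used_u>KB|MB|GB) memory has been allocated and "
    r"available memory is only (?P<free>[\d.]+)(?P<free_u>KB|MB|GB)"
)


def parse_oom_summary(text):
    """Extract GPU id and requested/allocated/available memory (in MB).

    Returns None when the text does not contain a matching summary.
    """
    m = _OOM_PATTERN.search(text)
    if m is None:
        return None
    return {
        "gpu": int(m["gpu"]),
        "requested_mb": float(m["req"]) * _TO_MB[m["req_u"]],
        "allocated_mb": float(m["used"]) * _TO_MB[m["used_u"]],
        "available_mb": float(m["free"]) * _TO_MB[m["free_u"]],
    }


if __name__ == "__main__":
    message = (
        "Out of memory error on GPU 0. Cannot allocate 128.000000MB memory on GPU 0, "
        "11.664124GB memory has been allocated and available memory is only 88.187500MB."
    )
    print(parse_oom_summary(message))
```

For the message in this report, the request (128 MB) exceeds what is still free (about 88 MB), with roughly 11.66 GB of the 12 GB card already allocated.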
Environment