PaddlePaddle / PaddleSeg

Easy-to-use image segmentation library with an awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc.
https://arxiv.org/abs/2101.06175
Apache License 2.0

How to add more data to the MedicalSeg nnUNet training set #3029

Closed stillfighter2 closed 7 months ago

stillfighter2 commented 1 year ago

Issue confirmation (Search before asking)

Please ask your question

Training nnUNet on a dataset of 160 images gives a DSC that is too low, so I want to use data augmentation. We added 60 augmented images, in the same format, to both the imagesTr and labelsTr folders, and then ran preprocessing following the tutorial (running train.py produced the decathlon, preprocessed, and cropped folders; we did not modify the dataset.json file under the data/raw_data path). However, the dataset.json file in the decathlon folder still reports 160 training images. Will this affect training?

shiyutang commented 1 year ago

We recommend updating dataset.json as well: many of nnUNet's configuration parameters are derived from it.
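One way to keep dataset.json in sync is to rescan the training folders and rewrite the counts. This is a minimal sketch assuming the MSD/decathlon layout (`imagesTr`, `labelsTr`, and the `numTraining`/`training` keys); the helper name `sync_dataset_json` and the exact paths are illustrative, so adapt them to your data/raw_data layout:

```python
import json
from pathlib import Path

def sync_dataset_json(raw_data_dir):
    """Rescan imagesTr/labelsTr and rewrite dataset.json so that
    numTraining and the training list match the files on disk."""
    raw = Path(raw_data_dir)
    json_path = raw / "dataset.json"
    meta = json.loads(json_path.read_text())

    # Case names are taken from labelsTr; every label is assumed to
    # have a matching image of the same name under imagesTr.
    cases = sorted(p.name for p in (raw / "labelsTr").glob("*.nii.gz"))
    meta["numTraining"] = len(cases)
    meta["training"] = [
        {"image": f"./imagesTr/{c}", "label": f"./labelsTr/{c}"} for c in cases
    ]
    json_path.write_text(json.dumps(meta, indent=2))
    return len(cases)
```

Run this after copying the augmented images in, and before the preprocessing step that generates the decathlon/preprocessed/cropped folders, so that the derived dataset.json files pick up the new count.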

huangguoqing commented 1 year ago

@stillfighter2 @shiyutang Did you set up your training environment yourself, or did you train in the online AIStudio environment? I set mine up on my own machine, and when running segmentation on the MSD dataset my 12 GB of GPU memory fills up, even though batch_size is already set to 1 and no other program is using the GPU. Have you run into this problem, and which parameters can be changed so the network can train? The error is as follows:

```
loading dataset
loading all case properties
dataset split over!
dataset mode: train, keys: ['lung_001' 'lung_003' 'lung_004' 'lung_005' 'lung_009' 'lung_014' 'lung_015' 'lung_016' 'lung_018' 'lung_020' 'lung_022' 'lung_023' 'lung_025' 'lung_026' 'lung_027' 'lung_028' 'lung_029' 'lung_031' 'lung_036' 'lung_037' 'lung_038' 'lung_043' 'lung_044' 'lung_045' 'lung_047' 'lung_049' 'lung_051' 'lung_053' 'lung_054' 'lung_055' 'lung_057' 'lung_058' 'lung_061' 'lung_062' 'lung_064' 'lung_069' 'lung_071' 'lung_073' 'lung_074' 'lung_075' 'lung_078' 'lung_080' 'lung_081' 'lung_083' 'lung_084' 'lung_086' 'lung_092' 'lung_093' 'lung_095' 'lung_096']
terminate called after throwing an instance of 'paddle::memory::allocation::BadAlloc'
what():

C++ Traceback (most recent call last):
0   conv2d_ad_func(paddle::experimental::Tensor const&, paddle::experimental::Tensor const&, std::vector<int, std::allocator >, std::vector<int, std::allocator >, std::string, int, std::vector<int, std::allocator >, std::string, bool, int, bool)
1   conv2d_ad_func(paddle::experimental::Tensor const&, paddle::experimental::Tensor const&, std::vector<int, std::allocator >, std::vector<int, std::allocator >, std::string, int, std::vector<int, std::allocator >, std::string, bool, int, bool)
2   paddle::experimental::conv2d(paddle::experimental::Tensor const&, paddle::experimental::Tensor const&, std::vector<int, std::allocator > const&, std::vector<int, std::allocator > const&, std::string const&, int, std::vector<int, std::allocator > const&, std::string const&, bool, int, bool)
3   void phi::ConvCudnnKernel<phi::dtype::float16, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator > const&, std::vector<int, std::allocator > const&, std::string const&, int, std::vector<int, std::allocator > const&, std::string const&, bool, int, bool, phi::DenseTensor)
4   phi::DnnWorkspaceHandle::ReallocWorkspace(unsigned long)
5   paddle::memory::allocation::Allocator::Allocate(unsigned long)
6   paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
7   paddle::memory::allocation::Allocator::Allocate(unsigned long)
8   paddle::memory::allocation::Allocator::Allocate(unsigned long)
9   paddle::memory::allocation::Allocator::Allocate(unsigned long)
10  paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
11  std::string phi::enforce::GetCompleteTraceBackString(std::string&&, char const, int)
12  phi::enforce::GetCurrentTraceBackStringabi:cxx11
```


```
Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 384.017822MB memory on GPU 0,
11.543640GB memory has been allocated and available memory is only 198.625000MB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model. If the above ways do not
   solve the out of memory problem, you can try to use CUDA managed memory.
   The command is export FLAGS_use_cuda_managed_memory=false.
   (at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)
```
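The remedies listed in the error summary can also be applied from Python before Paddle is imported. This is a hedged sketch: `FLAGS_use_cuda_managed_memory` and `FLAGS_fraction_of_gpu_memory_to_use` are real PaddlePaddle environment flags, but the GPU index and the fraction value below are illustrative choices, not values from this issue. Note that enabling managed memory requires the flag to be `true`, even though the log text above prints `false`:

```python
import os

# Route training to a different GPU if GPU 0 is busy
# (device index 1 is an illustrative choice).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Opt in to CUDA managed (unified) memory so allocations can spill
# to host RAM instead of raising ResourceExhaustedError.
os.environ["FLAGS_use_cuda_managed_memory"] = "true"

# Optionally limit how much device memory Paddle pre-allocates (0.8 = 80%).
os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = "0.8"

# These must be set before Paddle starts, because it reads them at import time:
# import paddle
```

The same effect can be had by exporting these variables in the shell before launching train.py. Managed memory trades speed for capacity, so expect slower iterations when allocations spill to host RAM.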
shiyutang commented 1 year ago

nnUNet's GPU memory footprint is large; you can train on AIStudio instead. We also provide a related tutorial: https://aistudio.baidu.com/aistudio/projectdetail/5173243?channelType=0&channel=0