PaddleCV-SIG / MedicalSeg

MedicalSeg is an easy-to-use 3D medical image segmentation toolkit that supports the whole segmentation process. Specifically, we provide accelerated data preprocessing, high-precision models on the COVID-19 CT scans dataset and the MRISpineSeg spine dataset, and a 3D visualization demo based on itkwidgets.
Apache License 2.0

Why is the trained model not saved to best_model after running run-vnet.sh? #71

Open Lipanw opened 2 years ago

Lipanw commented 2 years ago

Why is the trained model not saved to best_model after running run-vnet.sh? train.log is also empty.

Lipanw commented 2 years ago

I am running this on Windows 10.

shiyutang commented 2 years ago

How long did you run it for? Is there any other information?

Lipanw commented 2 years ago

Traceback (most recent call last):
  File "train.py", line 204, in <module>
    main(args)
  File "train.py", line 198, in main
    to_static_training=cfg.to_static_training)
  File "F:\fuxianCode\MedicalSeg-develop\medicalseg\core\train.py", line 233, in train
    save_dir=save_dir)
  File "F:\fuxianCode\MedicalSeg-develop\medicalseg\core\val.py", line 151, in evaluate
    'format': "xyz"
  File "F:\fuxianCode\MedicalSeg-develop\medicalseg\utils\utils.py", line 244, in save_array
    img_itk_new = sitk.GetImageFromArray(val)
  File "D:\mysoftware\Anaconda\lib\site-packages\SimpleITK\extra.py", line 292, in GetImageFromArray
    id = _get_sitk_pixelid(z)
  File "D:\mysoftware\Anaconda\lib\site-packages\SimpleITK\extra.py", line 189, in _get_sitk_pixelid
    raise TypeError('dtype: {0} is not supported.'.format(numpy_array_type.dtype))
TypeError: dtype: int32 is not supported.

Lipanw commented 2 years ago

2022-04-27 17:47:02 [INFO] [TRAIN] epoch: 0, iter: 100/15000, loss: 2.4868, DSC: 4.1360, lr: 0.009941, batch_cost: 0.7021, reader_cost: 0.00082, ips: 1.4244 samples/sec | ETA 02:54:20
2022-04-27 17:48:13 [INFO] [TRAIN] epoch: 1, iter: 200/15000, loss: 1.1843, DSC: 4.3465, lr: 0.009881, batch_cost: 0.7081, reader_cost: 0.00062, ips: 1.4123 samples/sec | ETA 02:54:39
2022-04-27 17:49:24 [INFO] [TRAIN] epoch: 2, iter: 300/15000, loss: 1.1282, DSC: 4.3768, lr: 0.009820, batch_cost: 0.7096, reader_cost: 0.00016, ips: 1.4092 samples/sec | ETA 02:53:51
2022-04-27 17:50:35 [INFO] [TRAIN] epoch: 2, iter: 400/15000, loss: 1.1043, DSC: 4.3364, lr: 0.009760, batch_cost: 0.7107, reader_cost: 0.00047, ips: 1.4071 samples/sec | ETA 02:52:56
2022-04-27 17:51:46 [INFO] [TRAIN] epoch: 3, iter: 500/15000, loss: 1.0901, DSC: 4.3506, lr: 0.009700, batch_cost: 0.7109, reader_cost: 0.00047, ips: 1.4066 samples/sec | ETA 02:51:48
2022-04-27 17:51:46 [INFO] Start evaluating (total_samples: 5, total_iters: 5)...
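For comparing runs, the [TRAIN] lines can be parsed into fields. The following is a minimal sketch using only the standard library; `parse_train_line` and `estimate_eta` are hypothetical helper names, not part of MedicalSeg. The ETA formula (remaining iterations times `batch_cost`) is an assumption that appears consistent with the values logged in this thread, though it can differ by a second because the logged `batch_cost` is rounded.

```python
import re

# Pattern for the [TRAIN] lines shown in this thread (field order assumed fixed).
LOG_PATTERN = re.compile(
    r"epoch: (?P<epoch>\d+), iter: (?P<iter>\d+)/(?P<total>\d+), "
    r"loss: (?P<loss>[\d.]+), DSC: (?P<dsc>[\d.]+), lr: (?P<lr>[\d.]+), "
    r"batch_cost: (?P<batch_cost>[\d.]+)"
)


def parse_train_line(line):
    """Return a dict of numeric fields from one [TRAIN] log line, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "epoch": int(fields["epoch"]),
        "iter": int(fields["iter"]),
        "total": int(fields["total"]),
        "loss": float(fields["loss"]),
        "dsc": float(fields["dsc"]),
        "lr": float(fields["lr"]),
        "batch_cost": float(fields["batch_cost"]),
    }


def estimate_eta(record):
    """Estimate remaining time as remaining iterations * seconds per batch."""
    remaining_iters = record["total"] - record["iter"]
    seconds = int(remaining_iters * record["batch_cost"])
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    line = (
        "2022-04-27 17:51:46 [INFO] [TRAIN] epoch: 3, iter: 500/15000, "
        "loss: 1.0901, DSC: 4.3506, lr: 0.009700, batch_cost: 0.7109, "
        "reader_cost: 0.00047, ips: 1.4066 samples/sec | ETA 02:51:48"
    )
    record = parse_train_line(line)
    print(record["loss"], record["dsc"], estimate_eta(record))
```

Lines that do not match, such as the "Start evaluating" message, return None and can be skipped.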

Lipanw commented 2 years ago

Every time, it runs to iteration 500 and then stops when model evaluation is about to start.

linhandev commented 2 years ago

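It sounds like something is going wrong during validation. We have updated the code since this issue was opened, so you can pull the latest version and try again with a smaller save_interval.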

shiyutang commented 2 years ago

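The problem is in saving during evaluation. For now you can comment out the save_array part and start training, then post here a link to complete reproducible code or a description of what you modified.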
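As an alternative to commenting out the save step, one possible workaround for the `dtype: int32 is not supported` error is to cast the array to another dtype before passing it to `sitk.GetImageFromArray`. The sketch below is an assumption, not the fix adopted in the repository: `to_supported_dtype` is a hypothetical helper, and the choice of target dtypes (uint8 or int16 for label-like integer data) is a guess that should be checked against the installed SimpleITK version.

```python
import numpy as np


def to_supported_dtype(arr):
    """Cast integer or boolean arrays to a narrower, explicitly chosen dtype.

    Hypothetical helper: picks uint8 or int16 when the values fit, and
    falls back to float64 for larger integers. Floating-point arrays are
    returned unchanged.
    """
    arr = np.asarray(arr)
    if arr.dtype == np.bool_:
        return arr.astype(np.uint8)
    if np.issubdtype(arr.dtype, np.integer):
        if arr.size == 0:
            return arr.astype(np.uint8)
        low, high = int(arr.min()), int(arr.max())
        if low >= 0 and high <= 255:
            return arr.astype(np.uint8)   # typical for segmentation labels
        if low >= -32768 and high <= 32767:
            return arr.astype(np.int16)   # typical for CT intensity ranges
        return arr.astype(np.float64)     # assumption: avoid integer dtypes entirely
    return arr


if __name__ == "__main__":
    labels = np.array([[0, 1], [2, 3]], dtype=np.int32)
    print(to_supported_dtype(labels).dtype)
    # Assumed usage inside save_array (requires SimpleITK):
    # img_itk_new = sitk.GetImageFromArray(to_supported_dtype(val))
```

Casting changes the stored pixel type, so any downstream code that reads the saved files should be checked for dtype expectations.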

Lipanw commented 2 years ago

2022-05-06 14:56:46 [INFO] [TRAIN] epoch: 4, iter: 100/15000, loss: 4.4847, DSC: 3.7124, lr: 0.000994, batch_cost: 6.5770, reader_cost: 2.26782, ips: 0.9123 samples/sec | ETA 27:13:17

Hello, the earlier problem has been solved. But compared with the lr=0.001 example on your home page, why is the DSC so low? The loss is also very high.

Lipanw commented 2 years ago

Here is my configuration information:

------------Environment Information-------------
platform: Linux-4.15.0-158-generic-x86_64-with-debian-stretch-sid
Python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Paddle compiled with cuda: True
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
cudnn: 8.2
GPUs used: 1
CUDA_VISIBLE_DEVICES: None
GPU: ['GPU 0: A100-SXM4-40GB (UUID:']
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
PaddlePaddle: 2.2.2

2022-05-06 14:45:45 [INFO]
---------------Config Information---------------
batch_size: 6
data_root: tools/data
iters: 15000
loss: coef:

linhandev commented 2 years ago

The lr could probably be a bit larger.

shiyutang commented 2 years ago

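Please keep to one problem per issue. Also, this looks like a data problem. Have you modified the data processing code? Or could you list all the changes you made?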