bowang-lab / U-Mamba

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
https://arxiv.org/abs/2401.04722
Apache License 2.0

GPU memory #14

Closed · innocence0206 closed 8 months ago

innocence0206 commented 8 months ago

Hello! Your work is great! I would like to know whether a 24 GB NVIDIA GeForce RTX 3090 GPU is enough to run all the experiments. I am running into OOM problems.

DINGdef commented 7 months ago

Hi, may I ask whether the 3090 is completely unable to run it?

sxqhyc commented 6 months ago

I have the same question, and if the 3090 does not work, would a 4090?

innocence0206 commented 6 months ago

The 3090 can run it, but it needs the whole card to itself.
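
Since the model needs the whole card, it can help to confirm that no other process is holding GPU memory before launching training. A minimal check using the standard nvidia-smi query flags (not from this thread, just a common sanity check):

    # List per-GPU memory usage; the GPU used for training should be essentially idle.
    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv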

sxqhyc commented 6 months ago

Thanks for letting me know. May I ask roughly how long training takes on a 3090?

innocence0206 commented 6 months ago

Running UMambaBot on abdomen CT 3D, I get roughly 115 s per epoch.

sxqhyc commented 6 months ago

Thank you for your reply.

sxqhyc commented 6 months ago

Hello, sorry to bother you. When I run UMambaBot on abdomen CT 3D, I get the following error:

    Traceback (most recent call last):
      File "/root/miniconda3/envs/umamba/bin/nnUNetv2_train", line 33, in <module>
        sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
      File "/root/autodl-tmp/U-Mamba/umamba/nnunetv2/run/run_training.py", line 268, in run_training_entry
        run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
      File "/root/autodl-tmp/U-Mamba/umamba/nnunetv2/run/run_training.py", line 204, in run_training
        nnunet_trainer.run_training()
      File "/root/autodl-tmp/U-Mamba/umamba/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1258, in run_training
        train_outputs.append(self.train_step(next(self.dataloader_train)))
      File "/root/autodl-tmp/U-Mamba/umamba/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 900, in train_step
        output = self.network(data)
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/root/autodl-tmp/U-Mamba/umamba/nnunetv2/nets/UMambaBot.py", line 207, in forward
        out = self.mamba(middle_feature_flat)
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/mamba_ssm/modules/mamba_simple.py", line 146, in forward
        out = mamba_inner_fn(
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 317, in mamba_inner_fn
        return MambaInnerFn.apply(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
        return super().apply(*args, **kwargs)  # type: ignore[misc]
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 98, in decorate_fwd
        return fwd(*args, **kwargs)
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 187, in forward
        conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(
    TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:

      1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: Optional[torch.Tensor], arg4: bool) -> torch.Tensor

    Invoked with: tensor([...], device='cuda:0', dtype=torch.float16, requires_grad=True),
                  tensor([...], device='cuda:0', requires_grad=True),
                  Parameter containing: tensor([...], device='cuda:0', requires_grad=True),
                  None, None, None, True

    Exception in thread Thread-4 (results_loop):
    Traceback (most recent call last):
      File "/root/miniconda3/envs/umamba/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
        self.run()
      File "/root/miniconda3/envs/umamba/lib/python3.10/threading.py", line 953, in run
        self._target(*self._args, **self._kwargs)
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
        raise e
      File "/root/miniconda3/envs/umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
        raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
    RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

Did you run into this problem during your training? If so, could I ask how you solved it?

innocence0206 commented 6 months ago

It is probably a causal_conv1d version problem. Check whether you have version 1.1.0 installed; if that still does not fix it, remove the second-to-last None argument from the causal_conv1d_cuda.causal_conv1d_fwd() call.
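
A quick way to confirm what is actually installed before touching the mamba_ssm source; a minimal sketch, assuming the packages were installed from PyPI under their usual names and pinning 1.1.0 only because that is the version suggested above:

    # Show the installed versions of the two extensions involved in the error.
    pip show causal-conv1d mamba-ssm | grep -E "^(Name|Version)"
    # If the version differs, re-pin causal-conv1d to the suggested release.
    pip install causal-conv1d==1.1.0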

sxqhyc commented 6 months ago

Thank you for your reply. Training now runs without problems, but testing fails. May I ask what command line you used for testing?

innocence0206 commented 6 months ago

> Thank you for your reply. Training now runs without problems, but testing fails. May I ask what command line you used for testing?

The same as the official one.
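
For reference, inference in U-Mamba follows the standard nnU-Net v2 pattern; a sketch with placeholder paths (the trainer name and flags are assumptions based on the official instructions, so adjust them to your setup):

    nnUNetv2_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -d DATASET_ID -c 3d_fullres -f all -tr nnUNetTrainerUMambaBot --disable_tta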

sxqhyc commented 6 months ago

Yes, I mean the INPUT_FOLDER in that command line. May I ask which path you used?

Sophia710 commented 4 months ago

> Running UMambaBot on abdomen CT 3D, I get roughly 115 s per epoch.

Hello, can you run UMambaEnc on abdomen CT 3D? Does it have any particular GPU requirements? I get a CUDA out of memory error on a 24 GB NVIDIA GeForce RTX 4090.

innocence0206 commented 4 months ago

> Running UMambaBot on abdomen CT 3D, I get roughly 115 s per epoch.

> Hello, can you run UMambaEnc on abdomen CT 3D? Does it have any particular GPU requirements? I get a CUDA out of memory error on a 24 GB NVIDIA GeForce RTX 4090.

UMambaEnc on abdomen CT 3D does run, but it takes close to 24 GB of VRAM.

Sophia710 commented 4 months ago

> Running UMambaBot on abdomen CT 3D, I get roughly 115 s per epoch.

> Hello, can you run UMambaEnc on abdomen CT 3D? Does it have any particular GPU requirements? I get a CUDA out of memory error on a 24 GB NVIDIA GeForce RTX 4090.

> UMambaEnc on abdomen CT 3D does run, but it takes close to 24 GB of VRAM.

Thanks for letting me know.