Closed: suan0365006 closed this issue 5 years ago
Hi,
Thanks for following our work. Yes, for this task, 1 GPU (the 12 GB of an Nvidia Titan) is not enough to hold the data for the predefined batch size, so we have to use 2 GPUs to train the model. But one thing you can try with a single GPU is to split the batch data into several smaller "mini_batches", and accumulate gradients across those mini_batches before updating the model's trainable variables. This should achieve the same result, at least in theory. Good luck!
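A minimal sketch of that gradient-accumulation idea in TensorFlow 1.x (this is not code from the repository; loss, optimizer, pointclouds_pl and labels_pl are stand-ins for whatever your training graph defines, and it assumes every trainable variable receives a gradient):

import tensorflow as tf

tvars = tf.trainable_variables()
grads_and_vars = optimizer.compute_gradients(loss, tvars)

# One non-trainable accumulator per trainable variable.
accum_vars = [tf.Variable(tf.zeros_like(v), trainable=False) for v in tvars]
zero_ops = [a.assign(tf.zeros_like(a)) for a in accum_vars]
accum_ops = [a.assign_add(g) for a, (g, _) in zip(accum_vars, grads_and_vars)]

NUM_MINI_BATCHES = 4  # e.g. split a batch of 24 into 4 mini-batches of 6
apply_op = optimizer.apply_gradients(
    [(a / NUM_MINI_BATCHES, v) for a, v in zip(accum_vars, tvars)])

# Per full batch:
#   sess.run(zero_ops)
#   for mb_data, mb_labels in mini_batches:  # your own splitting of the batch
#       sess.run(accum_ops, feed_dict={pointclouds_pl: mb_data, labels_pl: mb_labels})
#   sess.run(apply_op)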
Thank you for this answer. Best wishes!
How do I split the batch data into a few smaller "mini_batches"?
Hello, I have the same problem. How do you split the batch data into a few smaller "mini_batches"?
Hello, in sem_seg/train.py you will find batch_size at line 24; change it to the size you need. Try it.
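For reference, the flag being changed looks something like this in the author's script (the help string quoted later in this thread shows a default of 24; pick whatever fits your GPU memory):

parser.add_argument('--batch_size', type=int, default=24, help='Batch Size during training for each GPU [default: 24]')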
Thanks for your reply. I know that, but even when I set batch_size to 1 it does not work. My GPU is an RTX 2080 (8 GB) and I hit this problem:
InternalError (see above for traceback): Blas xGEMM launch failed : a.shape=[1,4096,3], b.shape=[1,3,4096], m=4096, n=4096, k=3
[[Node: MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ExpandDims_1, transpose)]]
[[Node: adj_conv2/bn/cond/add_1/_223 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_525_adj_conv2/bn/cond/add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Do you know the cause of this problem and how to solve it? Looking forward to your reply.
Okay, it looks like something is wrong with the GPU (or CUDA, TensorFlow, Keras). Maybe you can try changing the "Create a session" code to fix this problem. My code is:
# Original session setup (now commented out):
#config = tf.ConfigProto()
#config.gpu_options.allow_growth = True
#config.allow_soft_placement = True
#sess = tf.Session(config=config)

# Replacement: pre-allocate a fixed 95% of GPU memory instead of growing on demand
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.95
config.allow_soft_placement = True
sess = tf.Session(config=config)
Hope this solves your problem.
Thank you for your reply. After I changed the "Create a session" code according to your advice, I get the same problem:
InternalError (see above for traceback): Blas xGEMM launch failed : a.shape=[1,4096,3], b.shape=[1,3,4096], m=4096, n=4096, k=3
Do you have any other ideas for solving this? My CUDA and TensorFlow are fine, because I successfully ran your classification code on ModelNet40. When I run your code, the reported memory is: totalMemory: 7.76GiB freeMemory: 7.13GiB
Hi, sorry, this problem doesn't seem easy to solve. I suggest collecting some more details about it and reporting the issue to the author.
In train.py I changed some parameters. I only have one 1050 Ti (4 GB) card. My code (based on the DGCNN author's code, adjusted):
parser = argparse.ArgumentParser()
parser.add_argument('--num_gpu', type=int, default=2, help='the number of GPUs to use [default: 2]')
parser.add_argument('--log_dir', default='log', help='Log dir [default: log]')
parser.add_argument('--num_point', type=int, default=4096, help='Point number [default: 4096]')
parser.add_argument('--max_epoch', type=int, default=200, help='Epoch to run [default: 50]')
parser.add_argument('--batch_size', type=int, default=2, help='Batch Size during training for each GPU [default: 24]')
parser.add_argument('--learning_rate', type=float, default=0.001, help='Initial learning rate [default: 0.001]')
parser.add_argument('--momentum', type=float, default=0.9, help='Initial learning rate [default: 0.9]')
parser.add_argument('--optimizer', default='adam', help='adam or momentum [default: adam]')
parser.add_argument('--decay_step', type=int, default=300000, help='Decay step for lr decay [default: 300000]')
parser.add_argument('--decay_rate', type=float, default=0.5, help='Decay rate for lr decay [default: 0.5]')
parser.add_argument('--test_area', type=int, default=6, help='Which area to use for test, option: 1-6 [default: 6]')
FLAGS = parser.parse_args()
It's worth mentioning the "number of GPUs to use" flag: although I only have one 1050 Ti card, I have to leave it set to 2. It worked.
You can try this code, good luck.
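A likely reason the flag has to stay at 2: judging from the traceback in the original post at the bottom of this thread, train_one_epoch fills a feed_dict entry for each of two GPU towers (it indexes ops['pointclouds_phs'][1]), so with --num_gpu set to 1 that second placeholder never exists. A rough, hypothetical sketch of a per-tower feed loop that follows the flag instead; only 'pointclouds_phs', current_data and the start/end indices come from the traceback, the other key names are assumptions:

per_gpu = BATCH_SIZE // FLAGS.num_gpu                  # samples handed to each tower
feed_dict = {}
for i in range(FLAGS.num_gpu):
    s = start_idx + i * per_gpu                        # start_idx: first sample of this batch
    e = s + per_gpu
    feed_dict[ops['pointclouds_phs'][i]] = current_data[s:e, :, :]
    feed_dict[ops['labels_phs'][i]] = current_label[s:e]        # 'labels_phs' is an assumed key
    feed_dict[ops['is_training_phs'][i]] = True                 # 'is_training_phs' is an assumed key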
Thanks for your reply, I will try to run the code again according to your advice. Can you give me your contact info?
Okay. E-mail: suan0365006@gmail.com
Hi, I have been troubled by this problem for many days. Can you send me your whole train.py file, or the models you have trained?
Hello, thank you for sharing this code, it's amazing! Sorry, I have a question about train.py in the sem_seg folder. When I run "sh +x train_job.sh", the terminal shows this traceback:
Traceback (most recent call last):
  File "train.py", line 289, in <module>
    train()
  File "train.py", line 238, in train
    train_one_epoch(sess, ops, train_writer)
  File "train.py", line 271, in train_one_epoch
    ops['pointclouds_phs'][1]: current_data[start_idx_1:end_idx_1, :, :],
IndexError: list index out of range
I checked the train.py parameters and found a probable cause in the GPU count:
parser.add_argument('--num_gpu', type=int, default=1, help='the number of GPUs to use [default: 2]')
I have only one NVIDIA 1050 Ti, so I changed default=2 to 1. Does that mean I have to buy another graphics card to fix this? Thanks a lot!