dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License

How to use a big batch size? #242

Open lswzjuer opened 4 years ago

lswzjuer commented 4 years ago

Hello, your work is very good! Thanks for your code.

I want to apply your code to instance segmentation of cell slice images. My images are 300*300, and we care a lot about inference time, so I want to fine-tune a model on my cell dataset, whose images are 300*300.
The images look like this:

image

There is only one class (cell), and many small instances need to be detected and masked. Do you have any suggestions for training the model, such as how to set the anchor sizes and how to improve mAP for small objects?

ISSUE:

I use two GPUs and set the batch size to 64/32/16, but after training for 1 epoch the loss becomes NaN and an error appears. Why must I set batch_size = 8 * num_gpus? How can I increase the batch size correctly? (Each of my GPUs has up to 24 GB of memory.)

Thanks

lswzjuer commented 4 years ago

The error message is:

```
/pytorch/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [108,0,0], thread: [255,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
```

The training command is:

```
sudo CUDA_VISIBLE_DEVICES=0,1 python train.py --config=yolact_im300_config --batch_size=16
```

dbolya commented 4 years ago

Try the latest master branch. I just added a fix for the inf / nan loss explosion issue (see #222).

As for what modifications to make: how many fps do you need? Is the performance of resnet50 on 550x550 images acceptable? I ask because the architecture itself is not very good at handling very small objects, so it might be very beneficial to upscale the images to 550x550 and then classify them.

If you still want to use 300x300, I'd halve all of the anchor sizes in "pred_scales" (in yolact_base). Also, if you want to train the model on 300x300 images, set max_size to 300.
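For reference, here is a minimal sketch of what such a config could look like, following the pattern of the resized configs in data/config.py (the yolact_im300_config name and the exact scale values are assumptions; adjust them to your version of config.py):

```python
# Sketch of a 300x300 config in data/config.py, modeled on the existing
# yolact_im400_config pattern. Assumes yolact_base_config and its backbone
# (with pred_scales tuned for 550x550 inputs) are defined above this point.
yolact_im300_config = yolact_base_config.copy({
    'name': 'yolact_im300',

    # Train and evaluate on 300x300 inputs.
    'max_size': 300,

    # Scale the anchors down from the 550x550 defaults (roughly halving them).
    'backbone': yolact_base_config.backbone.copy({
        'pred_scales': [[int(s[0] * 300 / 550)] for s in
                        yolact_base_config.backbone.pred_scales],
    }),
})
```

You would then select it with `--config=yolact_im300_config`, as in the command above.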

piseabhijeet commented 3 years ago

Hi,

For a batch size of 32, with 4x 32 GB Tesla V100 GPUs, I am getting this error:

```
RuntimeError: DataLoader worker (pid 1851) is killed by signal: Bus error.
```

It seems that only one GPU was utilized despite all 4 GPUs being specified for training.

piseabhijeet commented 3 years ago

Also getting this warning:

Per-GPU batch size is less than the recommended limit for batch norm. Disabling batch norm.

The entire log:

```
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "train.py", line 504, in <module>
    train()
  File "train.py", line 307, in train
    losses = net(datum)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 525, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 148, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "train.py", line 159, in scatter
    splits = prepare_data(inputs[0], devices, allocation=args.batch_alloc)
  File "train.py", line 412, in prepare_data
    images[cur_idx] = gradinator(images[cur_idx].to(device))
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 63, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 3091) is killed by signal: Bus error.
```
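For what it's worth, a minimal sketch of a common workaround for this shm-related bus error (assuming the DataLoader workers are running in a container with a small /dev/shm; this is general PyTorch advice, not anything yolact-specific):

```python
# Common workaround for "Bus error ... insufficient shared memory (shm)":
# make DataLoader workers pass tensors through the filesystem instead of
# /dev/shm. Assumption: the crash comes from a too-small /dev/shm (typical
# inside Docker). Place this near the top of the training script.
import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')
```

Alternatives are to enlarge shared memory when launching the container (e.g. `docker run --shm-size=8g ...`) or to lower the number of DataLoader workers (train.py exposes a `--num_workers` argument in the versions I've seen).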