After training for some epochs, I get this error: RuntimeError: DataLoader worker (pid 102459) is killed by signal: Segmentation fault.

RangiLyu / nanodet

NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥

Apache License 2.0


Issue #515 (Open), opened by shihongji1993 1 year ago

shihongji1993 commented 1 year ago

[Screenshot from 2023-05-29 11-50-54]

mahdizynali commented 1 year ago

This regularly happens when your GPU memory is full. Decrease your batch size so that it fits in your GPU memory. For example, a batch size of 32 usually allocates about 7 to 8 GB of memory. To see how much GPU memory is allocated during training, try this command:

watch nvidia-smi
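A minimal sketch for reading the same numbers from a script, assuming Python 3.7+ and that `nvidia-smi` is on PATH. The helper names `parse_memory_csv` and `memory_report` are hypothetical. It uses the `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` mode of `nvidia-smi`, which prints one `used, total` line per GPU, in MiB.

```python
import shutil
import subprocess

# nvidia-smi query mode: one "used, total" line per GPU, values in MiB.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text):
    """Return a list of (used_mib, total_mib) tuples, one per GPU line."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append((used, total))
    return gpus


def memory_report(text):
    """Format each GPU's usage as 'GPU i: used/total MiB (pct%)'."""
    lines = []
    for i, (used, total) in enumerate(parse_memory_csv(text)):
        pct = 100.0 * used / total if total else 0.0
        lines.append(f"GPU {i}: {used}/{total} MiB ({pct:.1f}%)")
    return lines


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        result = subprocess.run(QUERY, capture_output=True, text=True)
        if result.returncode != 0:
            print("nvidia-smi failed:", result.stderr.strip())
        else:
            print("\n".join(memory_report(result.stdout)))
```

Running this in a loop (or under `watch`) while training gives the same picture as `watch nvidia-smi`, but in a form that is easy to log alongside training output.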

shihongji1993 commented 1 year ago

I have set batch_size=32, but the problem still occurs. When I set workers_per_gpu=0 or 1, the problem goes away, but training also becomes slower.

mahdizynali commented 1 year ago

What is your GPU model? Could you check the number of CUDA cores?

shihongji1993 commented 1 year ago

My GPU is a 2080.
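A rough sketch of the arithmetic behind the batch-size advice, assuming GPU memory scales linearly with batch size and using the "batch size 32 takes about 7 to 8 GB" figure from earlier in the thread as the reference point. The function `estimate_max_batch` and its `headroom` parameter are hypothetical, and the 8 GB figure assumes a standard GeForce RTX 2080 (the 2080 Ti has 11 GB).

```python
def estimate_max_batch(gpu_mem_gb, ref_batch=32, ref_mem_gb=8.0, headroom=0.1):
    """Linearly scale a reference (batch size, memory) point to another GPU.

    Assumption: memory grows proportionally with batch size. This ignores
    fixed costs such as the CUDA context and model weights, so the result
    is an upper bound to try, not a guarantee.
    """
    if gpu_mem_gb <= 0 or ref_batch <= 0 or ref_mem_gb <= 0:
        raise ValueError("memory and batch values must be positive")
    per_sample_gb = ref_mem_gb / ref_batch          # memory per sample
    usable_gb = gpu_mem_gb * (1.0 - headroom)       # keep some memory free
    return max(1, int(usable_gb // per_sample_gb))  # whole samples that fit


if __name__ == "__main__":
    # Reference points from the thread: batch 32 uses about 7 to 8 GB.
    for ref in (7.0, 8.0):
        batch = estimate_max_batch(8.0, ref_mem_gb=ref)
        print(f"reference {ref} GB -> max batch about {batch}")
```

For an 8 GB card with 10% headroom this gives 32 with a 7 GB reference and 28 with an 8 GB reference, so a batch size of 32 sits right at the limit under this simple model.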