AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

an illegal memory access was encountered after the last commit #4655

Open dkubatin opened 4 years ago

dkubatin commented 4 years ago

@AlexeyAB, hi! After the last repository update, the following error appears during training:

CUDA Error Prev: an illegal memory access was encountered
CUDA Error Prev: an illegal memory access was encountered: File exists
darknet: ./src/utils.c:297: error: Assertion `0' failed.

Makefile:

GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=1
AVX=0
OPENMP=0
LIBSO=1
ZED_CAMERA=0

ARCH= -gencode arch=compute_75,code=[sm_75,compute_75]

Videocard: RTX 2080Ti
CUDA Version: 10.1
OpenCV version: 3.4.6
cuDNN: 7.6.0

The error appears when training yolov3-5l and yolov3; I did not check other configs.

It is also worth noting that yolov3-5l does not fully use the GPU memory for this particular model. This is observed on several PCs.

AlexeyAB commented 4 years ago

Try to use new commit: https://github.com/AlexeyAB/darknet/commit/6878ecc2e2b26d383cf65811b9d9e17375ca14ed

It is also worth noting that yolov3-5l does not fully use the GPU memory for this particular model. This is observed on several PCs.

It doesn't matter.

dkubatin commented 4 years ago

Great, the problem is gone, thanks!

canyilmaz90 commented 4 years ago

Hi @dkubatin and @AlexeyAB. Now it gives the error when it calculates the mAP during training. I'm training yolov3-tiny_3l. The output looks like this:

calculation mAP (mean average precision)...
4CUDA Error Prev: an illegal memory access was encountered
CUDA Error Prev: an illegal memory access was encountered: Success
darknet: ./src/utils.c:297: error: Assertion `0' failed.
Aborted (core dumped)

AlexeyAB commented 4 years ago

@canyilmaz90

canyilmaz90 commented 4 years ago

@AlexeyAB, before answering your questions: I think the problem is that I am working over a remote connection. I used ssh -X user@ip. I had a previous version of the repo stored on my own PC and moved it to the remote computer, but it gave the same error too. Both repos (newest and older) work on my own computer. On the remote computer it also works until the mAP calculation. I think the problem may be about the ssh connection. If so, do you have any suggestions for it?

Can you calculate the mAP on the same dataset with command ./darknet detector map ... ?

Yes, I can.

Attach your cfg-file in zip

I think it's not about the cfg-file, because I didn't change much: only the classes, the filter sizes before the yolo layers, the number of iterations, etc.

Do you use the latest version of Darknet?

Yes, I use the latest version. I cloned it yesterday.

What params do you use in Makefile?

GPU=1, CUDNN=1, OPENCV=1; the rest is the same. nvcc --version & nvidia-smi: scrn

Show first 10 lines when you run any darknet command

CUDA-version: 10000 (10010), Warning: CUDA-version is lower than Driver-version!, cuDNN: 7.6.4, GPU count: 4
OpenCV version: 4.9.1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     16       3 x 3/ 1    448 x 448 x   3 ->  448 x 448 x  16 0.173 BF
   1 max                2x 2/ 2    448 x 448 x  16 ->  224 x 224 x  16 0.003 BF
   2 conv     32       3 x 3/ 1    224 x 224 x  16 ->  224 x 224 x  32 0.462 BF

AlexeyAB commented 4 years ago

OpenCV version: 4.9.1

This is very strange, since the latest OpenCV is 4.2.0: https://opencv.org/releases/

I think the problem may be about the ssh connection. If so, do you have any suggestions for it?

I think you are doing something wrong: incorrect paths, dataset, cfg-file, or you didn't recompile darknet on the new computer, ... Try to download darknet again.

canyilmaz90 commented 4 years ago

This is very strange, since the latest OpenCV is 4.2.0: https://opencv.org/releases/

Indeed! But only darknet shows it as 4.9.1, and the training without mAP calculation goes very well, so I didn't mind it. By the way, the real version of OpenCV is 3.4.8-pre.

I think you are doing something wrong: incorrect paths, dataset, cfg-file, or you didn't recompile darknet on the new computer, ...

Actually, I cloned it via the terminal yesterday and then built it with make. I mean I didn't copy it to the remote computer. I also re-cloned and re-made it after getting this error. In fact, it works until the first mAP calculation (in this case the 2000th iteration, which is my burn_in number).

AlexeyAB commented 4 years ago

But only darknet shows it as 4.9.1

Can you show a screenshot?

Also, can you show a screenshot of the error?

Also attach your cfg-file in a zip.

canyilmaz90 commented 4 years ago

Hi @AlexeyAB, sorry that I could not look at this issue for a while; I was very busy at work last week. Here is a screenshot of the error: Screenshot_1

This screenshot is from ./darknet detector map ...: Screenshot_map

And here is the .zip file of my cfg file: global.cfg.zip

canyilmaz90 commented 4 years ago

One addition: the problem is not the ssh connection. I made a remote connection to another computer and it worked well there. Maybe I should rebuild CUDA, cuDNN, and OpenCV on this PC?

AlexeyAB commented 4 years ago

@canyilmaz90

But only darknet shows it as 4.9.1

Can you show a screenshot?

Maybe I should rebuild CUDA, cuDNN, and OpenCV on this PC?

Try to do this.

canyilmaz90 commented 4 years ago

Do you get this issue if you train by using only 1 GPU?

Yes, and I get this error any time I run any ./darknet command.

Show screenshot that "darknet shows it as 4.9.1"

Screenshot_cv2

Show content of obj.data file

classes=1
train=/media/arge/4TB_64GVNY0/Plate/trainings/Global-v3tiny/train.list
valid=/media/arge/4TB_64GVNY0/Plate/trainings/Global-v3tiny/valid.list
names=/media/arge/4TB_64GVNY0/Plate/trainings/Global-v3tiny/global.labels
backup=/media/arge/4TB_64GVNY0/Plate/trainings/Global-v3tiny/weights

Did you change anything in the source code?

In detector.c, I changed the multi-GPU synchronization frequency from 4 iterations to 5 iterations:
train_networks(nets, ngpus, train, 4) ==>> train_networks(nets, ngpus, train, 5)
and also the mAP calculation rate from 4 epochs to 1 epoch (see the standalone sketch at the end of this comment):
int calc_map_for_each = 4 * train_images_num / (net.batch * net.subdivisions); ==>> int calc_map_for_each = train_images_num / (net.batch * net.subdivisions);

In utils.c, I uncommented this line: find_replace(output_path, "/images/", "/labels/", output_path);

But with the same changes, same configuration, and same dataset, the code works well on another PC.

Do you get this message if you use ./darknet detector map ...?

No. Actually, I don't get this message when training either. This time I just forgot to add the validation list to obj.data, but it gives the same error (without this message) when I train with a validation list.

Comment these lines instead of setting 0 in cfg file

Ok, I'll do it in the next training.

OK, I'll try to reinstall CUDA and OpenCV at an appropriate time. Thanks a lot for your patience :)
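For reference, here is a minimal standalone sketch of what the calc_map_for_each change described above means in practice. It is not darknet source and the numbers are hypothetical; it only reproduces the interval arithmetic from the quoted lines. Note that in darknet, net.batch normally holds the mini-batch (cfg batch divided by subdivisions), so net.batch * net.subdivisions is the full cfg batch:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical training setup, for illustration only. */
    int train_images_num = 6400;  /* number of lines in train.list */
    int net_batch        = 16;    /* darknet's net.batch (mini-batch) */
    int net_subdivisions = 4;     /* cfg subdivisions, so cfg batch = 64 */

    /* Upstream default quoted above: recalculate mAP about every 4 epochs. */
    int calc_map_default = 4 * train_images_num / (net_batch * net_subdivisions);

    /* The described modification: recalculate mAP about every epoch. */
    int calc_map_modified = train_images_num / (net_batch * net_subdivisions);

    printf("default : every %d iterations\n", calc_map_default);   /* 400 */
    printf("modified: every %d iterations\n", calc_map_modified);  /* 100 */
    return 0;
}
```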

AlexeyAB commented 4 years ago

@canyilmaz90 Also try to download the latest Darknet and train without your changes in the source code; does the error still appear?

canyilmaz90 commented 4 years ago

@AlexeyAB I think I found it! When I increased the number of subdivisions (i.e. decreased the mini-batch size), it worked. I think it's related to the mini-batch size, although there is plenty of free space in GPU RAM. To test it, I tried a classification training, which requires much less GPU memory. With a mini-batch size of 64, while calculating the top-k score during training, it also threw a similar but different error: CUDA Error: an illegal memory access was encountered: Resource temporarily unavailable. Both the detector and the classifier worked well with a smaller mini-batch size.

AlexeyAB commented 4 years ago

@canyilmaz90

It seems that the cuDNN library may allocate some array (~100 MB) on GPU-0 even if you use GPU-1. So it's better if about 10% of the GPU memory remains free.
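One way to check how much memory actually remains free on each GPU right before training is to ask the CUDA runtime directly. This is a minimal standalone sketch (not part of darknet), assuming the CUDA toolkit is installed and the program is linked against libcudart:

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Build example: gcc gpu_mem.c -o gpu_mem -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart */
int main(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        size_t free_bytes = 0, total_bytes = 0;
        cudaSetDevice(i);                              /* select device i */
        cudaMemGetInfo(&free_bytes, &total_bytes);     /* query its memory */
        printf("GPU %d: %.0f MiB free of %.0f MiB\n",
               i, free_bytes / 1048576.0, total_bytes / 1048576.0);
    }
    return 0;
}
```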


Also very strange that this line shows OpenCV 4.9.1: https://github.com/AlexeyAB/darknet/blob/2a9fe045f3fd385ec61a38c8225945482d0ad7c7/src/image_opencv.cpp#L1338

Can you show a screenshot of the content of the OpenCV version.hpp file, like this? https://github.com/opencv/opencv/blob/89d3f95a8eea50acbfb4b8db380d5a4dc8a98173/modules/core/include/opencv2/core/version.hpp#L8-L11


in utils.c, I uncommented this line: find_replace(output_path, "/images/", "/labels/", output_path);

Yes, you can either do this, or you can just put the txt label files into the /images/ directory.
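For context, here is a simplified standalone sketch of what that substitution does to an image path. It is only a stand-in for darknet's find_replace, not the actual implementation, and the example path is made up:

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for darknet's find_replace(): replace the first
 * occurrence of `find` in `src` with `replace`, writing the result to `out`. */
static void find_replace_sketch(const char *src, const char *find,
                                const char *replace, char *out, size_t out_len) {
    const char *p = strstr(src, find);
    if (!p) {                                   /* nothing to replace: copy as-is */
        snprintf(out, out_len, "%s", src);
        return;
    }
    snprintf(out, out_len, "%.*s%s%s", (int)(p - src), src, replace, p + strlen(find));
}

int main(void) {
    char label_path[512];
    const char *image_path = "/data/plates/images/car_0001.jpg";

    /* With the uncommented line, darknet looks for labels in a parallel /labels/
     * directory; otherwise the txt files must sit next to the images. The image
     * extension is swapped for .txt in a separate step. */
    find_replace_sketch(image_path, "/images/", "/labels/", label_path, sizeof(label_path));
    printf("%s\n", label_path);   /* /data/plates/labels/car_0001.jpg */
    return 0;
}
```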

canyilmaz90 commented 4 years ago

@AlexeyAB

* Do you use one GPU for training?

* Do you run several instances of Darknet on 1 PC?

* Do you run training Detector with random=1 in cfg-file?

It seems that the cuDNN library may allocate some array (~100 MB) on GPU-0 even if you use GPU-1. So it's better if about 10% of the GPU memory remains free.

Also very strange that this line shows OpenCV 4.9.1:

https://github.com/AlexeyAB/darknet/blob/2a9fe045f3fd385ec61a38c8225945482d0ad7c7/src/image_opencv.cpp#L1338

Can you show a screenshot of the content of the OpenCV version.hpp file, like this? https://github.com/opencv/opencv/blob/89d3f95a8eea50acbfb4b8db380d5a4dc8a98173/modules/core/include/opencv2/core/version.hpp#L8-L11

* in `opencv/modules/core/include/opencv2/core/version.hpp`

* or in `opencv/build/include/opencv2/core/version.hpp`

* or in `opencv/bin/install/include/opencv2/core/version.hpp`

Here is the version.hpp file in '/usr/include/opencv2/core/': version.hpp.txt

It's really strange; it shows:

#define CV_VERSION_EPOCH    2
#define CV_VERSION_MAJOR    4
#define CV_VERSION_MINOR    9
#define CV_VERSION_REVISION 1

But I have an idea about it. When I started working at this company, an OpenCV version something like 2.4.9.1 was installed on this PC. Then I installed OpenCV 3.4.8, so maybe something remains from that time.

canyilmaz90 commented 4 years ago

@AlexeyAB I think I found it! When I increased the number of subdivisions (i.e. decreased the mini-batch size), it worked. I think it's related to the mini-batch size, although there is plenty of free space in GPU RAM. To test it, I tried a classification training, which requires much less GPU memory. With a mini-batch size of 64, while calculating the top-k score during training, it also threw a similar but different error: CUDA Error: an illegal memory access was encountered: Resource temporarily unavailable. Both the detector and the classifier worked well with a smaller mini-batch size.

@AlexeyAB Can you try training with the -map flag and also a big mini-batch size like 64, 128, etc.?

canyilmaz90 commented 4 years ago

@AlexeyAB I've just noticed that there is another version.hpp in /bin/local/include/opencv2/core/ and it shows:

#define CV_VERSION_MAJOR    3
#define CV_VERSION_MINOR    4
#define CV_VERSION_REVISION 8
#define CV_VERSION_STATUS   "-pre"

version.hpp.txt

AlexeyAB commented 4 years ago

@canyilmaz90

It seems that Darknet uses the old OpenCV 2.4.9. Several different versions of OpenCV can interfere with each other if the wrong paths are set: for example, it can use the hpp-file from 2.4.9 and the SO-library from the new 3.x. If the old OpenCV 2.4.9 isn't required, try to delete it and leave only one version, 3.x.
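One way to see which headers the compiler actually picks up is to build a tiny check against whatever version.hpp it finds and compare the result with the installed library. A minimal sketch, assuming the stale headers live under /usr/include and that version.hpp contains only preprocessor definitions (which is the case for the 2.4.x and 3.x releases); the build command and pkg-config module name are examples:

```c
/* cv_header_check.c
 * Build example: gcc cv_header_check.c -I/usr/include -o cv_header_check
 * Compare the output with the linked library, e.g.: pkg-config --modversion opencv
 */
#include <stdio.h>
#include <opencv2/core/version.hpp>  /* only macro definitions in 2.4.x and 3.x */

int main(void) {
#ifdef CV_VERSION_EPOCH
    /* Old 2.4.x headers: EPOCH.MAJOR.MINOR.REVISION, e.g. 2.4.9.1.
     * Printing only MAJOR.MINOR.REVISION then gives the misleading "4.9.1". */
    printf("2.4.x-style headers: %d.%d.%d.%d\n",
           CV_VERSION_EPOCH, CV_VERSION_MAJOR, CV_VERSION_MINOR, CV_VERSION_REVISION);
#else
    printf("headers: %d.%d.%d\n",
           CV_VERSION_MAJOR, CV_VERSION_MINOR, CV_VERSION_REVISION);
#endif
    return 0;
}
```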

Can you try training with the -map flag and also a big mini-batch size like 64, 128, etc.?

For some models, yes, I can. It depends on the GPU, the model, the random param, and the network size. On a Quadro RTX 8000 you can train a small model with mini-batch 1024 and the -map flag.

JeremyKeusters commented 3 years ago

Hi all,

Just had the same error while training a yolov4-tiny-custom with the following .cfg values:

batch=64
subdivisions=2
width=800
height=640

When I changed the values to:

batch=64
subdivisions=4
width=800
height=640

the issue no longer appeared. The issue appears at the point where the mAP is calculated during training (after 1000 iterations) and only when running with the -map flag. It really seems to be related to setting subdivisions=2. I also tried to calculate the mAP manually with ./darknet detector map on the weights from iteration 900 (as the weights at iteration 1000 are not saved yet at the point of crashing).


Error I got:

4CUDA Error: an illegal memory access was encountered: File exists
darknet: ./src/utils.c:331: error: Assertion `0' failed.

Makefile

GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=1
AVX=0
OPENMP=0
LIBSO=0
ZED_CAMERA=0
ZED_CAMERA_v2_8=0

USE_CPP=0
DEBUG=0

ARCH= -gencode arch=compute_60,code=sm_60 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52] \
        -gencode arch=compute_61,code=[sm_61,compute_61]
Videocard: Tesla P100
CUDA Version: 10.1
OpenCV version: 2.4.9
cuDNN: 7.6.5
AlexeyAB commented 3 years ago

I think the error message is wrong for some reason; it should be Out of memory. If you use -map or small subdivisions, then more memory is required.
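As a rough illustration of why halving subdivisions (doubling the mini-batch) needs noticeably more memory even when nvidia-smi shows free space: per-layer activations are stored for every image in the mini-batch, so they scale linearly with it. Here is a back-of-the-envelope standalone sketch using layer 0 from the yolov4-tiny log further below; this is not how darknet allocates memory and it ignores weights, workspace, and cuDNN scratch buffers:

```c
#include <stdio.h>

/* Approximate activation size of one conv output, in MiB, for a given mini-batch. */
static double act_mib(int w, int h, int c, int mini_batch) {
    return (double)w * h * c * mini_batch * sizeof(float) / (1024.0 * 1024.0);
}

int main(void) {
    /* Layer 0 of the yolov4-tiny log below: 640x640x3 -> 320x320x32. */
    int out_w = 320, out_h = 320, out_c = 32;
    int batch = 64;

    /* subdivisions=4 -> mini_batch=16, subdivisions=2 -> mini_batch=32 */
    printf("subdivisions=4: layer 0 output ~%.0f MiB\n", act_mib(out_w, out_h, out_c, batch / 4));
    printf("subdivisions=2: layer 0 output ~%.0f MiB\n", act_mib(out_w, out_h, out_c, batch / 2));
    return 0;
}
```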

JeremyKeusters commented 3 years ago

Thanks for the quick reply @AlexeyAB. I'm currently running the training job with subdivisions=4; once this is done, I'll reproduce the error again and give you a more complete error output. Memory should normally not be an issue, as only around 12,000 MiB of the 16,280 MiB GPU memory was in use during training.

JeremyKeusters commented 3 years ago

Hi @AlexeyAB ! I ran it again to get the full output. Here you go:

 (next mAP calculation at 1000 iterations) 
 1000: 0.211191, 0.166555 avg loss, 0.002610 rate, 1.027363 seconds, 64000 images, 5.460727 hours left

4CUDA Error: an illegal memory access was encountered: File exists
darknet: ./src/utils.c:331: error: Assertion `0' failed.

 calculation mAP (mean average precision)...
 Detection layer: 30 - type = 28 
 Detection layer: 37 - type = 28 
CUDA status Error: file: ./src/network_kernels.cu : () : line: 720 : build time: Jul  1 2021 - 11:00:33 

 CUDA Error: an illegal memory access was encountered

Seems like @stephanecharette has the same issue in #7850 .

stephanecharette commented 3 years ago

This issue has so many things dating from over a year ago that I didn't want to add to it. The issue I ran into is very specific and only started happening with a commit from a few days ago. Issue #7850 documents the exact commit where this problem started happening, but yes, it looks to be the same as what @JeremyKeusters reported, even down to the extra leading 4 in the error message: 4CUDA Error: an illegal memory access was encountered.

AlexeyAB commented 3 years ago

@stephanecharette Hi, I fixed this bug. Try the latest commit ( https://github.com/AlexeyAB/darknet/commit/9c9232d1c3f0f80e40bf347643a542903d6703ca and https://github.com/AlexeyAB/darknet/commit/b2cb64dffbcf706ac9f1d12d7fe699c40eacc40b )

JeremyKeusters commented 3 years ago

Hi @AlexeyAB , thanks for the fix. I will train again sometime this week with the latest commit to verify that the bug was fixed on my end too.

JeremyKeusters commented 3 years ago

Hi @AlexeyAB! I'm on 2418fa7 and I still have this issue. The error message is slightly different (note the line number):

(next mAP calculation at 1000 iterations) 
1000: 0.213937, 0.263404 avg loss, 0.002610 rate, 0.817044 seconds, 64000 images, 5.043064 hours left

4CUDA Error: an illegal memory access was encountered: File exists
darknet: ./src/utils.c:331: error: Assertion `0' failed.

 calculation mAP (mean average precision)...
 Detection layer: 30 - type = 28 
 Detection layer: 37 - type = 28 
CUDA status Error: file: ./src/network_kernels.cu : () : line: 735 : build time: Jul 12 2021 - 14:35:49 

 CUDA Error: an illegal memory access was encountered
AlexeyAB commented 3 years ago

@JeremyKeusters Hi,

JeremyKeusters commented 3 years ago

Hi @AlexeyAB ,

Here's the information you requested; let me know if you need any additional information. As I already said, when I set subdivisions to 4, the issue disappears.


  • Did you recompile Darknet?

Yes. I did however make 2 changes to the existing code:

  1. Save the weights every 100 iterations by commenting in these 3 lines: https://github.com/AlexeyAB/darknet/blob/d669680879f72e58a5bc4d8de98c2e3c0aab0b62/src/detector.c#L385-L387 and by commenting out this line, replacing i with iteration: https://github.com/AlexeyAB/darknet/blob/d669680879f72e58a5bc4d8de98c2e3c0aab0b62/src/detector.c#L384

  2. Do the validation every 100 iterations by commenting this line: https://github.com/AlexeyAB/darknet/blob/d669680879f72e58a5bc4d8de98c2e3c0aab0b62/src/detector.c#L302 and adding

    // Allow validation every 100 iterations
    calc_map_for_each = 100;

  • Can you share cfg-file?

Sure: yolov4-tiny-issue-4655.cfg.zip


  • What command do you use?

./darknet detector train data/issue-4655.data cfg/yolov4-tiny-issue-4655.cfg yolov4-tiny.conv.29 -map -dont_show &> logs/output.log &


  • Can you show such lines?
CUDA-version: 10010 (10010), cuDNN: 7.6.5, GPU count: 1  
 OpenCV version: 4.9.1
 0 : compute_capability = 600, cudnn_half = 0, GPU: Tesla P100-PCIE-16GB 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 2    640 x 640 x   3 ->  320 x 320 x  32 0.177 BF
   1 conv     64       3 x 3/ 2    320 x 320 x  32 ->  160 x 160 x  64 0.944 BF
   2 conv     64       3 x 3/ 1    160 x 160 x  64 ->  160 x 160 x  64 1.887 BF
   3 route  2                              1/2 ->  160 x 160 x  32 
   4 conv     32       3 x 3/ 1    160 x 160 x  32 ->  160 x 160 x  32 0.472 BF
   5 conv     32       3 x 3/ 1    160 x 160 x  32 ->  160 x 160 x  32 0.472 BF
   6 route  5 4                                ->  160 x 160 x  64 
   7 conv     64       1 x 1/ 1    160 x 160 x  64 ->  160 x 160 x  64 0.210 BF
   8 route  2 7                                ->  160 x 160 x 128 
   9 max                2x 2/ 2    160 x 160 x 128 ->   80 x  80 x 128 0.003 BF
  10 conv    128       3 x 3/ 1     80 x  80 x 128 ->   80 x  80 x 128 1.887 BF
  11 route  10                             1/2 ->   80 x  80 x  64 
  12 conv     64       3 x 3/ 1     80 x  80 x  64 ->   80 x  80 x  64 0.472 BF
  13 conv     64       3 x 3/ 1     80 x  80 x  64 ->   80 x  80 x  64 0.472 BF
  14 route  13 12                              ->   80 x  80 x 128 
  15 conv    128       1 x 1/ 1     80 x  80 x 128 ->   80 x  80 x 128 0.210 BF
  16 route  10 15                              ->   80 x  80 x 256 
  17 max                2x 2/ 2     80 x  80 x 256 ->   40 x  40 x 256 0.002 BF
  18 conv    256       3 x 3/ 1     40 x  40 x 256 ->   40 x  40 x 256 1.887 BF
  19 route  18                             1/2 ->   40 x  40 x 128 
  20 conv    128       3 x 3/ 1     40 x  40 x 128 ->   40 x  40 x 128 0.472 BF
  21 conv    128       3 x 3/ 1     40 x  40 x 128 ->   40 x  40 x 128 0.472 BF
  22 route  21 20                              ->   40 x  40 x 256 
  23 conv    256       1 x 1/ 1     40 x  40 x 256 ->   40 x  40 x 256 0.210 BF
  24 route  18 23                              ->   40 x  40 x 512 
  25 max                2x 2/ 2     40 x  40 x 512 ->   20 x  20 x 512 0.001 BF
  26 conv    512       3 x 3/ 1     20 x  20 x 512 ->   20 x  20 x 512 1.887 BF
  27 conv    256       1 x 1/ 1     20 x  20 x 512 ->   20 x  20 x 256 0.105 BF
  28 conv    512       3 x 3/ 1     20 x  20 x 256 ->   20 x  20 x 512 0.944 BF
  29 conv     45       1 x 1/ 1     20 x  20 x 512 ->   20 x  20 x  45 0.018 BF
  30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
  31 route  27                                 ->   20 x  20 x 256 
  32 conv    128       1 x 1/ 1     20 x  20 x 256 ->   20 x  20 x 128 0.026 BF
  33 upsample                 2x    20 x  20 x 128 ->   40 x  40 x 128
  34 route  33 23                              ->   40 x  40 x 384 
  35 conv    256       3 x 3/ 1     40 x  40 x 384 ->   40 x  40 x 256 2.831 BF
  36 conv     45       1 x 1/ 1     40 x  40 x 256 ->   40 x  40 x  45 0.037 BF
  37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
Total BFLOPS 16.098 
avg_outputs = 712105 
 Allocate additional workspace_size = 26.22 MB 
 0 : compute_capability = 600, cudnn_half = 0, GPU: Tesla P100-PCIE-16GB 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 2    640 x 640 x   3 ->  320 x 320 x  32 0.177 BF
   1 conv     64       3 x 3/ 2    320 x 320 x  32 ->  160 x 160 x  64 0.944 BF
   2 conv     64       3 x 3/ 1    160 x 160 x  64 ->  160 x 160 x  64 1.887 BF
   3 route  2                              1/2 ->  160 x 160 x  32 
   4 conv     32       3 x 3/ 1    160 x 160 x  32 ->  160 x 160 x  32 0.472 BF
   5 conv     32       3 x 3/ 1    160 x 160 x  32 ->  160 x 160 x  32 0.472 BF
   6 route  5 4                                ->  160 x 160 x  64 
   7 conv     64       1 x 1/ 1    160 x 160 x  64 ->  160 x 160 x  64 0.210 BF
   8 route  2 7                                ->  160 x 160 x 128 
   9 max                2x 2/ 2    160 x 160 x 128 ->   80 x  80 x 128 0.003 BF
  10 conv    128       3 x 3/ 1     80 x  80 x 128 ->   80 x  80 x 128 1.887 BF
  11 route  10                             1/2 ->   80 x  80 x  64 
  12 conv     64       3 x 3/ 1     80 x  80 x  64 ->   80 x  80 x  64 0.472 BF
  13 conv     64       3 x 3/ 1     80 x  80 x  64 ->   80 x  80 x  64 0.472 BF
  14 route  13 12                              ->   80 x  80 x 128 
  15 conv    128       1 x 1/ 1     80 x  80 x 128 ->   80 x  80 x 128 0.210 BF
  16 route  10 15                              ->   80 x  80 x 256 
  17 max                2x 2/ 2     80 x  80 x 256 ->   40 x  40 x 256 0.002 BF
  18 conv    256       3 x 3/ 1     40 x  40 x 256 ->   40 x  40 x 256 1.887 BF
  19 route  18                             1/2 ->   40 x  40 x 128 
  20 conv    128       3 x 3/ 1     40 x  40 x 128 ->   40 x  40 x 128 0.472 BF
  21 conv    128       3 x 3/ 1     40 x  40 x 128 ->   40 x  40 x 128 0.472 BF
  22 route  21 20                              ->   40 x  40 x 256 
  23 conv    256       1 x 1/ 1     40 x  40 x 256 ->   40 x  40 x 256 0.210 BF
  24 route  18 23                              ->   40 x  40 x 512 
  25 max                2x 2/ 2     40 x  40 x 512 ->   20 x  20 x 512 0.001 BF
  26 conv    512       3 x 3/ 1     20 x  20 x 512 ->   20 x  20 x 512 1.887 BF
  27 conv    256       1 x 1/ 1     20 x  20 x 512 ->   20 x  20 x 256 0.105 BF
  28 conv    512       3 x 3/ 1     20 x  20 x 256 ->   20 x  20 x 512 0.944 BF
  29 conv     45       1 x 1/ 1     20 x  20 x 512 ->   20 x  20 x  45 0.018 BF
  30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
  31 route  27                                 ->   20 x  20 x 256 
  32 conv    128       1 x 1/ 1     20 x  20 x 256 ->   20 x  20 x 128 0.026 BF
  33 upsample                 2x    20 x  20 x 128 ->   40 x  40 x 128
  34 route  33 23                              ->   40 x  40 x 384 
  35 conv    256       3 x 3/ 1     40 x  40 x 384 ->   40 x  40 x 256 2.831 BF
  36 conv     45       1 x 1/ 1     40 x  40 x 256 ->   40 x  40 x  45 0.037 BF
  37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
Total BFLOPS 16.098 
avg_outputs = 712105 
 Allocate additional workspace_size = 606.13 MB 
Loading weights from yolov4-tiny.conv.29... Prepare additional network for mAP calculation...
net.optimized_memory = 0 
mini_batch = 1, batch = 2, time_steps = 1, train = 0 
Create CUDA-stream - 0 
 Create cudnn-handle 0 
nms_kind: greedynms (1), beta = 0.600000 
nms_kind: greedynms (1), beta = 0.600000 
yolov4-tiny-issue-4655
net.optimized_memory = 0 
mini_batch = 32, batch = 64, time_steps = 1, train = 1 
nms_kind: greedynms (1), beta = 0.600000 
nms_kind: greedynms (1), beta = 0.600000 
Done! Loaded 29 layers from weights-file 
 Create 6 permanent cpu-threads 
Gffsn commented 3 years ago

Hi @JeremyKeusters, I had the same problem when using my model for inference:

CUDA status Error: file: ./src/network_kernels.cu : () : line: 735 : build time: Aug 19 2021 - 09:48:00

CUDA Error: an illegal memory access was encountered
/home/gise-2/anaconda3/envs/platformtest/bin/python: check_error: Unknown error 1513545619

It turns out I was forcing the model to be loaded and used on a specific GPU with tf.device('/device:GPU:1'). I commented out this line and now it's working as expected.

@AlexeyAB, weirdly enough, the illegal memory access error was raised when I forced the model onto a GPU. Even forcing it onto GPU:0 raises the error, even though the model naturally loads and runs on the first GPU anyway. Any idea why?

AlexeyAB commented 3 years ago

@GeoffSion

with tf.device('/device:GPU:1'):

What framework do you use for YOLOv4: is it Darknet or TensorFlow?

Gffsn commented 3 years ago

@AlexeyAB Good point, I'm using TensorFlow. I tried darknet.set_gpu(1) and it worked. Thanks for your answer! I hope it will help others with this issue.