CUDA Error: unspecified launch failure

Yang507 commented 6 years ago

When I built libdarknet.so and tested my model with the library, I ran the code with multiple threads on a single GPU, using the YOLO object alone, but the program failed with this error:

`CUDA Error: unspecified launch failure
nv_main: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)`

So I wonder whether darknet supports multithreading on a single GPU. I am running on an NVIDIA Jetson TX2.

dreambit commented 5 years ago

@AlexeyAB Hi, I have the same issue. I am running 3 instances of a Python HTTP server with the YOLO DLL behind an nginx load balancer, and I get unspecified launch failure.

Any recommendations?

When I run a single instance, it works fine.

Thx.

AlexeyAB commented 5 years ago

@dreambit Hi,

Do you get this error immediately or after several detections?
What CUDA, cuDNN and GPU do you use?
Can you successfully run several instances of https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/darknet_video.py simultaneously (you need a test.mp4 file)?

Try to open \yolo_cpp_dll.sln in MSVS -> (right click on project) -> properties -> C/C++ -> Preprocessor -> Preprocessor Definitions, and change here: NDEBUG; to DEBUG;

Recompile yolo_cpp_dll.dll and run again. Then show me a screenshot of the full error.

dreambit commented 5 years ago

@AlexeyAB

> What CUDA, cuDNN and GPU do you use?

GTX 1080 Ti 11 GB, cuda_10.0.130_411.31_win10, cudnn-10.0-windows10-x64-v7.4.2.24, opencv-3.4.0-vc14_vc15

> Do you get this error immediately or after several detections?

Sometimes right after the weights are loaded. With darknet_video.py and 3 instances, I get the error. When I run 2 instances, it works.

Another issue is that when I use darknet.py or darknet_video.py, it is very CPU intensive (Intel i5). Two darknet_video instances use more than 90% CPU and less than 10% GPU.

When I run darknet detector test, the detection time is ~35 ms, while with Python it is >90 ms.

I also noticed that darknet.py uses predict_image = lib.network_predict_image, while the DLL API is:

struct bbox_t {
    unsigned int x, y, w, h;    // (x,y) - top-left corner, (w, h) - width & height of bounded box
    float prob;                    // confidence - probability that the object was found correctly
    unsigned int obj_id;        // class of object - from range [0, classes-1]
    unsigned int track_id;        // tracking id for video (0 - untracked, 1 - inf - tracked object)
    unsigned int frames_counter;// counter of frames on which the object was detected
};

class Detector {
public:
        Detector(std::string cfg_filename, std::string weight_filename, int gpu_id = 0);
        ~Detector();

        std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2, bool use_mean = false);
        std::vector<bbox_t> detect(image_t img, float thresh = 0.2, bool use_mean = false);
        static image_t load_image(std::string image_filename);
        static void free_image(image_t m);

#ifdef OPENCV
        std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2, bool use_mean = false);
        std::shared_ptr<image_t> mat_to_image_resize(cv::Mat mat) const;
#endif
};

When I set DEBUG, it becomes extremely slow.

With NDEBUG, I get the error. There is no error when DEBUG is set.

dreambit commented 5 years ago

I can't reproduce the error when DEBUG is set.

AlexeyAB commented 5 years ago

@dreambit

Which version of Darknet do you use? Try the latest version of this repository.

I can't reproduce this bug even without DEBUG. I waited a few minutes.

> Do you get this error immediately or after several detections?

> Sometimes right after the weights are loaded. With darknet_video.py and 3 instances, I get the error. When I run 2 instances, it works.

Did you run darknet_video.py with yolo_cpp_dll.dll compiled with DEBUG? The error message looks like it was compiled without DEBUG definition.

> I also noticed that darknet.py uses predict_image = lib.network_predict_image, while the DLL API is

I fixed the Readme. There are 2 APIs, a C API and a C++ API: https://github.com/AlexeyAB/darknet#how-to-use-yolo-as-dll-and-so-libraries

dreambit commented 5 years ago

@AlexeyAB Thanks for your time. I cloned the repository into a new folder, and darknet_video works fine with 3 instances; I don't know why. I will also test darknet.py.

Could you explain the CPU load? The CPU (i5-2300) is at 100% while the GPU is at 20-30%. Is it possible to take the load off the CPU? It looks like the CPU is the bottleneck here.

Thx

dreambit commented 5 years ago

@AlexeyAB Thanks. I am not sure, but I think this error occurs when the network size is large (736x736 in my case) and there are 3 instances. Thanks for your help :)

AlexeyAB commented 5 years ago

@dreambit

> Could you explain the CPU load? The CPU (i5-2300) is at 100% while the GPU is at 20-30%. Is it possible to take the load off the CPU? It looks like the CPU is the bottleneck here.

This is an issue with the Python example.

> Thanks. I am not sure, but I think this error occurs when the network size is large (736x736 in my case) and there are 3 instances. Thanks for your help :)

Maybe there just isn't enough GPU-RAM?

dreambit commented 5 years ago

@AlexeyAB

> This is an issue with the Python example.

I am not sure. I added many prints of execution time, and the most resource-intensive part is predict_image(net, im), which is just a lib.network_predict_image call. Maybe it is a Python problem with the DLL.
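The timing prints described above can be sketched with a small context manager. This is a minimal illustration, not the code used in this thread: `timed`, `timings`, and `fake_predict` are hypothetical names, and `fake_predict` merely sleeps in place of `predict_image(net, im)`.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per label.
timings = {}


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


def fake_predict(image):
    """Stand-in for predict_image(net, im); sleeps to simulate work."""
    time.sleep(0.02)
    return image


with timed("load"):
    frame = "frame-0"  # placeholder for loading and resizing an image
with timed("predict"):
    fake_predict(frame)

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

Wall-clock time alone does not show whether a step is busy on the CPU or waiting on the GPU. Comparing it with `time.process_time()` for the same block is one way to tell the two apart.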

> Maybe there just isn't enough GPU-RAM?

Maybe, I thought about that, but in that case the error unspecified launch failure is misleading, because usually when there is not enough memory, out of memory is thrown.

AlexeyAB commented 5 years ago

@dreambit

Try to catch this bug with the DEBUG definition.

> Thanks. I am not sure, but I think this error occurs when the network size is large (736x736 in my case) and there are 3 instances. Thanks for your help :)

There shouldn't be enough GPU-RAM to run 3 instances of yolov3.cfg with width=736 height=736 batch=1, because 1 instance occupies ~4.5 GB GPU-RAM even if I run ./darknet detector test.... So 3 instances require ~13.5 GB GPU-RAM, which is more than the 11 GB on the GTX 1080 Ti.
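The estimate above can be checked with a few lines of arithmetic. The figures are the approximate ones quoted in this thread. The calculation assumes memory use scales linearly with the number of instances and ignores anything else occupying the GPU.

```python
PER_INSTANCE_GB = 4.5  # approximate figure quoted above (yolov3.cfg, 736x736, batch=1)
GPU_TOTAL_GB = 11.0    # GTX 1080 Ti
INSTANCES = 3

required_gb = PER_INSTANCE_GB * INSTANCES
# Largest whole number of instances that fits under the linear assumption.
max_instances = int(GPU_TOTAL_GB // PER_INSTANCE_GB)

print(f"required: ~{required_gb} GB, available: {GPU_TOTAL_GB} GB")
print(f"fits: {required_gb <= GPU_TOTAL_GB}, max instances: {max_instances}")
```

Under these assumptions only 2 instances fit, which is consistent with the report that 2 instances work and 3 fail.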

AlexeyAB / darknet

CUDA Error: unspecified launch failure #1652