Open andreasmarxer opened 4 years ago
Check that you have these images.
Also try with -ext_output -dont_show instead of -ext_output.
Then only the "Cannot load image" error appears:
So you don't have these images or don't have permissions.
I do have them, because when I compile with OpenCV=0 it works perfectly fine.
The thing is that I want to predict and save all images in the validation.txt file. And actually, without OpenCV it predicts all, but always saves the images as predictions.jpg and therefore overwrites the previous image.
I assumed that with OpenCV enabled, I could see all these validation images while testing. Additional features, e.g. getting the loss curve during training, would also be available with OpenCV.
Otherwise it seems that OpenCV can't load your images.
Yes, but what could be the reason for that?
And is there another way to predict all validation images and save them out of validation.txt without OpenCV?
I'm just looking for a way to predict a list of images and save them all with a different name, e.g. prediction_img1.jpg if the image name was img1, and all this without OpenCV. @AlexeyAB With the current implementation, it always saves the image as predictions.jpg and overwrites the previous one.
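One possible workaround, sketched under assumptions: since a build without OpenCV writes each result to predictions.jpg (as described above), a shell loop can run the detector once per image and rename the output after each run. The function name predict_list and the prediction_<name>.jpg naming are hypothetical, and the sketch assumes the detector accepts the image path as its last argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of darknet.
# Usage: predict_list LIST_FILE OUT_DIR DETECTOR_COMMAND...
predict_list() {
    local list_file="$1" out_dir="$2" img name
    shift 2                      # what remains is the detector command
    mkdir -p "$out_dir"
    while IFS= read -r img; do
        [ -n "$img" ] || continue            # skip empty lines
        # </dev/null keeps the detector from consuming the image list on stdin
        "$@" "$img" < /dev/null || continue
        name="$(basename "$img")"
        name="${name%.*}"                    # strip the file extension
        # Assumption: the detector wrote its result to ./predictions.jpg
        if [ -f predictions.jpg ]; then
            mv predictions.jpg "$out_dir/prediction_${name}.jpg"
        fi
    done < "$list_file"
}

# Example call (paths taken from this thread, adjust to your setup):
# predict_list data/validation.txt results \
#     ./darknet detector test data/obj.data cfg/yolov3-obj.cfg \
#     backup/yolov3-obj_4000.weights -dont_show
```

This reloads the network for every image, so it is slow and only meant as a stopgap until OpenCV can load the images.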
@andreasmarxer your opencv was built without jpg support. Strange but definitely possible.
It might be interesting for you to rebuild dependencies using vcpkg.
Install it and then launch ./vcpkg install darknet[full]. It will install darknet with all of its dependencies, including CUDA and CUDNN. Please let me know if it doesn't work.
Thank you for your answer! @cenit Okay, this might be the issue, I will check. The thing is that I'm working on an HPC cluster and I didn't install OpenCV myself. I will check the permissions I have and try to install this package to build from. I'll keep you posted :)
You can always build packages in your personal folder, and vcpkg is made exactly for that. I remembered your HPC setup, that's why I was suggesting this route :) (It's also possible that the HPC admin made a real mistake by providing an OpenCV package without libjpeg support, so it might interest them too.)
Okay, this sounds good!
I did the following:
1) clone the git repo and go into this folder
2) ./bootstrap-vcpkg.sh
3) ./vcpkg integrate install
4) run ./vcpkg install darknet[full], still in the vcpkg folder
Then the output leads to an error:
The problem is that vcpkg is very bleeding edge.
The portfile for cuda (luckily it is very simple) expects cuda 10.1, while you have only 9.x
If you open it (ports/cuda/portfile.cmake) you should be able to tailor it to your version, allowing you to cheat. The subsequent problem, unfortunately, is CUDNN, which goes hand in hand with CUDA. You should also modify that portfile accordingly. Then it should work. Please let me know if you encounter any other problem. In case of success, please share the portfiles for CUDA 9 + CUDNN, for future reference and for people who might run into the same problem.
I also have cuda 10.1 preinstalled, so I think it would make sense to load that first and run all the steps again. But I'm not sure whether I need to change the loaded cudnn version; currently it's still 7.0.0.
Edit: Still the same error occurs after loading cuda version 10.1.0.
Interesting. Let’s debug that, it might be easier.
Can you tell me what's the output of echo $CUDA_PATH, of echo $CUDA_TOOLKIT_ROOT_DIR, and of echo $CUDA_BIN_PATH?
I don't know why, but none of these commands produces any output. What am I doing wrong?
Nothing wrong, but unfortunately that's the reason why the CUDA portfile fails. It looks like the modules on your HPC machine have not been built correctly... Please manually set CUDA_PATH to point to the root folder of CUDA:
export CUDA_PATH=/path/to/cuda/folder
If the setup is OK, running this command
$CUDA_PATH/bin/nvcc --version
should work and should not give you a "command not found" error. After that, retry installing CUDA and all other ports. To fix the missing variable permanently, copy that export line into your .bashrc file.
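The check described above can be wrapped in a small function. This is a minimal sketch; the name check_cuda_path is hypothetical, and it only verifies that CUDA_PATH is set and that an executable nvcc exists under it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that CUDA_PATH points at a usable CUDA root.
check_cuda_path() {
    if [ -z "${CUDA_PATH:-}" ]; then
        echo "CUDA_PATH is not set" >&2
        return 1
    fi
    if [ ! -x "$CUDA_PATH/bin/nvcc" ]; then
        echo "no executable nvcc under $CUDA_PATH/bin" >&2
        return 1
    fi
    # Print the compiler version as the final confirmation
    "$CUDA_PATH/bin/nvcc" --version
}

# Example (the path is a placeholder, as in the comment above):
# export CUDA_PATH=/path/to/cuda/folder
# check_cuda_path && ./vcpkg install darknet[full]
```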
But how can it be that this CUDA installation issue only pops up when using OpenCV? Otherwise, when OpenCV is disabled, the training works fine with GPU.
What do you mean by "retry installing CUDA & all other ports"? The command ./vcpkg install darknet[full]?
I was discussing vcpkg only in my last comment. It requires a proper cuda installation to work, and a proper cuda installation provides those symbols :)
What do you mean by "those symbols"? The variables $CUDA_PATH, $CUDA_TOOLKIT... etc.?
Yes, please define them appropriately, using export like I was saying before, and then retry
I investigated this issue again, together with the cluster support. It turned out that the OpenCV library was indeed built without JPEG support. A new OpenCV version has now been built with JPEG support.
Unfortunately, the issue persists with the new version, and the images still cannot be loaded when using OpenCV. The error for ./darknet detector train data/obj-120.data cfg/yolov3-obj.cfg backup/darknet53.conv.74 -ext_output -dont_show is as follows:
I also looked again at the variables mentioned by @cenit. I set $CUDA_PATH to the root path of CUDA, but it didn't change anything about the error.
Do I also need to define further variables? The CUDA paths I know of on the cluster are the following:
Or do I also need to define the same for the OpenCV variables? These are the paths I know of where OpenCV is located on the cluster:
Do you get this "Cannot load image" issue if you compile Darknet without OpenCV?
No, I only get it with OpenCV.
Can you please share the logs from the initial cmake configuration? Please delete the build_ folder and then re-run ./build.sh (if that is the way you are working), and copy all the messages that appear on the console here. It seems that CMake is still finding an OpenCV that is built without jpeg support.
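A quick way to see what a given OpenCV build reports is to look at the JPEG line of its build information. The sketch below assumes the usual layout of that text, with a line such as "JPEG: NO" or "JPEG: <library path>"; the function name jpeg_enabled is hypothetical, and the commands that produce the build information are assumptions about what is installed on the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read OpenCV build information on stdin and report
# whether the JPEG entry is enabled. Returns 0 if enabled, 1 if "NO",
# 2 if no JPEG line was found at all.
jpeg_enabled() {
    local line value
    # Match "JPEG:" only, not "JPEG 2000:"
    line="$(grep -E '^[[:space:]]*JPEG:' | head -n 1)"
    if [ -z "$line" ]; then
        echo "no JPEG line found in build information" >&2
        return 2
    fi
    value="$(printf '%s\n' "$line" \
        | sed -E -e 's/^[[:space:]]*JPEG:[[:space:]]*//' -e 's/[[:space:]]*$//')"
    echo "JPEG: $value"
    [ "$value" != "NO" ]
}

# Possible sources of the build information (assumptions, use what exists):
# opencv_version -v | jpeg_enabled
# python -c "import cv2; print(cv2.getBuildInformation())" | jpeg_enabled
```

This only inspects whichever OpenCV the command resolves to after loading the modules; the CMake configuration log is still needed to confirm which OpenCV the build actually picked up.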
I'm just looking for a way to predict a list of images and save them all with a different name, e.g. prediction_img1.jpg if the image name was img1, and all this without OpenCV. @AlexeyAB With the current implementation, it always saves the image as predictions.jpg and overwrites the previous one.
@andreasmarxer Have you found a solution? I would need it as well
I compiled the Makefile.txt with make successfully. The loaded modules on the HPC cluster are:
But in training afterwards, I get an error loading the images, which only occurs when I have OpenCV enabled (OpenCV=1). The output looks as follows when using the command:
./darknet detector test data/obj.data cfg/yolov3-obj.cfg backup/yolov3-obj_4000.weights -ext_output < data/validation_window1000.txt > result_test.txt
Without OpenCV it works. What could be the issue here? @AlexeyAB @cenit