Xilinx / Vitis-AI-Tutorials

MIT License

Caffe model training +dog cat classification #53

Open code-locker opened 2 years ago

code-locker commented 2 years ago

Hi, I am running the training procedure for the Caffe model in 01-caffe_cats_vs_dogs, and I hit the issue below during training.

I0210 09:24:31.278432 2794 caffe.cpp:247] Starting Optimization
I0210 09:24:31.278439 2794 solver.cpp:341] Solving alexnetBNnoLRN m2 (as m3 but less DROP and less BN)
I0210 09:24:31.278442 2794 solver.cpp:342] Learning Rate Policy: step
I0210 09:24:31.279312 2794 solver.cpp:424] Iteration 0, Testing net (#0)
I0210 09:24:32.102056 2794 solver.cpp:523] Test net output #0: accuracy = 0.5
I0210 09:24:32.102087 2794 solver.cpp:523] Test net output #1: loss = 0.693147 (* 1 = 0.693147 loss)
I0210 09:24:32.102092 2794 solver.cpp:523] Test net output #2: top-1 = 0.5
F0210 09:24:32.151126 2794 math_functions.cu:27] Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
Check failure stack trace:
    @ 0x7f60598814dd google::LogMessage::Fail()
    @ 0x7f6059889071 google::LogMessage::SendToLog()
    @ 0x7f6059880ecd google::LogMessage::Flush()
    @ 0x7f605988276a google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f605863c24a caffe::caffe_gpu_gemm<>()
    @ 0x7f60585e248c caffe::InnerProductLayer<>::Backward_gpu()
    @ 0x7f6058458be3 caffe::Net<>::BackwardFromTo()
    @ 0x7f6058458d3f caffe::Net<>::Backward()
    @ 0x7f60584bdc4c caffe::Solver<>::Step()
    @ 0x7f60584be791 caffe::Solver<>::Solve()
    @ 0x55d5cd3a35ce train()
    @ 0x55d5cd39ca59 main
    @ 0x7f6056c29bf7 __libc_start_main
    @ 0x55d5cd39d6a8 (unknown)
Aborted (core dumped)

Elapsed time for Caffe training (s): 1077.31017

How can I solve this issue?

mhanuel26 commented 2 years ago

Hi @abhishek-ml-ai,

I ran into a similar problem when training one of the Caffe-based models from the model zoo. See here:

https://github.com/Xilinx/Vitis-AI/issues/691

What hardware and software do you have (GPU model and NVIDIA driver/CUDA versions)?
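One way to collect that information is sketched below. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; the helper names `parse_gpu_csv` and `query_gpus` are made up for this example and are not part of any Vitis AI tooling.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; these are standard --query-gpu properties.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Turn header-less nvidia-smi CSV output into a list of dicts, one per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        # Skip lines that do not have exactly one value per requested field.
        if len(values) == len(QUERY_FIELDS):
            gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Return GPU info, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Pasting the printed lines into the issue, together with the CUDA toolkit version, should be enough to compare setups.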

code-locker commented 2 years ago

Hi @mhanuel26, my configuration is shown below.

[image: configuration screenshot]

Thanks

bhargavin1872008 commented 1 year ago

When installing the requirements.txt of keras-yolov3-modelset, I'm getting an error for coremltools. It shows something like "couldn't find a version that satisfies the requirement tensorflow<=1.14 and tensorflow>=1.5 (from tfcoremltools -r requirements.txt) (from versions: 2.2.0, 2.2.1, 2.2.2, ... 2.7.0rc0, 2.7.0rc1 ...)". Can someone help me with this?

Also, I have a question: can we use Ubuntu 20.04, CUDA 11.7, and cuDNN 8.4.0 for this project, or do we have to use Ubuntu 18.04 with CUDA 10.0 as the only combination that works? Please help, I am short on time.
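Every version pip offers in that message is 2.2.0 or newer, so none can satisfy a constraint capped at 1.14. This usually suggests that the active Python interpreter has no TensorFlow 1.x wheels available. The sketch below illustrates the mismatch with a simplified comparison. It is not pip's resolver: it ignores pre-release suffixes such as `rc0`, and the helper names `parse_version` and `satisfies` are hypothetical.

```python
def parse_version(text):
    """Parse the leading numeric parts of a version string into a 3-tuple.

    Simplification: anything after the digits in a component (e.g. 'rc0')
    is dropped, and missing components are padded with zeros.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if digits == "":
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def satisfies(version, lower, upper):
    """True if lower <= version <= upper under the simplified parsing above."""
    return parse_version(lower) <= parse_version(version) <= parse_version(upper)


# Versions quoted in the error message, checked against >=1.5 and <=1.14.
offered = ["2.2.0", "2.2.2", "2.7.0rc0"]
matching = [v for v in offered if satisfies(v, "1.5", "1.14")]
print(matching)
```

The list comes back empty for the offered versions, whereas a 1.x release such as 1.14.0 would pass the same check.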