AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Models trained in Linux run poorly on Windows and vice-versa. #8903

Open acidtonic opened 6 months ago

acidtonic commented 6 months ago

I am trying to figure out why models trained with the same codebase compiled on Linux run poorly when moved to Windows, and vice versa.

I have tried training on Ubuntu 22.04 and moving the model to Windows 11 to run the net. Performance is similar at high thresholds, but if I lower the threshold, the Windows side starts showing hundreds of boxes all over the screen and the Linux side does not.

Both were compiled from the exact same GitHub commit hash and run on the same hardware, GPU, etc. Both sides have the same CUDA SDK, down to the minor version number too. I have tried various cards, such as a 2080ti, 3090ti, 4080, etc.

Is there any guidance for running models between operating systems? Do I need to adjust something to make this work smoothly?
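
One thing that may be worth ruling out before anything else is whether the files themselves survive the move between operating systems unchanged. The following is a minimal Python sketch, not part of Darknet, that prints a SHA-256 digest for each file and flags Windows-style CRLF line endings. The helper names `sha256_of_file` and `has_crlf` and whatever paths are passed on the command line are hypothetical, and nothing in this thread confirms that file differences are the cause.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_crlf(path):
    """Return True if the file contains Windows-style CRLF line endings.

    Only meaningful for text files such as .cfg or .names;
    a binary .weights file may contain these bytes by chance.
    """
    return b"\r\n" in Path(path).read_bytes()


if __name__ == "__main__":
    # Hypothetical usage: python check_files.py my.cfg my.names my.weights
    for name in sys.argv[1:]:
        print(name, sha256_of_file(name), "CRLF" if has_crlf(name) else "no CRLF")
```

Running the same script on both machines and comparing the printed digests would show whether the `.weights`, `.cfg`, and `.names` files are byte-identical on each side.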

Statgator2 commented 6 months ago

I've experienced this as well. I tend to have significantly better net performance in Linux than in Windows.

acidtonic commented 6 months ago

I have a feeling it's related to some minor ABI difference with the model, but I see people sharing models with each other often, so I'm somewhat confused about what causes this or what I can do to fix or identify it.

Statgator2 commented 6 months ago

What I have noticed is that Windows will produce hundreds of boxes on the screen simply by lowering the confidence threshold for detections. If you do the same in Linux, Darknet will not produce hundreds of boxes. There might be a couple of extra boxes at lower confidence, but typically not hundreds, not even at 1% confidence. On Windows, I feel the hundreds of boxes appear even at 40% confidence. I find this behavior illogical.

Does anyone know what is causing this net performance difference between Windows and Linux?
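
To put numbers on the box counts described above, the detections from each platform could be counted at a few thresholds and set side by side. This is a rough Python sketch under stated assumptions: each platform's detections have been exported to a JSON file holding a list of frames, each frame has an `objects` list, and each object has a `confidence` between 0 and 1. That layout, the helper names, and the file names in the usage comment are assumptions for illustration, not something confirmed in this thread.

```python
import json


def load_frames(path):
    """Load a list of per-frame detection records from a JSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def count_boxes(frames, threshold):
    """Count detections whose confidence is at or above the threshold."""
    return sum(
        1
        for frame in frames
        for obj in frame.get("objects", [])
        if obj["confidence"] >= threshold
    )


def compare_counts(linux_frames, windows_frames, thresholds=(0.01, 0.25, 0.40)):
    """Map each threshold to a (linux_count, windows_count) pair."""
    return {
        t: (count_boxes(linux_frames, t), count_boxes(windows_frames, t))
        for t in thresholds
    }


# Hypothetical usage, assuming both result files exist:
# counts = compare_counts(load_frames("linux.json"), load_frames("windows.json"))
# for threshold, (linux_n, windows_n) in counts.items():
#     print(threshold, linux_n, windows_n)
```

If the two platforms agree at high thresholds and diverge only at low ones, the table makes that visible, and the same images and weights can then be shared with others to reproduce it.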

stephanecharette commented 6 months ago

Is this also a problem with the newer fork of Darknet/YOLO? I'm travelling right now and don't have access to my Windows environment to test, but I'd be curious to know if this is a problem with https://github.com/hank-ai/darknet?tab=readme-ov-file#table-of-contents

Statgator2 commented 5 months ago

I came back to see if anyone had commented as to what might be the cause of this. I'm on a custom fork of darknet. Sorry I can't help more. Hopefully someone will come by with some insight.

stephanecharette commented 5 months ago

I just tried it with the Hank.ai branch. Weights trained on my Ubuntu training rig. Copied the neural network to a Windows 11 computer. Everything works perfectly on both devices.

If you want to try, I was using these weights and video files to test: https://www.ccoderun.ca/programming/2024-05-01_LegoGears/

Again, as I wrote last month, note that this repo is no longer maintained. You'll want to be using the more recent Hank.ai repo here: https://github.com/hank-ai/darknet?tab=readme-ov-file#table-of-contents

Statgator2 commented 5 months ago

The issue I'm experiencing is with training the exact same data with the exact same bounding boxes on two different OSes (Windows vs. Linux). As described above, on Linux I'm getting the expected performance. On Windows I'm getting illogical detections that don't match the training.

Thanks for the link to Hank.ai. I will migrate over.