Closed SimonKravis closed 1 year ago
CamTrap Detector is using an ONNX-exported version of MD, so there's not an expectation that the results would be identical. The fact that they're so far off is surprising, but I'll let Ben debug that, that's not why I'm replying here. :)
I'm more interested in the very small differences between EcoAssist and MD... when you say "MD API", I think you mean "your self-hosted instance of the MD API", so it's up to you which model file you choose. Can you confirm that (a) you are using MDv5a (as opposed to MDv5b) in both your API instance and EcoAssist, and (b) you are running EcoAssist on Intel silicon (as opposed to a Mac M1/M2)? There is almost certainly a difference in the confidence threshold at the lower end, which doesn't bother me. But the very small difference in the number of 0.9-->1.0 results shouldn't happen if you are running the same model weights.
Also confirm that prior to uploading to the API, you don't do any resizing? I.e., the 2048x1440 image makes it all the way to the API?
Hi Dan
EcoAssist is running on a laptop with an AMD Ryzen rather than an Intel processor,
and the web service uses the MDv5a model, as you suggest. The small high-confidence differences between EcoAssist and your batch API may be because the EcoAssist JSON has a max_detection_conf value that does not discriminate between categories, and I use that value, whereas from your batch API I take the maximum confidence from the animal category only. Some humans appear at the beginning and end of the data set I'm using, so that may explain the high-confidence differences.
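To make that comparison concrete, here's a small helper (my own sketch, not part of EcoAssist or the API) that computes a per-category maximum confidence from MegaDetector-format results, so that human detections don't inflate an animal-confidence histogram:

```python
def max_conf_per_category(md_results):
    """Given MegaDetector-format results (the parsed JSON dict), return,
    per image, the maximum confidence seen for each detection category,
    so 'animal' (category "1") confidences are not conflated with
    'person'/'vehicle' ones the way a single max_detection_conf is."""
    per_image = {}
    for image in md_results.get("images", []):
        maxima = {}
        for det in image.get("detections", []):
            cat = det["category"]
            maxima[cat] = max(maxima.get(cat, 0.0), det["conf"])
        per_image[image["file"]] = maxima
    return per_image
```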
I hope CamTrap Detector can be fixed up as it’s much faster (and easier) than my web service. The program doesn’t have a lot to go wrong with it (unlike CP, which I keep finding problems with).
Regards
Simon Kravis
Ah, all clear. If one of those histograms includes people, and the other doesn't, that's an adequate explanation for me, I don't need to debug EcoAssist/API differences any further.
Interested in what Ben thinks about the substantial shift in confidence ranges. It's not "bad" per se, as long as it's mostly a linear shift and one can still draw a (smaller) confidence threshold somewhere for CamTrap Detector output. I.e., all that matters is that you can get almost exactly the same precision for a given recall, even if it means using a different confidence threshold. But the difference in the histograms is surprising, and in a perfect world, typical MegaDetector confidence thresholds would be applicable to CamTrap Detector output as well.
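A toy sketch of that "same precision at a given recall" point (illustrative code and data, mine, not from any of the tools discussed here): any monotonic rescaling of the confidences yields the same precision once you pick the threshold that reaches your target recall.

```python
def precision_at_recall(pos_scores, neg_scores, target_recall):
    """Return (threshold, precision) at the largest threshold that
    reaches target_recall. pos_scores are confidences on true animals,
    neg_scores are confidences on false detections. Any monotonic shift
    applied to all scores leaves the precision unchanged, which is the
    sense in which a shifted confidence range is harmless."""
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tp = sum(s >= t for s in pos_scores)
        fp = sum(s >= t for s in neg_scores)
        if tp / len(pos_scores) >= target_recall:
            return t, tp / (tp + fp)
    return None
```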
The difference isn't a linear shift; I hope it can be easily fixed.
Thanks for letting me know @SimonKravis! I've replicated the issue too, and have started debugging. I was hoping it was what @agentmorris previously suggested about the image input size: we feed images in resized to 640 x 640 px, and have experimented with increasing that.
I'm wondering now if the input and output tensors are read/written correctly. @agentmorris I don't suppose you know if there's any normalisation done on the input, such as mean and std_dev? Are the inputs 0.0-1.0 or 0.0 - 255.0?
The relevant lines of code are here:
https://github.com/agentmorris/MegaDetector/blob/main/detection/pytorch_detector.py#L118
We follow exactly what YOLOv5 recommends, but it's hard for me to say exactly how much of this applies to the ONNX version, or what does/doesn't happen inside the PyTorch model execution. But given an image loaded with PIL (in RGB/HWC format, as 8-bit ints), prior to running the PyTorch model we letterbox the image to the inference size, transpose it from HWC to CHW, and scale pixel values from 0-255 down to 0.0-1.0.
The latter is not normalization in the sense of scaling based on image pixels; it's just scaling into the range that the .pt model expects.
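For reference, a rough numpy-only sketch of that preprocessing as I read it from pytorch_detector.py (the nearest-neighbor resize is a crude stand-in for YOLOv5's letterbox(); treat the details as my assumptions, not MD's exact code):

```python
import numpy as np

def preprocess(img_hwc_uint8, target=640):
    """Sketch of MD's PyTorch-path preprocessing: letterbox to the
    inference size, HWC -> CHW, then scale 8-bit pixels into [0, 1]."""
    h, w, _ = img_hwc_uint8.shape
    scale = min(target / h, target / w)
    new_h, new_w = round(h * scale), round(w * scale)
    # crude nearest-neighbor resize via index sampling
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img_hwc_uint8[rows][:, cols]
    # pad with the gray value (114) YOLOv5 uses, keeping the image centered
    canvas = np.full((target, target, 3), 114, dtype=np.uint8)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    chw = canvas.transpose(2, 0, 1)           # HWC -> CHW
    return chw.astype(np.float32) / 255.0     # 0-255 -> 0.0-1.0
```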
My recommendation here is to run an image through CamTrap Detector, then run the same ONNX weights through YOLOv5's detect.py script, specifying the same inference size. You should get exactly the same output, i.e. confidence values and box locations should agree to at least six decimal places (I've verified this for the .pt models). We can assume that YOLOv5's detect.py does everything right, so any difference here is worth debugging ASAP.
It could be a normalization issue, it could be that color channels aren't ordered correctly, or it could be nothing at all: maybe YOLOv5's ONNX inference pipeline will also produce confidence values in the range you're seeing, in which case you're doing everything right and we should evaluate the impact on precision and recall; maybe it's all just fine.
But validating against detect.py would be IMO the most important debugging step. Happy to help with that if you want me to tinker with it.
Oh, and sorry, all of those steps were things that happen to the pixels before running the model. I'm guessing that if something is "wrong", it's on that side, but you actually asked about normalizing/processing the output tensors. My guess is that if there was a bug there, it would be a total catastrophe and you would basically get random numbers, but, who knows. FWIW, we also follow YOLOv5's recommendations here, the relevant lines of code are here:
https://github.com/agentmorris/MegaDetector/blob/main/detection/pytorch_detector.py#L147
It's basically just orienting the coordinates correctly, there is no manipulation of confidence values. And if you weren't normalizing coordinates correctly, you would know right away, boxes would be in random places. So, still going with my hypothesis that if there is an issue, it's on the input side. But either way, IMO, the next step is the same: debug against the gold standard, which is YOLOv5's detect.py.
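For completeness, a sketch of what "orienting the coordinates correctly" involves: mapping boxes from letterboxed-image pixel coordinates back to normalized coordinates on the original image (my own illustration, not the code in pytorch_detector.py):

```python
def unletterbox_box(box_xyxy, orig_hw, inference_size=640):
    """Map an (x1, y1, x2, y2) box predicted in letterboxed-image pixel
    coordinates back to normalized coordinates on the original image,
    undoing the centered padding and the resize scale."""
    h, w = orig_hw
    scale = min(inference_size / h, inference_size / w)
    pad_x = (inference_size - w * scale) / 2
    pad_y = (inference_size - h * scale) / 2
    x1, y1, x2, y2 = box_xyxy
    return ((x1 - pad_x) / scale / w, (y1 - pad_y) / scale / h,
            (x2 - pad_x) / scale / w, (y2 - pad_y) / scale / h)
```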
Cheers @agentmorris! "Use the YOLOv5 letterbox() function to scale/pad to the correct size." I think you're onto a winner there!...
Finally, I've found it! It's my implementation of non-max suppression. The actual results are about right regarding whether an image is empty or contains animals, and the location within the image... because the YOLO model produces many detections, some of which usually overlap, we reduce detections above an intersection-over-union threshold to just one. But instead of reporting the confidence of the highest-confidence detection, it was returning the lowest 🤦
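For anyone following along, here's a minimal sketch of greedy NMS done the usual way (illustrative code, not CamTrap Detector's actual implementation): sorting by confidence descending before suppressing is what makes each kept box carry the highest confidence of its overlapping cluster.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, iou_threshold=0.45):
    """Greedy NMS over a list of (box, confidence) tuples. The
    descending sort guarantees each surviving box has the highest
    confidence among the boxes it suppresses."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best[0], d[0]) < iou_threshold]
    return kept
```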
I'll try and get the fix put in properly and builds done tomorrow 😄
P.S. I also had the Red and Blue channels swapped. I'm quite impressed, and concerned, that this hasn't caused more of an issue in past testing O.o
ONNX 640px with fixes applied:
Nice work! That looks a lot like the PyTorch 640x640 plot... now you've got us on the edge of our seats, though; is it easy to generate the ONNX 1280x1280 plot with the fixes applied?
ONNX 1280px with fixes applied 🪄
It's definitely a different distribution than the PyTorch distribution, but seems within the bounds of reasonable differences now. I would still consider running against detect.py with NMS turned off in both pipelines (because you know NMS will be slightly different) to make sure there's nothing else going on; any difference there would be a Bad Thing. But this is definitely getting closer; good catch @SimonKravis .
My observation from looking at a range of camera trap data sets is that some have thousands of images, very few of which contain animals and seldom more than one in the frame, whilst others mostly contain animals, often with more than one in the frame. If testing of CamTrap Detector was done with the former, the problem I found wouldn't have shown up. Fortunately, I have access to both sorts.
Can you let me know when a new Windows x64 installer is available?
I've got some builds done! It took longer than I had hoped, due to issues with dependencies building statically (so no external dependencies are needed). Treat these builds as alpha; I'll remove them from the server once I'm comfortable it's performing correctly, have signed the macOS build, and have produced an ARM (M1/M2) build. But I think you're both Windows users, so the following should be okay to play with...
@SimonKravis, there's a .exe as well as a .msi. It would be great to hear if it resolves the warnings you've mentioned.
Inference is expected to be slower than in prior versions, as it now processes images at 1280x1280 resolution rather than the prior 640x640, but it is probably more likely to pick up animals further away. I'd be interested to know what speed you get.
| Download | MD5 |
|---|---|
| CamTrap.Detector_1.0.0_x64.AppImage | 6a326f2b9b7d19629675f9ed8a68fe4b |
| CamTrap.Detector_1.0.0_x64.dmg | 692a8f7847219da67419551220c25ec9 |
| CamTrap.Detector_1.0.0_x64_en-US.msi | 7ef6ff5755717f383f1388c907a36aaf |
| CamTrap.Detector_1.0.0_x64-setup.exe | cdf527f946c9b19295af935eb74c889b |
Hi Ben
The .exe installer worked OK but still warns "Windows protected your PC" (you probably have to buy a code-signing certificate to prevent this), and the confidence levels now look more like those obtained from the MegaDetector batch app, which I run as a web service. Some comparisons below:
CamTrap 0.4.0
"file": "RCNX0118.JPG",
"image_width": 2048,
"image_height": 1440,
"detections": [
{
"x": 0.0,
"y": 0.044014502,
"width": 0.14228074,
"height": 0.6912068,
"category": 1,
"confidence": 0.103076085
}
]
CamTrap 1.0.0
"file": "RCNX0118.JPG",
"image_width": 2048,
"image_height": 1440,
"detections": [
{
"x": 0.0,
"y": 0.11687617,
"width": 0.13930157,
"height": 0.62046707,
"category": 1,
"confidence": 0.9238495
}
]
EcoAssist
"file": "RCNX0118.JPG",
"max_detection_conf": 0.926,
"detections": [
{
"category": "1",
"conf": 0.00583,
"bbox": [
0.3974,
0.1777,
0.06542,
0.1909
]
},
{
"category": "1",
"conf": 0.00979,
"bbox": [
0,
0.02013,
0.5302,
0.6527
]
},
{
"category": "1",
"conf": 0.0106,
"bbox": [
0.3969,
0.025,
0.1328,
0.6138
]
},
{
"category": "1",
"conf": 0.926,
"bbox": [
0,
0.134,
0.1381,
0.6034
]
}
]
}
MegaDetector web service (format is [y1, x1, y2, x2, confidence, category]):
,{"RCNX0118.JPG": [[0.025, 0.3969, 0.6388, 0.5297, 0.0106, 1], [0.134, 0, 0.7374, 0.1381, 0.926, 1]]}
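To line the two formats up, a small helper (mine, not from either codebase) that converts a web-service detection into the EcoAssist/MD-style record; applied to the second detection above, it reproduces EcoAssist's 0.926 box:

```python
def webservice_to_md(det):
    """Convert one web-service detection, [y1, x1, y2, x2, confidence,
    category], to the MD batch-output style: a normalized
    [x_min, y_min, width, height] bbox plus separate conf/category."""
    y1, x1, y2, x2, conf, category = det
    return {"category": str(category), "conf": conf,
            "bbox": [x1, y1, x2 - x1, y2 - y1]}
```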
Speed is now about 5 sec for a 2048 x 1440 pixel image; still a bit better than the web service running the MegaDetector batch option on a general-purpose Linux virtual server (about 7 sec), but lower resolution and faster operation would be desirable for me. Could it be an option?
I'll try including your .exe and the .onnx file in my Windows installer and allowing users to run the .exe from CP, which will save them from dealing with the warning. Despite the slowdown, having a local Windows app is far simpler than renting a server, paying the rental, and dealing with all the ways in which a web service can fail. With multiple concurrent users, response time is multiplied by the number of concurrent users, as each image is a separate POST operation.
Regards
Simon Kravis
Closing this issue as the fix is now merged, but it will be a little while before I can work on the releases again. Expected end of October, once I've finished writing my thesis ✍️
I have written code to analyse JSON files produced by CamTrap Detector (Windows 64-bit, CPU only, v0.4.0), EcoAssist, and the MegaDetector API at https://github.com/agentmorris/MegaDetector/tree/main/api/synchronous. I noticed that CamTrap Detector is much faster than the other two, but its histogram of detection confidences is vastly different from either when processing the same set of 1581 images, all 2048 x 1440 pixels. Is there an explanation for this?