agentmorris / MegaDetector

MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.

utils.general.scale_coords changed to scale_boxes in recent versions of YOLOv5 #94

Closed: agentmorris closed this issue 1 year ago

agentmorris commented 1 year ago

utils.general.scale_coords has changed to scale_boxes in recent versions of YOLOv5. This patch could help future-proof PTDetector, especially for those trying to use M1s.
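A minimal compatibility shim (a sketch; it assumes the two names are drop-in compatible, which matches the rename) could look like:

try:
    from utils.general import scale_boxes  # newer YOLOv5
except ImportError:
    from utils.general import scale_coords as scale_boxes  # older YOLOv5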

This patch also includes the torch.load() changes discussed in issue #312, which are needed for the M1 folks. Additionally, I did a quick check on a GTX 1080 Ti, and when the device is a GPU,

checkpoint = torch.load(model_pt_path, map_location=device)  # weights land on the GPU at load time
model = checkpoint['model'].float().fuse().eval()

appears to use almost twice as much GPU RAM as

checkpoint = torch.load(model_pt_path)  # weights load onto the CPU first
model = checkpoint['model'].float().fuse().eval().to(device)  # then move to the GPU
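For reference, a rough way to reproduce the memory comparison (a sketch; model_pt_path and the 1280x1280 input size are placeholders) is to read CUDA's peak-allocation counter around each load-plus-inference:

import torch

device = torch.device('cuda:0')
torch.cuda.reset_peak_memory_stats(device)
# swap in either of the two loading variants above
checkpoint = torch.load(model_pt_path, map_location=device)
model = checkpoint['model'].float().fuse().eval()
with torch.no_grad():
    model(torch.zeros(1, 3, 1280, 1280, device=device))
print(torch.cuda.max_memory_allocated(device) / 1e9, 'GB peak')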

Issue cloned from Microsoft/CameraTraps, original issue posted by persts on Dec 02, 2022.

agentmorris commented 1 year ago

I'm testing this out now... I don't know anything about the difference between using .to(device) and (...map_location=device), so if they produce the same results and your tests say the .to(device) approach uses less memory, I'm all for it.

But... in some quick testing, they appear to produce almost exactly the same results, but not exactly: the differences are small, yet not way out in the long tail of floating-point rounding error, and confidence values and coordinates appear to differ as early as the second or third decimal place in some cases. I am wary of making changes that yield different results depending on when someone ran the inference scripts, though I would definitely consider this if there were a significant performance benefit. So, a couple of questions:

  1. Do you know whether we expect these approaches to produce exactly the same results? (The kind of check I have in mind is sketched after this list.)
  2. Memory aside, have you seen a significant difference in inference speed?
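
For concreteness, something like this (a sketch; preds_a and preds_b stand for the post-NMS detection tensors, one [N, 6] tensor of x1, y1, x2, y2, confidence, class per image, produced by each loading approach on the same inputs):

import torch

exact = all(torch.equal(a, b) for a, b in zip(preds_a, preds_b))
close = all(torch.allclose(a, b, atol=1e-4) for a, b in zip(preds_a, preds_b))
print('bitwise identical:', exact, '| equal to ~1e-4:', close)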

Thanks.

-Dan


(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

The expectation is that the results should be the same between the two approaches. Inference speed on a single image is not expected to change, but the difference in memory usage would allow for larger batches if batching were implemented.
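
As a sketch of what that could enable (the batch size, input size, and preprocessing here are placeholders, not part of this patch):

import torch

device = torch.device('cuda:0')
# model loaded with the lower-memory variant above; images assumed
# already letterboxed to 1280x1280 and normalized to [0, 1]
batch = torch.zeros(16, 3, 1280, 1280, device=device)  # stand-in images
with torch.no_grad():
    pred = model(batch)[0]  # candidate boxes for all 16 images in one pass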

I just did a test on Ubuntu with a GTX 1080, PyTorch 1.12.1+cu113, and a fresh pull of YOLOv5, and I don't see any difference in box coordinates between the approaches on the GPU or CPU. Note that I am using a newer model, not the official MDv5, because of the recompute_scale_factor issue.

Given the M1 TypeError issue, if I had to dig deep into the speculation pocket and not just find lint, maybe there are some type-casting bugs with older versions of PyTorch + YOLOv5?


(Comment originally posted by persts)

agentmorris commented 1 year ago

Hmmm. Well, I don't love that this changes things ever so slightly, but it's definitely getting to the point where I can no longer pretend M1 Macs don't exist, and I also just learned that M1 cloud VMs exist now, which will make testing easier (since I don't have a physical M1 to test on). So in a few weeks I'll take a closer look at this PR at the same time as I look at issue 64 (M1 support, which I just re-opened) and issue 72 (your earlier PR that preceded this one).

I think my dream of having a single set of bits that runs in all environments and produces absolutely identical results to MDv5a/b is not realistic. So the most likely path forward is to use your approach of fine-tuning with almost no change to the weights to produce a new MDv5.01a/MDv5.01b, and to update all dependencies simultaneously to newer PyTorch and YOLOv5 versions (which may or may not also require, e.g., updates to the recommended CUDA versions). I am optimistic that with some testing, we will have a new model and an associated setup process that produce identical results on all platforms.

Thanks again for leading the way on this; we will get this cleaned up pretty soon, and we would literally never have figured all this out without your contributions (had you not done all this work, the answer would just have permanently been "sorry, M1s are not supported").


(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

I just set up an Ubuntu (GTX 1080) environment to use the official md_v5a.0.0.pt, based on the current install instructions and older dependencies. One additional variable to consider is that I am not using Anaconda, just Python 3.8.10 and pip to install dependencies manually.

I don't see any differences in the box coordinates between the torch.load() approaches with the older dependencies.

Additionally, if I add the recompute_scale_factor patch mentioned in #312:

import torch

# load the official MDv5a weights (local path from my test environment)
checkpoint = torch.load('/home/pete/devel/model_building/saved/YOLO/md_v5a.0.0.pt', map_location='cuda:0')
# clear the stale recompute_scale_factor attribute on Upsample modules;
# checkpoints saved under older PyTorch carry a value that newer PyTorch rejects
for m in checkpoint['model'].modules():
    if type(m) is torch.nn.Upsample:
        m.recompute_scale_factor = None
model = checkpoint['model'].float().fuse().eval()

I don't see any difference in coordinates between the venv with the older dependencies and the venv with the newest versions mentioned earlier when using md_v5a.0.0.pt.

It would be good to find another person to test. I am happy to send you my test code as well.


(Comment originally posted by persts)

agentmorris commented 1 year ago

Change details are here.

agentmorris commented 1 year ago

Current versions of YOLOv5 work fine with MDv5.1 now; closing this issue.