Issue cloned from Microsoft/CameraTraps; original issue posted by sim-kelly on Jun 23, 2022.
Hi All - very excited about this release!
The following error occurred when running run_detector.py. I know this is probably deep down in the dependencies, but I wanted to raise it given that M1 chips are becoming more common and that Mac instructions are given in the README.
Hmmm... that's not what we want. I don't have an M1 Mac to test on, but this thread looks promising:
It seems like maybe running:
conda install nomkl
...prior to any other package installations (but from within the target conda environment) will fix this. Assuming you're installing from environment-detector-mac.yml, can you try adding "nomkl" somewhere near the top of that file?
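Something like this, for example (just a sketch; the surrounding entries in your copy of the file will differ):
dependencies:
  - nomkl
  # ...existing dependencies unchanged...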
If that works, that's a good solution and we'll update the instructions.
If that doesn't work, that doesn't necessarily mean that "nomkl" isn't a good approach; I don't think in-order installation is guaranteed from a conda environment file, so we may need to do some more experimenting. But that's a good first debugging step.
Mind giving that a try?
Thanks!
(Comment originally posted by agentmorris)
Progressed past the initially mentioned issue with the "nomkl" suggestion, and then ran:
CONDA_SUBDIR=osx-arm64 conda env create --file environment-detector-mac.yml
conda activate cameratraps-detector-m1
conda env config vars set CONDA_SUBDIR=osx-arm64
conda activate
conda activate cameratraps-detector-m1
pip3 install -U --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
export PYTHONPATH="$PYTHONPATH:$HOME/Documents/GitHub/cameratraps:$HOME/Documents/GitHub/ai4eutils:$HOME/Documents/GitHub/yolov5"
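As an optional sanity check at this point (not strictly necessary), you can confirm the arm64 nightly build imported correctly:
# Should print a 1.13 dev version; the MPS check requires PyTorch >= 1.12
python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"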
Output now looks like this:
Running detector on 1 images...
PyTorch reports 0 available CUDA devices
GPU available: False
Using PyTorch version 1.13.0.dev20220704
Fusing layers...
Model summary: 574 layers, 139990096 parameters, 0 gradients
Loaded model in 3.42 seconds
Fusing layers...
Model summary: 574 layers, 139990096 parameters, 0 gradients
Loaded model in 1.35 seconds
0%| | 0/1 [00:00<?, ?it/s]
PTDetector: image test_images/test_images/caltech_camera_traps_5a0e37cc-23d2-11e8-a6a3-ec086b02610b.jpg failed during inference: 'Upsample' object has no attribute 'recompute_scale_factor'
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.40s/it]
On average, for each image,
- loading took 0.07 seconds, std dev is not available
- inference took 1.31 seconds, std dev is not available
This error is mentioned in the YOLOv5 repo here: https://github.com/ultralytics/yolov5/issues/6948. @agentmorris, is this why a specific commit of yolov5 is mentioned?
I have tried each permutation of the following with the same result:
Any ideas, or is there something I am overlooking?
(Comment originally posted by sim-kelly)
This issue is why we install a specific version of PyTorch, actually. At the time we released MDv5, even the newest commits to YOLOv5 had this issue when running against the newest version of PyTorch. Our use of a specific YOLOv5 commit is just future-proofing. I don't know that anything would stop working if you used the most recent YOLOv5 commit, but just to minimize the number of variables, I recommend sticking with the recommended commit.
So I think what's happening here is that when you pip installed PyTorch (after installing nomkl), you installed the latest version. Can you install pytorch 1.10.1 and torchvision 0.11.2 and see what happens?
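For example (a sketch; I haven't verified that pinned wheels for these versions are available for your platform):
pip3 install torch==1.10.1 torchvision==0.11.2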
(Comment originally posted by agentmorris)
Closing due to inactivity, but let us know if you're able to make nomkl work by installing the recommended PyTorch version. Thanks!
(Comment originally posted by agentmorris)
For Apple M1 support you will need the following. Creating an updated version of MDv5 is pretty easy, and will make the model usable with newer versions of PyTorch on all platforms by removing the Upsample problem:
1. Set up a new virtual environment with the latest versions of PyTorch and YOLOv5.
2. Follow the YOLOv5 instructions for organizing directories ("Training Custom Data"), but only include one image and one label. Then "fine-tune" the current md_v5a.0.0.pt model with all layers frozen:
My dataset.yaml:
train: /home/pete/Desktop/MD/images/train
val: /home/pete/Desktop/MD/images/train
nc: 3
names: ['animal', 'person', 'vehicle']
python train.py --weights md_v5a.0.0.pt --data dataset.yaml --freeze 33 --epochs 1 --batch 1 --img 1280
The resulting model will be usable on all platforms with the latest versions of PyTorch. NOTE: training cannot be done on the M1.
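For example, a sketch of running the re-exported model (assuming YOLOv5's default output location; your run folder name and test image will differ):
# YOLOv5 writes fine-tuned weights to runs/train/<name>/weights/ by default
# ("exp" is the default name); "some_image.jpg" is a placeholder
python detection/run_detector.py runs/train/exp/weights/best.pt --image_file some_image.jpg --threshold 0.2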
(Comment originally posted by persts)
We are frustratingly close to not having to deal with several of these issues: the Upsample issue appears to have been resolved when using the latest YOLOv5 and the latest stable build of PyTorch (1.12); i.e. you can use "vanilla MD" with the latest PyTorch version. We're not updating our recommended environment yet, mostly because we don't want to rock the boat and what we have is working, but for those who have a specific reason to use the latest stable PyTorch build or the latest YOLOv5, I've confirmed that this works.
However, the latest nightly build of PyTorch (1.13) still has the Upsample issue, and 1.13 is what's required for M1 support. Grrr. So to use M1 support, you'll still need a custom MegaDetector and the nightly PT build.
But I just merged Peter's PR to add M1 inference support, as well as a new environment-detector-m1.yml file that is identical to environment-detector-mac.yml, except that "nomkl" has been added. For now, this isn't "officially" supported, but this issue will serve as documentation for adventurous folks who want to try this.
These instructions assume you have a recent version of our repo... i.e., go into your CameraTraps folder (c:\git\CameraTraps if you copied and pasted our "standard" instructions), and run:
git fetch && git pull
We can't promise that "latest" will always be correct, but as of now (September 2022), "latest" works here, while the YOLOv5 commit (c23a441c9df7ca9b1f275e8c8719c949269160d1) that we recommend for "standard" MD use does not. So, if you've already checked out the old commit, head into your YOLOv5 folder and run:
git checkout master
If and when M1 inference becomes "officially supported", we'll pin a new commit that we've tested thoroughly. For now, just go with latest.
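If you're not sure what your YOLOv5 folder currently has checked out, you can check with:
# Run from inside the YOLOv5 folder
git log -1 --oneline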
Peter's instructions were excellent, I trained one epoch on one image with all layers frozen. My dataset.yml looked like this:
train: /home/user/train
val: /home/user/train
nc: 3
names: ['animal', 'person', 'vehicle']
And the only files in /home/user/train were one image from Snapshot Serengeti and the corresponding .txt file. It looks like you cannot train on only one negative image, there must be at least one bounding box. FYI the image I used is here and the bbox file is here.
I ran:
python train.py --weights ~/train/md_v5a.0.0.pt --data ~/train/dataset.yml --freeze 33 --epochs 1 --batch 1 --img 1280 --name md_v5a
...and ditto for MDv5b.
I compared the output to "stock" MDv5a, and I really want them to be exactly the same. Boxes appear to be the same to around two decimal places in both location and confidence, which is good, but I would feel better if they were exactly the same. I also tried opening up data/hyp/*.yml, where YOLOv5 stores its hyperparameters, and setting all learning rates to zero. This resulted in... slightly different numbers, but off by about the same (very small) amount. This is the main reason this will remain unofficial for now, but for folks who are dying to get around this Upsample issue, here are my re-built MDv5a and MDv5b files. YMMV.
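If you want to reproduce that comparison, here's a rough sketch (assumptions: an environment where both models load, and that run_detector_batch.py's positional arguments are the model file, an image folder, and an output .json path; check --help to be sure):
# Run stock and rebuilt MDv5a over the same folder, then compare outputs
python detection/run_detector_batch.py md_v5a.0.0.pt test_images stock.json
python detection/run_detector_batch.py md_v5a.0.0_rebuild_pt-1.12_zerolr.pt test_images rebuilt.json
diff stock.json rebuilt.json   # metadata (e.g., timestamps) may differ even if detections match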
Only very slight variations on Sam's instructions from earlier in this issue:
# Will not install PyTorch, includes "nomkl" package
CONDA_SUBDIR=osx-arm64 conda env create --file environment-detector-m1.yml
conda activate cameratraps-detector
# Needs PyTorch version >= 1.13, this will get you 1.13.0 as of 2022.09.07
pip3 install -U --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
# Full disclosure: I did this, but didn't actually test whether this is necessary
conda env config vars set CONDA_SUBDIR=osx-arm64
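One note on that last line (standard conda behavior, not specific to this repo): variables set with "conda env config vars set" only take effect the next time the environment is activated, so if in doubt:
conda deactivate
conda activate cameratraps-detector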
(Comment originally posted by agentmorris)
@agentmorris, I tried using your suggested solution for an Apple M1:
python detection/run_detector.py "/Users/g/Downloads/md_v5a.0.0_rebuild_pt-1.12_zerolr.pt" --image_file "/Users/g/Documents/Images_for_models/Classified_Yamal_2022/Good_Animal_NoBait->>Good_Animal_Bait/cam7_2022-03-25_03-35-00.JPG" --threshold 0.1
and get this error:
Running detector on 1 images...
PyTorch reports 0 available CUDA devices
PyTorch reports Metal Performance Shaders are available
GPU available: True
Using PyTorch version 1.13.0.dev20220920
Traceback (most recent call last):
File "detection/run_detector.py", line 529, in <module>
main()
File "detection/run_detector.py", line 518, in main
load_and_run_detector(model_file=args.detector_file,
File "detection/run_detector.py", line 286, in load_and_run_detector
detector = load_detector(model_file)
File "detection/run_detector.py", line 263, in load_detector
detector = PTDetector(model_file, force_cpu)
File "/Users/gerardocelis/git/cameratraps/detection/pytorch_detector.py", line 41, in __init__
self.model = PTDetector._load_model(model_path, self.device)
File "/Users/gerardocelis/git/cameratraps/detection/pytorch_detector.py", line 48, in _load_model
checkpoint = torch.load(model_pt_path, map_location=device)
File "/Users/gerardocelis/opt/miniconda3/envs/cameratraps-detector/lib/python3.8/site-packages/torch/serialization.py", line 763, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/Users/gerardocelis/opt/miniconda3/envs/cameratraps-detector/lib/python3.8/site-packages/torch/serialization.py", line 1100, in _load
result = unpickler.load()
File "/Users/gerardocelis/opt/miniconda3/envs/cameratraps-detector/lib/python3.8/site-packages/torch/serialization.py", line 1093, in find_class
return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'DetectionModel' on <module 'models.yolo' from '/Users/gerardocelis/git/yolov5/models/yolo.py'>
Any ideas how to proceed?
(Comment originally posted by gerlis22)
Are you using the specific YOLOv5 commit (c23a441c9df7ca9b1f275e8c8719c949269160d1) that we recommend for a "standard" MegaDetector setup? If so, I'm about 65% sure this is the issue: for the accelerated M1 setup, you'll need a newer version of YOLOv5. I.e., in your YOLOv5 repo folder, run:
git checkout master
My bad, I should have clarified this above as well.
If this works, can you confirm here, and I'll update the instructions?
(Comment originally posted by agentmorris)
Yes, indeed, I was using YOLOv5 commit c23a441c9df7ca9b1f275e8c8719c949269160d1. I updated to commit 489920ab30b217fed14d3ddd31c23e9afc5be238 and it works now. Thanks!
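For anyone who wants to pin exactly that commit rather than tracking master:
git checkout 489920ab30b217fed14d3ddd31c23e9afc5be238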
(Comment originally posted by gerlis22)
Excellent, I added a section to my post earlier on this issue.
(Comment originally posted by agentmorris)
@agentmorris I ran run_detector.py and run_detector_batch.py on the same image and get very different confidence values. For example, run_detector.py shows an animal with a confidence of 0.95, whereas run_detector_batch.py shows 0.2. Could this be an issue with run_detector_batch.py?
(Comment originally posted by gerlis22)
I connected with @gerlis22 offline; it turns out that the discrepancy was between MDv5a and MDv5b (which is fine), not between run_detector.py and run_detector_batch.py (which would be a catastrophe). So, no cause for alarm here, but thanks to @gerlis22 for checking on this, always better to ask!
(Comment originally posted by agentmorris)
Thanks for the great work.
Just in case anyone else gets stuck where I was for the past hours: since torch 1.13 is released now, I changed the environment-detector-m1.yml file to install it:
diff --git a/environment-detector-m1.yml b/environment-detector-m1.yml
index 13979af..ad7050e 100644
--- a/environment-detector-m1.yml
+++ b/environment-detector-m1.yml
@@ -27,8 +27,8 @@ dependencies:
- pandas
- seaborn>=0.11.0
- PyYAML>=5.3.1
- # - pytorch::pytorch=1.10.1
- # - pytorch::torchvision=0.11.2
+ - pytorch::pytorch=1.13.1
+ - pytorch::torchvision=0.14.1
# - conda-forge::cudatoolkit=11.3
# - conda-forge::cudnn=8.1
Then to install:
CONDA_SUBDIR=osx-arm64 conda env create --file environment-detector-m1.yml
conda activate cameratraps-detector
python detection/run_detector.py ~/Downloads/md_v5a.0.0.pt --image_file ~/Desktop/video/withbird.png --threshold 0.1
And it results in:
Running detector on 1 images...
PyTorch reports 0 available CUDA devices
PyTorch reports Metal Performance Shaders are available
GPU available: True
Using PyTorch version 1.13.1
Traceback (most recent call last):
File "detection/run_detector.py", line 529, in <module>
main()
File "detection/run_detector.py", line 518, in main
load_and_run_detector(model_file=args.detector_file,
File "detection/run_detector.py", line 286, in load_and_run_detector
detector = load_detector(model_file)
File "detection/run_detector.py", line 263, in load_detector
detector = PTDetector(model_file, force_cpu)
File "/Volumes/Work/megadetector/cameratraps/detection/pytorch_detector.py", line 48, in __init__
self.model = PTDetector._load_model(model_path, self.device)
File "/Volumes/Work/megadetector/cameratraps/detection/pytorch_detector.py", line 57, in _load_model
checkpoint = torch.load(model_pt_path, map_location=device)
File "/opt/homebrew/Caskroom/mambaforge/base/envs/cameratraps-detector/lib/python3.8/site-packages/torch/serialization.py", line 789, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/opt/homebrew/Caskroom/mambaforge/base/envs/cameratraps-detector/lib/python3.8/site-packages/torch/serialization.py", line 1131, in _load
result = unpickler.load()
File "/opt/homebrew/Caskroom/mambaforge/base/envs/cameratraps-detector/lib/python3.8/site-packages/torch/_utils.py", line 153, in _rebuild_tensor_v2
tensor = _rebuild_tensor(storage, storage_offset, size, stride)
File "/opt/homebrew/Caskroom/mambaforge/base/envs/cameratraps-detector/lib/python3.8/site-packages/torch/_utils.py", line 146, in _rebuild_tensor
t = torch.tensor([], dtype=storage.dtype, device=storage.untyped().device)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
In this case, the fix is (RTFM...) to download @agentmorris's patched models from earlier in this issue. I got confused because it gave me this error rather than the "advertised" Upsample error:
python detection/run_detector.py ~/Downloads/md_v5a.0.0_rebuild_pt-1.12_zerolr.pt --image_file ~/Desktop/video/withbird.png --threshold 0.1
(Comment originally posted by reinhrst)
See issue 72, where Peter suggests this change (loading the checkpoint on the CPU and then moving the model to the target device, which sidesteps the MPS float64 error above):
index 7778d1b..7774ac0 100644
--- a/detection/pytorch_detector.py
+++ b/detection/pytorch_detector.py
@@ -45,8 +45,8 @@ class PTDetector:
@staticmethod
def _load_model(model_pt_path, device):
- checkpoint = torch.load(model_pt_path, map_location=device)
- model = checkpoint['model'].float().fuse().eval() # FP32 model
+ checkpoint = torch.load(model_pt_path)
+ model = checkpoint['model'].float().fuse().eval().to(device) # FP32 model
return model
Finally closing this issue and adopting Peter's recommended change!