Hi,
One possible suggestion: avoid calling save on the entire model, as I see in your traceback, and save only the state dict instead:

```python
torch.save(model.state_dict(), wdir / 'init.pt')
```

at line 306 of /users/local/ismail_env/Codes/YOLOv7-Event/train.py.
We are aware that QuantModules might be incompatible with pickling, but the cases where you need to save your model like that are rare, while it is far more common to save only the state dict of a torch model. There should not be any issue in saving the state dict of a model quantized with Brevitas.
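For reference, a minimal sketch of the full save/load round trip (the `build_model` constructor and the file name are illustrative placeholders, not from your code):

```python
import torch

# Save only the parameters, not the pickled module tree
torch.save(model.state_dict(), 'init.pt')

# Later: rebuild the quantized architecture first, then restore the weights
model = build_model()  # hypothetical constructor for your quantized model
model.load_state_dict(torch.load('init.pt', map_location='cpu'))
```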
Please let me know if that fixes your issue.
Giuseppe
Hi,
Sorry for being so late to respond to this issue. It works fine when saving and loading the state_dict; I tried it on YOLOv5 and it works perfectly.
Thank you very much for your help.
Sorry for bothering you. @IsmailAM1999
I'm working on YOLOv5 quantization as well and originally hit the same problem you mentioned in #627. So I followed @Giuseppe5's suggestion to avoid saving the entire model and to save only the state dict: `torch.save(model.state_dict(), wdir / 'init.pt')`. But this solution led to an additional problem.
Once the training phase has finished, the error involves the `attempt_load(f, device).half()` call. Because only the weights are stored instead of the whole ckpt, information such as ema, model, etc. is missing, so the error occurs in the `attempt_load()` function in yolov5/models/experimental.py at the line `ckpt = (ckpt.get('ema') or ckpt['model']).to(device).float()`. The error message is below:
```
Starting training for 1 epochs...
# the output of the epoch message in training phase

1 epochs completed in 0.000 hours.

Validating runs/train/exp12/weights/best.pt...
Traceback (most recent call last):
  File "train.py", line 662, in <module>
    main(opt)
  File "train.py", line 551, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 436, in train
    model=attempt_load(f, device).half(),
  File "/**/yolov5/models/experimental.py", line 96, in attempt_load
    ckpt = (ckpt.get('ema') or ckpt['model']).to(device).float()  # FP32 model
KeyError: 'model'
```
Have you encountered this problem? How did you solve it?
If not, perhaps you can suggest a way around it so that YOLOv5 trains successfully: either an alternative way to save the whole ckpt, not only the weights (without the pickle issue), or a code change in the `attempt_load()` function.
Best regards, and thanks for your time and your interest in my request.
Hi @TCGoingW,
I got exactly the same error at this point. I don't have the code right now, but I can tell you the steps:
You need to change a few things in the function: instead of loading an Ensemble, load the Model directly first, then load your weights with the load_state_dict function, then extract the keys you need (ema, model, ...).
I also advise you to remove every weight transformation (.half()), as it can break your code.
I hope this is useful; keep me posted on whether or not it works.
Good luck!
Hi! @IsmailAM1999 Thanks for the reply!
Can you please be more specific about removing the Ensemble, loading weights with the load_state_dict function, and extracting the keys I need? I'm a little clueless :( If you can provide code to explain, I would be very grateful!
Sincerely, TCGoingW
Sorry for bothering you again. @IsmailAM1999
I tried to follow your suggestions to modify the `attempt_load()` function, i.e. load the Model directly first, then load the weights with the load_state_dict function. The modification is below:
```python
def attempt_load(weights, device=None, inplace=True, fuse=True):
    # Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a
    from models.yolo import Detect, Model
    # ----
    # model = Ensemble()
    model = Model('./models/yolov5n-QuantBre.yaml', 3, 80)
    for w in weights if isinstance(weights, list) else [weights]:
        ckpt = torch.load(attempt_download(w), map_location='cpu')  # load
        # ckpt = (ckpt.get('ema') or ckpt['model']).to(device).float()  # FP32 model
        model.load_state_dict(ckpt['model'])
        ckpt = model.to(device)

        # Model compatibility updates
        if not hasattr(ckpt, 'stride'):
            ckpt.stride = torch.tensor([32.])
        if hasattr(ckpt, 'names') and isinstance(ckpt.names, (list, tuple)):
            ckpt.names = dict(enumerate(ckpt.names))  # convert to dict
        model.append(ckpt.fuse().eval() if fuse and hasattr(ckpt, 'fuse') else ckpt.eval())  # model in eval mode
    # ----
```
The `model = Ensemble()` line is commented out, which then causes a problem with `model.append`, as shown below. I think the commented-out `model = Ensemble()` needs to be replaced by something, but I have no idea what.
```
Traceback (most recent call last):
  File "train.py", line 675, in <module>
    main(opt)
  File "train.py", line 564, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 448, in train
    model=attempt_load(f, device),
  File "/path/yolov5QuantBrevitas/yolov5/models/experimental.py", line 107, in attempt_load
    model.append(ckpt.fuse().eval() if fuse and hasattr(ckpt, 'fuse') else ckpt.eval())  # model in eval mode
  File "/path/yolov5QuantBrevitas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DetectionModel' object has no attribute 'append'
```
I don't know whether my modified code matches your suggestions. Maybe you can give me some advice on the changes. Sorry to bother you all the time.
Sincerely, TCGoingW
Hi @TCGoingW,
Sorry for not being available these days. Since you saved the model state dict, you need to build the model first and then load the state dict. Here is my version of the attempt_load function; I hope it helps you understand better:
```python
def attempt_load(weights, cfg, device=None, inplace=True, fuse=True):
    from models.yolo import Detect, Model  # Import necessary classes

    model = Ensemble()
    for w in weights if isinstance(weights, list) else [weights]:
        ckpt = torch.load(attempt_download(w), map_location=device)  # load checkpoint

        # Extract necessary components from your new checkpoint
        epoch = ckpt["epoch"]
        best_fitness = ckpt["best_fitness"]
        model_state_dict = ckpt["model_state_dict"]
        ema_state_dict = ckpt["ema_state_dict"]
        updates = ckpt["updates"]
        optimizer_state_dict = ckpt["optimizer_state_dict"]
        opt = ckpt["opt"]

        # Create your DetectionModel instance
        detection_model = Model(cfg=cfg, ch=3, nc=1, anchors=3)  # Replace with the actual path

        # Load the state_dict into your DetectionModel instance
        detection_model.load_state_dict(model_state_dict)
        detection_model.to(device).eval()  # Move the model to the appropriate device and set to eval mode
        model.append(detection_model)

    # Module updates
    for m in model.modules():
        t = type(m)
        if t in (nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU, Detect, Model):
            m.inplace = inplace
            if t is Detect and not isinstance(m.anchor_grid, list):
                delattr(m, "anchor_grid")
                setattr(m, "anchor_grid", [torch.zeros(1)] * m.nl)
        elif t is nn.Upsample and not hasattr(m, "recompute_scale_factor"):
            m.recompute_scale_factor = None  # torch 1.11.0 compatibility

    # Return model
    if len(model) == 1:
        return model[-1]

    # Return detection ensemble
    print(f"Ensemble created with {weights}\n")
    for k in "names", "nc", "yaml":
        setattr(model, k, getattr(model[0], k))
    model.stride = model[torch.argmax(torch.tensor([m.stride.max() for m in model])).int()].stride  # max stride
    assert all(model[0].nc == m.nc for m in model), f"Models have different class counts: {[m.nc for m in model]}"
    return model
```
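Note that this version adds a `cfg` argument, so the call sites in train.py presumably need to pass the model yaml too, something like (path illustrative, borrowed from your earlier snippet):

```python
model = attempt_load(f, cfg='./models/yolov5n-QuantBre.yaml', device=device)
```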
No bother, ask whenever you want. Good luck!
@IsmailAM1999 Your code helps me a lot!!!! Thank you very much; it works in my solution. I have another question to ask! Quantization using Brevitas works well in the backbone of YOLOv5, but something goes wrong in the head part. The error message is below:
```
Traceback (most recent call last):
  File "train.py", line 661, in <module>
    main(opt)
  File "train.py", line 550, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 362, in train
    results, maps, _ = validate.run(data_dict,
  File "/home/user/yolov5QuantBrevitas/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/path/yolov5QuantBrevitas/yolov5/val.py", line 210, in run
    preds, train_out = model(im) if compute_loss else (model(im, augment=augment), None)
  File "/path/yolov5QuantBrevitas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/path/yolov5QuantBrevitas/yolov5/models/yolo.py", line 268, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/path/yolov5QuantBrevitas/yolov5/models/yolo.py", line 180, in _forward_once
    x = m(x)  # run
  File "/path/yolov5QuantBrevitas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/path/yolov5QuantBrevitas/yolov5/models/common.py", line 137, in forward
    out = self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "/path/yolov5QuantBrevitas/lib/python3.8/site-packages/brevitas/quant_tensor/__init__.py", line 93, in __torch_function__
    return QUANT_TENSOR_FN_HANDLER[func](*args, **kwargs)
  File "/path/yolov5QuantBrevitas/lib/python3.8/site-packages/brevitas/quant_tensor/torch_handler.py", line 50, in cat_handler
    return QuantTensor.cat(*args, **kwargs)
  File "/path/yolov5QuantBrevitas/lib/python3.8/site-packages/brevitas/quant_tensor/__init__.py", line 273, in cat
    first_qt.check_scaling_factors_same(qt)
  File "/path/yolov5QuantBrevitas/lib/python3.8/site-packages/brevitas/quant_tensor/__init__.py", line 207, in check_scaling_factors_same
    raise RuntimeError("Scaling factors are different")
RuntimeError: Scaling factors are different
```
I think the problem involves the concatenation in the head part of YOLOv5. This error is really annoying because it happens at random epochs, sometimes on the 17th epoch, sometimes on the 1st. Have you ever encountered this problem?
The quantization method is implemented following https://github.com/sefaburakokcu/quantized-yolov5. I don't know whether you referenced this example. If you did, could you please give me some idea of how to eliminate the error in this reference example? If you didn't, just tell me how you solved it. I appreciate your assistance!
Hi @TCGoingW,
Sorry for my late response. When you use quant layers, try `return_quant_tensor=False`; I think it will solve the problem.
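For example, a minimal sketch with a Brevitas quant layer (channel counts and bit width are illustrative):

```python
from brevitas.nn import QuantConv2d

conv = QuantConv2d(
    64, 64, kernel_size=3, padding=1,
    weight_bit_width=8,
    return_quant_tensor=False,  # emit a plain torch.Tensor, so torch.cat never compares QuantTensor scales
)
```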
I'm also interested in your results (mainly the mAP values). We can discuss this more in private if you want.
You can contact me here: ismailosse1999@gmail.com Discord: d.tox
Edit: I used the same implementation method, by the way.
Hi @IsmailAM1999, @TCGoingW, I'm currently exploring the possibility of quantizing YOLOv5 or a newer YOLO model using Brevitas and came across this discussion. I wanted to know whether you were able to successfully quantize a YOLO model and, if so, whether you could share any insights or code snippets that might help me in the same endeavor. Your experiences and knowledge would be greatly appreciated. Thank you for your time and support.
Dear readers,
Thank you for your hard work and for providing such an interesting library.
Actually, I am working on quantization, especially on the YOLOv7 model. I made a small change in the 'Conv' class in common.py: I replaced Conv2d with QuantConv2d without changing any parameters. Here is the modified code:
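(The original snippet is not reproduced here; a minimal sketch of the described change, assuming YOLOv7's standard Conv definition, would be:)

```python
import torch.nn as nn
from brevitas.nn import QuantConv2d


def autopad(k, p=None):  # YOLO helper: 'same' padding for a given kernel size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


class Conv(nn.Module):
    # Standard convolution: ch_in, ch_out, kernel, stride, padding, groups
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        super().__init__()
        # nn.Conv2d replaced by QuantConv2d, all other parameters unchanged
        self.conv = QuantConv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```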
However, when I try to train the model, I encounter the following error:
Has anyone experienced this kind of pickle error in YOLOv7 or any other object detection model?
Thank you very much for your time and your interest in my request.