awarebayes opened this issue 2 years ago
Testing it with the master branch.
Any news on this? Wondering if this has to do with the CUDA memory errors people have been seeing; https://github.com/WongKinYiu/yolov7/issues/865 is one example.
change:

# Read until video is completed
while(cap.isOpened()):
    # Capture frame-by-frame

to:

# Read until video is completed
with torch.no_grad():
    while(cap.isOpened()):
        # Capture frame-by-frame

and try removing the "with torch.no_grad():" from:

with torch.no_grad():
    output = output_to_keypoint(output)
I used to have my 2080 Ti's memory usage maxed out, and now it doesn't go above 4 GB while inferring. Hope this helps you.
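For reference, here is how the whole keypoint loop looks with the no_grad scope moved outward. This is a minimal sketch pieced together from the yolov7 pose example (non_max_suppression_kpt, output_to_keypoint, and the yolov7-w6-pose.pt checkpoint come from that repo); the preprocessing is my own simplification and assumes the frame dimensions are ones the model accepts:

import cv2
import torch
from utils.general import non_max_suppression_kpt
from utils.plots import output_to_keypoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
weights = torch.load('yolov7-w6-pose.pt', map_location=device)
model = weights['model'].float().eval().to(device)

cap = cv2.VideoCapture('path_to_your_video.mp4')

# Read until video is completed; gradients disabled for the WHOLE loop,
# not just around output_to_keypoint()
with torch.no_grad():
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # Naive preprocessing: BGR -> RGB, HWC -> 1xCxHxW, [0, 255] -> [0, 1]
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        img = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255.0)
        img = img.unsqueeze(0).to(device)
        output, _ = model(img)
        output = non_max_suppression_kpt(output, 0.25, 0.65,
                                         nc=model.yaml['nc'],
                                         nkpt=model.yaml['nkpt'],
                                         kpt_label=True)
        output = output_to_keypoint(output)
        # ... draw the keypoints on the frame here ...

cap.release()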
The code that leads to the leak can be found in general.py line 628.
def non_max_suppression(prediction, conf_thres=0.1, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=()):
    """Runs Non-Maximum Suppression (NMS) on inference results

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Settings
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    max_det = 300  # maximum number of detections per image
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    # The line below leads to memory leak
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            l = labels[xi]
            v = torch.zeros((len(l), nc + 5), device=x.device)
            v[:, :4] = l[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(l)), l[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        if nc == 1:
            x[:, 5:] = x[:, 4:5]  # for models with one class, cls_loss is 0 and cls_conf is always 0.5,
                                  # so there is no need to multiplicate.
        else:
            x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            print(f'WARNING: NMS time limit {time_limit}s exceeded')
            break  # time limit exceeded

    return output
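For what it's worth, my reading of the above (an interpretation, not confirmed in this thread): the flagged line only allocates empty placeholder tensors. Memory more plausibly accumulates because the tensors later assigned into output via output[xi] = x[i] can still carry the model's autograd graph if inference ran with gradients enabled, and holding the detections alive then holds the whole graph alive. Detaching the predictions, or running the model under torch.no_grad(), releases it. A minimal sketch, assuming a model and an img tensor already exist:

# Sketch: break the autograd chain before NMS so the kept detections
# do not pin the model's computation graph in (GPU) memory.
with torch.no_grad():
    pred = model(img)[0]  # alternatively: pred = model(img)[0].detach()
dets = non_max_suppression(pred, conf_thres=0.25, iou_thres=0.45)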
Thanks @StefanCiobanu1989 for the solution! May I know why output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0] leads to a memory leak? Currently I easily get OOM during training, and I'm curious whether this is related.
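For illustration (my understanding, not stated elsewhere in the thread): the list multiplication itself does not copy anything; it creates one zero-row tensor and repeats a reference to it, which is harmless on its own. A quick check:

import torch

# [t] * n repeats a reference to the SAME tensor n times; no copies are made.
out = [torch.zeros((0, 6))] * 3
print(out[0] is out[1])  # True: one tensor, three references

Entries that survive NMS are overwritten with fresh tensors anyway (output[xi] = x[i]), so the OOM is more plausibly the retained autograd history discussed above than this line by itself.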
I'm having memory leaks with just 10 images, 640 pixels wide, on a 16 GB M1 machine.
python train.py --weights yolob7.py --data "data/custom.yaml" --workers 4 --batch-size 4 --img 4096 --cfg cfg/training/yolov7.yaml --name yolov7 --hyp data/hyp.scratch.p5.yaml
So it starts to process, creates the init.pt file, and after a few seconds:
[1] 22575 killed python3 train.py --weights yolob7.py --data "data/custom.yaml" --workers 4 4
/Users/tiagogouvea/anaconda3/envs/py310/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 41 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
I saw the solution proposed by @StefanCiobanu1989 but, sorry, I can't figure out how to apply it. I can't find the while(cap.isOpened()): code, and I did change the "with torch.no_grad():" part, but I'm still getting the error.
The _"with torch.nograd():" statement is used in PyTorch to temporarily disable gradient calculation. This is particularly useful when you're performing inference and can lead to faster and more memory-efficient computations.
Try the following block of code
import cv2
import torch

# Load your trained model
model = ...  # Load your PyTorch model

# Set the model to evaluation mode
model.eval()

# Open the video capture
video_path = 'path_to_your_video.mp4'
cap = cv2.VideoCapture(video_path)

with torch.no_grad():
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Preprocess the frame if needed
        # Convert the frame to a tensor (assuming you have a suitable function for this)
        frame_tensor = ...  # Convert the frame to a PyTorch tensor

        # Perform inference using the model
        output = model(frame_tensor)

        # Process the output if needed

        # Display or save the processed frame

        # Press 'q' to exit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

# Release the video capture and close the windows
cap.release()
cv2.destroyAllWindows()
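As a side note (an alternative, not mentioned above): on PyTorch 1.9+, torch.inference_mode() is a stricter variant of torch.no_grad() that can be dropped in the same way:

# PyTorch >= 1.9: inference_mode also disables view/version tracking,
# so it is at least as memory-friendly as no_grad for pure inference.
with torch.inference_mode():
    output = model(frame_tensor)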
This code contains a memory leak.
I watched memory usage with nvidia-smi, and with each pass of the loop this function consumes more and more memory without it ever being freed.
Just try running keypoint inference on a video.
I tried running the following code: