ayooshkathuria / YOLO_v3_tutorial_from_scratch

Accompanying code for Paperspace tutorial series "How to Implement YOLO v3 Object Detector from Scratch"
https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/

Save detected video #28

AvivSham opened this issue 6 years ago

AvivSham commented 6 years ago

How can I save the detected video? I am using `video_detect.append(frame)` instead of `cv2.imshow("frame", frame)` to collect the detected frames. I tried to combine them into a video using the following code:

```python
video_detection_save = cv2.VideoWriter('video.avi', -1, 20, (416, 416))
for j in range(frames):
    video_detection_save.write(video_detect[j])

cv2.destroyAllWindows()
video_detection_save.release()
```
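For reference, a minimal corrected sketch, under the assumption that the frames in `video_detect` keep the capture's native resolution rather than 416×416: `VideoWriter` silently produces an empty or unplayable file when the declared frame size does not match the frames actually written, and the `-1` fourcc only pops up a codec-picker dialog on Windows.

```python
import cv2

# Assumes video_detect is a non-empty list of BGR frames at the capture's native size.
h, w = video_detect[0].shape[:2]
fourcc = cv2.VideoWriter_fourcc(*'XVID')        # explicit codec instead of -1
writer = cv2.VideoWriter('video.avi', fourcc, 20.0, (w, h))
for f in video_detect:
    writer.write(f)
writer.release()                                # finalize the container
```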

This is my loop over the frames:

```python
while cap.isOpened():
    ret, frame = cap.read()

    if ret:
        img = prep_image(frame, inp_dim)
        # cv2.imshow("a", frame)
        im_dim = frame.shape[1], frame.shape[0]
        im_dim = torch.FloatTensor(im_dim).repeat(1, 2)

        if CUDA:
            im_dim = im_dim.cuda()
            img = img.cuda()

        with torch.no_grad():
            # volatile is deprecated in modern PyTorch; torch.no_grad() already disables autograd
            output = model(Variable(img, volatile=True), CUDA)
        output = write_results(output, confidence, num_classes, nms_conf=nms_thesh)

        if type(output) == int:
            # no detections in this frame
            frames += 1
            # cv2.imshow("frame", frame)
            video_detect.append(frame)
            key = cv2.waitKey(1)
            if key & 0xFF == ord('q'):
                break
            continue

        im_dim = im_dim.repeat(output.size(0), 1)
        scaling_factor = torch.min(416 / im_dim, 1)[0].view(-1, 1)

        # undo the letterbox padding and rescale boxes to the original frame size
        output[:, [1, 3]] -= (inp_dim - scaling_factor * im_dim[:, 0].view(-1, 1)) / 2
        output[:, [2, 4]] -= (inp_dim - scaling_factor * im_dim[:, 1].view(-1, 1)) / 2
        output[:, 1:5] /= scaling_factor

        # clip boxes to the frame boundaries
        for i in range(output.shape[0]):
            output[i, [1, 3]] = torch.clamp(output[i, [1, 3]], 0.0, im_dim[i, 0])
            output[i, [2, 4]] = torch.clamp(output[i, [2, 4]], 0.0, im_dim[i, 1])

        classes = load_classes('data/coco.names')
        colors = pkl.load(open("pallete", "rb"))

        # draw the boxes onto the frame in place
        list(map(lambda x: write(x, frame), output))

        video_detect.append(frame)
        key = cv2.waitKey(1)
        if key & 0xFF == ord('q'):
            FPS = frames // (time.time() - start)
            break
        frames += 1

    else:
        FPS = frames // (time.time() - start)
        break
```

Please help me solve the problem.

5kejun commented 5 years ago

This is my modified code for video saving, just a few lines. By the way, you need to create a new folder named 'video' and put your test video there. Thanks to the author @ayooshkathuria for the wonderful work.
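A small sketch of creating that folder up front (my addition, assuming the relative `./video` path used in the code below; `VideoWriter.open` just returns `False` instead of raising when the directory is missing):

```python
import os

# cv2.VideoWriter will not create missing directories, so make them first.
os.makedirs('video', exist_ok=True)
```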

```python
videofile = args.videofile  # or a path to the video file
cap = cv2.VideoCapture(videofile)  # cap = cv2.VideoCapture(0)

# --- my addition for video saving: start ---
fps = 20
sz = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
      int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
fourcc = cv2.VideoWriter_fourcc(*'DIVX')
vout = cv2.VideoWriter()
vout.open('./video/output.avi', fourcc, fps, sz, True)
# --- my addition for video saving: end ---

assert cap.isOpened(), 'Cannot capture source'

frames = 0
start = time.time()

while cap.isOpened():
    ret, frame = cap.read()

    if ret:
        img = prep_image(frame, inp_dim)
        im_dim = frame.shape[1], frame.shape[0]
        im_dim = torch.FloatTensor(im_dim).repeat(1, 2)

        if CUDA:
            im_dim = im_dim.cuda()
            img = img.cuda()

        with torch.no_grad():
            output = model(Variable(img, volatile=True), CUDA)
        output = write_results(output, confidence, num_classes, nms_conf=nms_thesh)

        if type(output) == int:
            frames += 1
            print("FPS of the video is {:5.4f}".format(frames / (time.time() - start)))
            cv2.imshow("frame", frame)
            key = cv2.waitKey(1)
            if key & 0xFF == ord('q'):
                break
            continue

        im_dim = im_dim.repeat(output.size(0), 1)
        scaling_factor = torch.min(416 / im_dim, 1)[0].view(-1, 1)

        output[:, [1, 3]] -= (inp_dim - scaling_factor * im_dim[:, 0].view(-1, 1)) / 2
        output[:, [2, 4]] -= (inp_dim - scaling_factor * im_dim[:, 1].view(-1, 1)) / 2
        output[:, 1:5] /= scaling_factor

        for i in range(output.shape[0]):
            output[i, [1, 3]] = torch.clamp(output[i, [1, 3]], 0.0, im_dim[i, 0])
            output[i, [2, 4]] = torch.clamp(output[i, [2, 4]], 0.0, im_dim[i, 1])

        classes = load_classes('data/coco.names')
        colors = pkl.load(open("pallete", "rb"))
        list(map(lambda x: write(x, frame), output))
        cv2.imshow("frame", frame)

        vout.write(frame)  # my addition for video saving

        key = cv2.waitKey(1)
        if key & 0xFF == ord('q'):
            break
        frames += 1
        print(time.time() - start)
        print("FPS of the video is {:5.2f}".format(frames / (time.time() - start)))
    else:
        break

vout.release()  # finalize the output file so the AVI is playable
cap.release()
```
AvivSham commented 5 years ago

To get the source video's FPS you can use `cap.get(cv2.CAP_PROP_FPS)`.
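As an illustration, a minimal sketch (variable names are my own) of matching the writer's FPS and frame size to the capture, so the saved video's duration matches the input:

```python
import cv2

cap = cv2.VideoCapture('input.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)  # may come back 0 for some sources, e.g. webcams
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
writer = cv2.VideoWriter('out.avi', cv2.VideoWriter_fourcc(*'XVID'),
                         fps if fps > 0 else 25.0, size)
```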

AvivSham commented 5 years ago

Still, after running your modification, the saved video has no predicted bounding boxes.

AvivSham commented 5 years ago

This is my code for video detection; can you help me, please? When I use your code my kernel dies for some reason (I'm running it on Colab). I also see that the saved video has a different length compared to the input video.

```python
# Detection phase
# Use cv2.VideoCapture('your video file name or path')
cap = cv2.VideoCapture('Dedication - Short Film.mp4')  # cap = cv2.VideoCapture(0)

fps = int(cap.get(cv2.CAP_PROP_FPS))
sz = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
      int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
fourcc = cv2.VideoWriter_fourcc(*'DIVX')
vout = cv2.VideoWriter()
vout.open('./output_video.avi', fourcc, fps, sz, True)

assert cap.isOpened(), 'Cannot capture source'

frames = 0
start = time.time()

while cap.isOpened():
    ret, frame = cap.read()

    if ret:
        img = prep_image(frame, inp_dim)
        im_dim = frame.shape[1], frame.shape[0]
        im_dim = torch.FloatTensor(im_dim).repeat(1, 2)

        if CUDA:
            im_dim = im_dim.cuda()
            img = img.cuda()

        with torch.no_grad():
            output = model(Variable(img, volatile=True), CUDA)
        output = write_results(output, confidence, num_classes, nms_conf=nms_thesh)

        if type(output) == int:
            frames += 1
            # write undetected frames too, or the output runs shorter than the input
            vout.write(frame)
            key = cv2.waitKey(1)
            if key & 0xFF == ord('q'):
                break
            continue

        im_dim = im_dim.repeat(output.size(0), 1)
        scaling_factor = torch.min(416 / im_dim, 1)[0].view(-1, 1)

        output[:, [1, 3]] -= (inp_dim - scaling_factor * im_dim[:, 0].view(-1, 1)) / 2
        output[:, [2, 4]] -= (inp_dim - scaling_factor * im_dim[:, 1].view(-1, 1)) / 2
        output[:, 1:5] /= scaling_factor

        for i in range(output.shape[0]):
            output[i, [1, 3]] = torch.clamp(output[i, [1, 3]], 0.0, im_dim[i, 0])
            output[i, [2, 4]] = torch.clamp(output[i, [2, 4]], 0.0, im_dim[i, 1])

        classes = load_classes('data/coco.names')
        colors = pkl.load(open("pallete", "rb"))

        list(map(lambda x: write_box(x, frame), output))

        vout.write(frame)

        key = cv2.waitKey(1)
        if key & 0xFF == ord('q'):
            FPS = frames // (time.time() - start)
            break
        frames += 1

    else:
        FPS = frames // (time.time() - start)
        break

vout.release()  # finalize the output file
cap.release()
```
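A guess about the Colab kernel dying (my speculation, not confirmed in the thread): 5kejun's version calls `cv2.imshow`, which needs a GUI window and tends to crash headless notebook kernels. A small guard, with `maybe_show` being a helper name I made up:

```python
import os
import cv2

# Headless environments such as Colab have no display; GUI calls fail there.
HAS_DISPLAY = bool(os.environ.get('DISPLAY'))

def maybe_show(frame):
    """Show the frame only when a display exists; return True if 'q' was pressed."""
    if not HAS_DISPLAY:
        return False
    cv2.imshow("frame", frame)
    return (cv2.waitKey(1) & 0xFF) == ord('q')
```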
5kejun commented 5 years ago

@AvivSham Actually, I get the completely correct video output with predicted boxes simply by changing `list(map(lambda x: write_box(x, frame), output))` to `list(map(lambda x: write(x, frame), output))`. On the other hand, I think you may have an indentation error in your code. Unlike C++, Python is sensitive to tabs and spaces, so I suggest you look at the author's recent update to confirm there are no indentation errors.

AvivSham commented 5 years ago

I have changed the problematic line and also defined the `write` function, but the output video still remains the same (without detections). Can you help me, please? @5kejun

See the attached `write` function; the whole loop is attached above, but I have changed that line. This is the `write` function:

```python
def write(x, results):
    c1 = tuple(x[1:3].int())
    c2 = tuple(x[3:5].int())
    img = results[int(x[0])]
    cls = int(x[-1])
    color = random.choice(colors)
    label = "{0}".format(classes[cls])
    cv2.rectangle(img, c1, c2, color, 1)
    t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0]
    c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
    cv2.rectangle(img, c1, c2, color, -1)
    cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4),
                cv2.FONT_HERSHEY_PLAIN, 1, [225, 255, 255], 1)
    return img
```

5kejun commented 5 years ago

@AvivSham I don't know your purpose in redefining the `write` function, but I find that your `write` function is the same as the one in `detect.py`, which is used to detect a folder of images. The line `img = results[int(x[0])]` assumes many images are loaded at once, which is why `list(map(lambda x: write(x, loaded_ims), output))` in `detect.py` does not need to be inside any loop. Meanwhile, `list(map(lambda x: write(x, frame), output))` in `video.py` must be inside the loop, processing video frames one by one. That is to say, your redefined `write` function is not suited to video processing. Reading this may help: https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-5/
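For comparison, a per-frame variant along the lines of the one used in `video.py` (a sketch, assuming `classes`, `colors`, `cv2`, and `random` are in scope; I also cast the corners to plain ints, which newer OpenCV versions require). The key difference is that `img` is the single frame passed in, rather than an element looked up in a batch via `x[0]`:

```python
def write(x, img):
    # x is one detection row: [batch_idx, x1, y1, x2, y2, objectness, cls_score, cls_idx]
    c1 = (int(x[1]), int(x[2]))   # top-left corner
    c2 = (int(x[3]), int(x[4]))   # bottom-right corner
    cls = int(x[-1])
    color = random.choice(colors)
    label = "{0}".format(classes[cls])
    cv2.rectangle(img, c1, c2, color, 1)                  # box outline
    t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0]
    c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
    cv2.rectangle(img, c1, c2, color, -1)                 # filled label background
    cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4),
                cv2.FONT_HERSHEY_PLAIN, 1, [225, 255, 255], 1)
    return img
```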

AvivSham commented 5 years ago

Hi @5kejun, can you share your email address? Or reach me at Mista2311@gmail.com

AvivSham commented 5 years ago

@5kejun Thank you for the help; the problem was elsewhere.