facebookresearch / co-tracker

CoTracker is a model for tracking any point (pixel) on a video.
https://co-tracker.github.io/

how to draw trajectory of predicted points #23

Closed Shengnan-Zhu closed 10 months ago

Shengnan-Zhu commented 10 months ago

Hi @nikitakaraevv, thanks for your excellent work! I want to know how to draw the trajectories of tracked points like in the figure in the README. I've tried setting tracks_leave_trace=-1 in Visualizer, but the result looks a little different from the one in the README. I'd greatly appreciate it if you could share some commands or any other help!

nikitakaraevv commented 10 months ago

Hi @Shengnan-Zhu, that's how you can do it:

vis = Visualizer(
    mode="rainbow",
    tracks_leave_trace=-1,
)
vis.visualize(
    video,
    pred_tracks,
    segm_mask=sample.segmentation,
    compensate_for_camera_motion=True
)

You need to pass a segmentation mask for the first frame of the video, and set compensate_for_camera_motion=True.
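For reference, a minimal sketch of how a first-frame mask ends up in the (1, 1, H, W) shape used later in this thread. The synthetic array is a stand-in for `np.array(Image.open(mask_path))`:

```python
import numpy as np
import torch

# Stand-in for np.array(Image.open(mask_path)): an (H, W) uint8 array
# where non-zero pixels mark the object in the first frame.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 255

# Add batch and channel dims -> (1, 1, H, W), the shape the thread's
# demo code builds with torch.from_numpy(segm_mask)[None, None].
segm_mask = torch.from_numpy(mask)[None, None]
```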

Shengnan-Zhu commented 10 months ago

Thanks for your reply! I tried your suggestion: I passed the segm_mask (the one used as input to the CoTrackerPredictor model) to the segm_mask argument of visualize(), and then ran into an indexing problem between different devices (CPU vs. GPU). After moving segm_mask to CUDA, the error became: "can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.". Hoping for more help, thanks!

here's part of sample code:

segm_mask = np.array(Image.open(os.path.join(args.mask_path)))
segm_mask = torch.from_numpy(segm_mask)[None, None]
assert video.shape[3:5] == segm_mask.shape[2:4], "Video and mask dimensions do not match"
args.output_name = args.output_name + "_mask"

pred_tracks, pred_visibility = model(
    video,
    grid_size=args.grid_size,
    grid_query_frame=args.grid_query_frame,
    backward_tracking=args.backward_tracking,
    segm_mask=segm_mask,
)
print("computed")

# save a video with predicted tracks
seq_name = args.video_path.split("/")[-1]
vis = Visualizer(save_dir="./saved_videos", pad_value=120, linewidth=2, tracks_leave_trace=-1)
vis.visualize(
    video,
    pred_tracks,
    pred_visibility,
    query_frame=args.grid_query_frame,
    filename=args.output_name,
    compensate_for_camera_motion=True,
    segm_mask=segm_mask.cuda(),
)


nikitakaraevv commented 10 months ago

Can you try calling video.cpu(), pred_tracks.cpu() and segm_mask.cpu() before passing them to vis.visualize?
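To illustrate the device issue with stand-in tensors (the shapes below are hypothetical, just mirroring the predictor's outputs): the visualizer draws with NumPy/OpenCV on the host, so every tensor has to leave the GPU first.

```python
import torch

# Hypothetical stand-ins for the real video / tracks / mask tensors,
# which live on CUDA when the model runs on a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
video = torch.rand(1, 8, 3, 64, 64, device=device)     # B, T, C, H, W
pred_tracks = torch.rand(1, 8, 100, 2, device=device)  # B, T, N, 2
segm_mask = torch.ones(1, 1, 64, 64, device=device)    # B, 1, H, W

# Move everything to host memory before visualization; calling
# .numpy() on a CUDA tensor raises the exact error quoted above.
video, pred_tracks, segm_mask = video.cpu(), pred_tracks.cpu(), segm_mask.cpu()
```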

Shengnan-Zhu commented 10 months ago

> Can you try calling video.cpu(), pred_tracks.cpu() and segm_mask.cpu() before passing them to vis.visualize?

Yes, I tried video.cpu(), pred_tracks.cpu(), pred_visibility.cpu() and segm_mask.cpu(), but I still get an error. Here's the error log:

Traceback (most recent call last):
  File "/home/shengnan/co-tracker/demo.py", line 119, in <module>
    vis.visualize(video.cpu(), pred_tracks.cpu(), pred_visibility.cpu(),
  File "/home/shengnan/co-tracker/cotracker/utils/visualizer.py", line 94, in visualize
    res_video = self.draw_tracks_on_video(
  File "/home/shengnan/co-tracker/cotracker/utils/visualizer.py", line 215, in draw_tracks_on_video
    res_video[t] = self._draw_pred_tracks(
  File "/home/shengnan/co-tracker/cotracker/utils/visualizer.py", line 269, in _draw_pred_tracks
    coord_y = (int(tracks[s, i, 0]), int(tracks[s, i, 1]))
ValueError: cannot convert float NaN to integer

and my code:

vis.visualize(
    video.cpu(),
    pred_tracks.cpu(),
    pred_visibility.cpu(),
    query_frame=args.grid_query_frame,
    filename=args.output_name,
    compensate_for_camera_motion=True,
    segm_mask=segm_mask.cpu(),
)

Then I tried replacing the NaNs in the tracks with 0, and the demo ran without errors, but the resulting video looks strange.
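One way to do that replacement (torch.nan_to_num is my substitution here, not necessarily what was used): note the replaced points land at pixel (0, 0), which can explain odd-looking output.

```python
import torch

# A toy tracks tensor with one NaN coordinate, like the values that
# broke the int() casts in _draw_pred_tracks.
tracks = torch.tensor([[10.5, 20.5], [float("nan"), 30.0]])

# Replace NaNs with 0 so the drawing code no longer crashes; the
# affected points are now drawn at coordinate 0, not hidden.
clean = torch.nan_to_num(tracks, nan=0.0)
```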

https://github.com/facebookresearch/co-tracker/assets/93901590/8d97dcb0-ff04-42f0-9303-09310ed0edbe

nikitakaraevv commented 10 months ago

What is your grid_query_frame?

Shengnan-Zhu commented 10 months ago

> For some reason, you have NaNs in the estimated tracks. Could you send me this video?

Sure, here is the video and mask: bmx-bumps.zip

Shengnan-Zhu commented 10 months ago

I didn't set grid_query_frame, so it's probably the default value 0. And the mask file is for the first frame of the video.

nikitakaraevv commented 10 months ago

I think the reason for this is that you're passing segm_mask as input to the model. In that case, the model only estimates the motion of the object points and can't rely on any background points to compensate for camera motion.

That's the result I get when running this command with your video and mask: python demo.py --grid_size 30 --video_path ./bmx-bumps.mp4 --mask_path ./mask.png

This is the code:

    pred_tracks, pred_visibility = model(
        video,
        grid_size=args.grid_size,
        grid_query_frame=args.grid_query_frame,
        backward_tracking=args.backward_tracking,
    )
    print("computed")

    # save a video with predicted tracks
    seq_name = args.video_path.split("/")[-1]
    vis = Visualizer(
        save_dir="./saved_videos",
        pad_value=120,
        linewidth=3,
        tracks_leave_trace=-1,
    )

    vis.visualize(
        video,
        pred_tracks.cpu(),
        query_frame=args.grid_query_frame,
        segm_mask=segm_mask,
        compensate_for_camera_motion=True,
    )

https://github.com/facebookresearch/co-tracker/assets/37815420/860494fb-6fb1-4fbc-a494-e0d2df6fa644

Shengnan-Zhu commented 10 months ago

Yes, you are right! And this time the result looks correct. Thanks very much for your help! I will close this issue.