facebookresearch / co-tracker

CoTracker is a model for tracking any point (pixel) on a video.
https://co-tracker.github.io/

Inference on multiple GPUs for long videos: out-of-memory issue #51

Open feiwu77777 opened 6 months ago

feiwu77777 commented 6 months ago

Hello, thanks a lot for sharing your work!

I used CoTracker on videos of up to 3 minutes to track a segmentation mask, and the Colab GPU ran out of memory. I understand I can reduce the grid size to address the memory issue, but I wanted to know if it's possible to run inference on multiple GPUs in parallel?

I'm not sure this would work: I would cut the video into three 1-minute segments and give one segment to each of three GPUs, for example. But I guess the tracking algorithm needs the whole video to work?

I saw in the Colab notebook that you can track manually selected points. For my problem, could I apply CoTracker to the first minute of the video, save the tracked points from the last frame, and then resume tracking those points from minute 1 to minute 2?

nikitakaraevv commented 6 months ago

Hi @feiwu77777, I think there's now a better solution to your problem. You can run CoTracker in online mode. This way, you don't have to keep the whole video in memory, only the current sliding window and estimated tracks. Please see https://github.com/facebookresearch/co-tracker?tab=readme-ov-file#online-mode and check out the online demo. Let me know if this helps!
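For readers landing here, the gist of online mode is that the model only ever sees the current sliding window of frames, so memory stays flat no matter how long the video is. A minimal sketch of that loop, assuming a predictor with a CoTrackerOnlinePredictor-style "chunk + is_first_step" interface (`run_online` and the stub predictor here are hypothetical stand-ins, not the repo's actual code; the real demo passes the chunk as a tensor):

```python
# Sketch of the online sliding-window loop. Only the trailing window of
# frames is kept in memory; older frames are discarded as we go.
# `process_step` stands in for the real CoTrackerOnlinePredictor call.

def run_online(frames, process_step, step=4):
    window_frames = []
    is_first_step = True
    for i, frame in enumerate(frames):
        window_frames.append(frame)
        if i % step == 0 and i != 0:
            process_step(window_frames[-step * 2:], is_first_step)
            is_first_step = False
            del window_frames[:-step * 2]  # drop frames outside the window
    # process the final, possibly shorter chunk
    return process_step(window_frames[-step * 2:], is_first_step), window_frames

chunks = []
def stub_predict(chunk, is_first):  # stand-in for the real model call
    chunks.append((len(chunk), is_first))
    return "tracks"

result, buffered = run_online(list(range(100)), stub_predict)
```

The buffer never grows beyond roughly two window lengths, which is what makes hour-long inputs feasible.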

JamesSand commented 4 months ago

> Hi @feiwu77777, I think there's now a better solution to your problem. You can run CoTracker in online mode. This way, you don't have to keep the whole video in memory, only the current sliding window and estimated tracks. Please see https://github.com/facebookresearch/co-tracker?tab=readme-ov-file#online-mode and check out the online demo. Let me know if this helps!

Thank you, that works for me!

pvtoan commented 2 months ago

Hi,

Thank you so much for sharing your interesting work!

At present, I am trying to understand the online_demo code.

In particular, the variable "window_frames" keeps growing as new frames are fed in. So if I need to track points for several hours or days, "window_frames" will become huge and may cause an out-of-memory error.

In this case, is it possible to discard previous frames and keep the number of frames in "window_frames" fixed, while still preserving tracking accuracy?

nikitakaraevv commented 2 months ago

Hi @pvtoan, this variable exists for one reason - to make it possible to visualize the output using the existing tools in this repository. You can safely discard the previous frames.
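One simple way to implement that discard is to make "window_frames" a bounded buffer, e.g. `collections.deque(maxlen=...)`, so old frames fall off automatically (the window length below is an illustrative placeholder; in the demo it would correspond to something like `model.step * 2`):

```python
from collections import deque

# Cap the frame buffer so it can never grow beyond one sliding window.
# WINDOW is a placeholder value for illustration.
WINDOW = 8

window_frames = deque(maxlen=WINDOW)

for frame in range(1000):        # stand-in for a long frame stream
    window_frames.append(frame)  # frames older than WINDOW are dropped

# list(window_frames) can then be passed on to the predictor as before
```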

pvtoan commented 2 months ago

Hi @nikitakaraevv,

Thank you for your clear answer.

By the way, I have already discarded the previous frames of the variable "window_frames", so the input to "CoTrackerOnlinePredictor" now has a fixed size.

However, the outputs "pred_tracks" and "pred_visibility" still keep growing continuously.

  1. Could you please let me know which lines cause these two variables to keep growing? And is it also safe to discard the previous frames for these two variables?

  2. Next, could you please roughly explain the meaning of the command below, which produces the return values for these two variables in the last line of "CoTrackerOnlinePredictor" (I have rewritten it for easier viewing)?

    pred_tracks = tracks * tracks.new_tensor(
        [(W - 1) / (self.interp_shape[1] - 1), (H - 1) / (self.interp_shape[0] - 1)]
    )

nikitakaraevv commented 2 months ago

Hi @pvtoan, yes, this is done for the same reason, and we should probably fix it.

  1. Here we continuously update tracks and visibilities inside the model in online mode: https://github.com/facebookresearch/co-tracker/blob/0f9d32869ac51f3bd12c5ead9c206366cfb6caea/cotracker/models/core/cotracker/cotracker.py#L345 The fix is to do something like:

    self.online_coords_predicted = coords_predicted[:, ind : ind + S]
    self.online_vis_predicted = vis_predicted[:, ind : ind + S]

    and make sure it doesn't break anything.

  2. We resize each input video to the model resolution, and tracks are predicted in this resolution. This line transforms the predictions back to the original resolution.
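Concretely, each axis is scaled by (original - 1) / (model - 1), so a track at a model-resolution corner lands exactly on the corresponding corner of the original frame. A small numeric sketch (the resolutions are illustrative, not the model's actual values):

```python
# Rescale predicted track coordinates from the model's input resolution
# back to the original video resolution, mirroring the line quoted above.
interp_h, interp_w = 384, 512  # model input resolution (illustrative)
H, W = 720, 1280               # original video resolution (illustrative)

sx = (W - 1) / (interp_w - 1)  # x-axis scale factor
sy = (H - 1) / (interp_h - 1)  # y-axis scale factor

# A point at the model-resolution image corner maps to the corresponding
# corner of the original frame.
x, y = 511.0, 383.0
orig_x, orig_y = x * sx, y * sy
```

The `- 1` terms account for pixel-center alignment: coordinate 0 maps to 0 and the last pixel index maps to the last pixel index of the original frame.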

pvtoan commented 2 months ago

Hi @nikitakaraevv ,

I tried updating "cotracker.py" with the two lines you suggested:

    self.online_coords_predicted = coords_predicted[:, ind : ind + S]
    self.online_vis_predicted = vis_predicted[:, ind : ind + S]

However, I got the following error:

    Traceback (most recent call last):
      File "./CoTracker/codes/cotracker/testcode.py", line 207, in <module>
        pred_tracks, pred_visibility = _process_step_points(window_frames, is_first_step, queries=queries)
      File "./CoTracker/codes/cotracker/testcode.py", line 165, in _process_step_points
        return cotracker(video_chunk, is_first_step=is_first_step, queries=queries[None], add_support_grid=True)
      File "./lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "./lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "./CoTracker/codes/cotracker/cotracker/predictor.py", line 256, in forward
        tracks, visibilities, __ = self.model(video=video_chunk, queries=self.queries, iters=6, is_online=True,)
      File "./lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "./CoTracker/codes/cotracker/cotracker/models/core/cotracker/cotracker.py", line 335, in forward
        coords_predicted[:, ind : ind + S] = coords[-1][:, :S_trimmed]
    RuntimeError: The expanded size of the tensor (4) must match the existing size (8) at non-singleton dimension 1. Target sizes: [1, 4, 63, 2]. Tensor sizes: [8, 63, 2]

Could you please help me fix this issue?

pvtoan commented 2 months ago

Hi @nikitakaraevv ,

If you have time, please take a look at my issue.

Thank you for your help!

nikitakaraevv commented 2 months ago

Hi @pvtoan, this will take some time, I'll try to take a look over the weekend!