Open fangli333 opened 5 months ago
Hi, you do not need to retrain the model to run inference on different videos. You can use the provided checkpoints directly.
To run inference, you first need to instantiate DOT, e.g., for the default config:
from dot.models.dense_optical_tracking import DenseOpticalTracker as DOT
model = DOT().cuda()
The supported inference modes are listed here: https://github.com/16lemoing/dot/blob/0c26a74cab68ef361c6cabbbaad712685f79afad/dot/models/dense_optical_tracking.py#L27-L33
If you tell me more about which output you would like to obtain I can help you find the inference mode that is best suited to your use case, maybe we need to create a new one.
You can also look at the demo to have an example of inference.
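As a rough sketch of preparing a video for the model (the [B, T, C, H, W] layout and [0, 1] value range are assumptions based on the demo, not a documented API; the helper name is made up):

```python
import numpy as np

def frames_to_video_tensor(frames):
    """Hypothetical helper: stack T RGB frames of shape (H, W, 3), dtype
    uint8, into a (1, T, 3, H, W) float32 array scaled to [0, 1]."""
    video = np.stack(frames, axis=0)          # (T, H, W, 3)
    video = video.astype(np.float32) / 255.0  # scale to [0, 1]
    video = video.transpose(0, 3, 1, 2)       # (T, 3, H, W)
    return video[None]                        # add batch dim: (1, T, 3, H, W)

# Toy example: 10 black frames of size 48x64.
frames = [np.zeros((48, 64, 3), dtype=np.uint8) for _ in range(10)]
video = frames_to_video_tensor(frames)
print(video.shape)  # (1, 10, 3, 48, 64)
```

You would then move this to the GPU with `torch.from_numpy(video).cuda()` before passing it to the model.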
Thank you for your reply. Your answer really helps a lot.
Hi,
I would like to know how to process a video so as to obtain the tracking points between each pair of consecutive frames. How should I do that?
Thanks!
Hi, can you be more specific? If I understand correctly, you would like to use DOT to extract the optical flow between consecutive frames, is that right? (like the function "get_flow_from_last_to_first_frame" https://github.com/16lemoing/dot/blob/0c26a74cab68ef361c6cabbbaad712685f79afad/dot/models/dense_optical_tracking.py#L37 but for arbitrary source and target frames, e.g., consecutive ones). I can implement a generic function that does that if you want.
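For illustration only (the exact output format may differ from DOT's): if dense tracks have shape (T, H, W, 3) with an (x, y, visibility) triple per source pixel and frame, the flow between consecutive frames is just the difference of successive positions:

```python
import numpy as np

def consecutive_flows(tracks):
    """tracks: (T, H, W, 3) array of (x, y, visibility) per frame.
    Returns (T-1, H, W, 2): displacement from frame t to frame t+1."""
    xy = tracks[..., :2]     # drop the visibility channel
    return xy[1:] - xy[:-1]  # per-frame displacement

# Toy example: 3 frames, 2x2 grid, every point moves +1 in x per frame.
T, H, W = 3, 2, 2
tracks = np.zeros((T, H, W, 3))
for t in range(T):
    tracks[t, ..., 0] = t    # x position grows by 1 each frame
flows = consecutive_flows(tracks)
print(flows.shape)           # (2, 2, 2, 2)
print(flows[0, 0, 0])        # [1. 0.]
```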
Hi @16lemoing, can you elaborate more on these modes? https://github.com/16lemoing/dot/blob/550e00336198dc143493e415d52720eb9a53ab55/dot/models/dense_optical_tracking.py#L27-L33
Let's say if I want to produce tracks that cover all pixels in a given video, how do I leverage your API? Thanks!
Hi @ernestchu, I have added a new mode that does just that. It produces tracks that cover every cell in a video.
You can set the size (in pixels) of the cells with the flag --cell_size
(by default 1: every pixel), and the number of time steps between cells --cell_time_steps
(by default 20 to speed up inference).
I have also updated the demo, you can now visualize an overlay with tracks reinitialized every few frames.
https://github.com/16lemoing/dot/assets/32103788/c0581f9a-0508-423b-b4db-9ec1aca2320a
python demo.py --visualization_modes overlay --video_path cartwheel.mp4 --inference_mode tracks_from_every_cell_in_every_frame --cell_time_steps 20
Hi @16lemoing, I would expect `.round()` to be better than `.floor()` here. What do you think? :)
Hi, I agree this is a bit misleading, but floor() gives the correct behavior. Here is an illustration:
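A small numeric sketch of the same point (this uses a uniform-cell convention for simplicity, which may differ from the exact rescaling in the code): when mapping a continuous coordinate in [0, w) to one of cw equal cells, floor assigns each coordinate to the cell that actually contains it, whereas round shifts points in the upper half of a cell into the neighboring cell.

```python
import math

w, cw = 8, 4  # 8 pixels grouped into 4 cells, each 2 pixels wide
for x in [0.0, 1.9, 2.0, 3.9]:
    cell_floor = math.floor(x * cw / w)
    cell_round = round(x * cw / w)
    print(x, cell_floor, cell_round)
# x = 1.9 lies in cell 0 (pixels [0, 2)): floor gives 0, round gives 1,
# so round would wrongly credit cell 1 with a visit.
```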
Hi @16lemoing, could you please explain how to track pixels from an input mask through the whole video using your code? I would like to use one of my own videos. Thank you.
Hi @maxboels, if you have a mask on the first frame, you can use this inference mode:
https://github.com/16lemoing/dot/blob/e32c6f7de12342460371f2efc4789bd79c4a39a3/dot/models/dense_optical_tracking.py#L32-L33
This will return a dictionary from which you can extract "tracks" with the shape [B, T, H, W, 3] (with B: batch size, T: time steps, H: height, W: width) corresponding to all the pixels in the first frame.
You can then filter the tracks to only keep those which belong to the mask.
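A sketch of that filtering step (shapes follow the description above, with the batch dimension dropped for clarity; the indexing convention is an assumption):

```python
import numpy as np

def filter_tracks_by_mask(tracks, mask):
    """tracks: (T, H, W, 3) tracks for every first-frame pixel.
    mask: (H, W) boolean mask on the first frame.
    Returns (T, N, 3): tracks of the N masked pixels only."""
    return tracks[:, mask]  # boolean indexing over the H, W dims

# Toy example: 4 frames, 3x3 pixels, mask keeps 2 pixels.
T, H, W = 4, 3, 3
tracks = np.random.rand(T, H, W, 3)
mask = np.zeros((H, W), dtype=bool)
mask[0, :2] = True
filtered = filter_tracks_by_mask(tracks, mask)
print(filtered.shape)  # (4, 2, 3)
```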
Hi @16lemoing, just want to check this with you.
I want every pixel penetrated by at least one track (after rounding or flooring it).
If I replace these two lines https://github.com/16lemoing/dot/blob/e32c6f7de12342460371f2efc4789bd79c4a39a3/dot/models/dense_optical_tracking.py#L228-L229 with
visited_x = (visited_x * ((cw - 1) / (w - 1))).round().long()
visited_y = (visited_y * ((ch - 1) / (h - 1))).round().long()
then it gives me what I want, with rounded tracks.
Note: I set cw = w, ch = h and ct = 1 in order to deal with all pixels. In addition, I think the variable names are misleading, because cw is actually the number of cells along w, not the "cell width". ct has the opposite meaning though.
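To illustrate that change in isolation (the (cw - 1)/(w - 1) rescaling comes from the quoted lines; the coordinate values are a toy reconstruction, and this uses NumPy in place of PyTorch): with cw = w the scale factor is 1, so .round() simply snaps each fractional track position to its nearest pixel.

```python
import numpy as np

w, h = 5, 5
cw, ch = w, h  # one cell per pixel, as described above
visited_x = np.array([0.2, 1.6, 3.5, 4.0])
visited_y = np.array([0.9, 2.4, 2.5, 4.0])

# The proposed variant: round instead of floor when mapping to the grid.
# Note NumPy (like PyTorch) rounds halves to even, so 2.5 -> 2, 3.5 -> 4.
ix = (visited_x * ((cw - 1) / (w - 1))).round().astype(np.int64)
iy = (visited_y * ((ch - 1) / (h - 1))).round().astype(np.int64)
print(ix)  # [0 2 4 4]
print(iy)  # [1 2 2 4]
```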
Hi,
if I want to obtain results for different videos, do I need to run training for each video, or can I use the checkpoint you provided for inference? If so, how do I run inference?
Thank you!