16lemoing / dot

Dense Optical Tracking: Connecting the Dots
https://16lemoing.github.io/dot
MIT License

inference #6

Open fangli333 opened 5 months ago

fangli333 commented 5 months ago

Hi,

If I want to obtain results for different videos, do I need to run training for each video, or can I use the checkpoint you provided for inference? If so, how do I run inference?

Thanks,

16lemoing commented 5 months ago

Hi, you do not need to retrain the model to run inference on different videos. You can use the provided checkpoints directly.

To run inference, you first need to instantiate DOT, e.g., for the default config:

from dot.models.dense_optical_tracking import DenseOpticalTracker as DOT
model = DOT().cuda()

The supported inference modes are listed here: https://github.com/16lemoing/dot/blob/0c26a74cab68ef361c6cabbbaad712685f79afad/dot/models/dense_optical_tracking.py#L27-L33

If you tell me more about which output you would like to obtain I can help you find the inference mode that is best suited to your use case, maybe we need to create a new one.

You can also look at the demo to have an example of inference.
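As a rough sketch of the input side, something like the following can turn a list of frames into a batch tensor. Note the `[B, T, C, H, W]`, values-in-`[0, 1]` convention used here is an assumption, not confirmed from the repo; the demo script is the authoritative reference.

```python
import numpy as np

# Hypothetical preprocessing sketch: dense-tracking models typically expect a
# video tensor of shape [B, T, C, H, W] with values in [0, 1]. The exact
# convention for DOT is an assumption; check demo.py to confirm.

def frames_to_batch(frames):
    """Stack a list of H x W x 3 uint8 frames into a [1, T, 3, H, W] float array."""
    video = np.stack(frames, axis=0)            # [T, H, W, 3]
    video = video.transpose(0, 3, 1, 2)         # [T, 3, H, W]
    video = video.astype(np.float32) / 255.0    # scale to [0, 1]
    return video[None]                          # add batch dim -> [1, T, 3, H, W]

# Dummy 8-frame video at 64 x 64 resolution
frames = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(8)]
batch = frames_to_batch(frames)
print(batch.shape)  # (1, 8, 3, 64, 64)
```

From there, `torch.from_numpy(batch).cuda()` would give a tensor you can feed to the model in whichever inference mode you pick.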

fangli333 commented 5 months ago

Thank you for your reply, your answer really helps a lot.

fangli333 commented 5 months ago

Hi,

I would like to know how to process a video: I want to obtain the tracked points between every two consecutive frames.

Thanks

16lemoing commented 5 months ago

Hi, can you be more specific? If I understand correctly, you would like to use DOT to extract the optical flow between consecutive frames, is that right? (Like the function "get_flow_from_last_to_first_frame" https://github.com/16lemoing/dot/blob/0c26a74cab68ef361c6cabbbaad712685f79afad/dot/models/dense_optical_tracking.py#L37, but for arbitrary source and target frames, e.g., consecutive ones.) I can implement a generic function that does that if you want.
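For intuition, once dense tracks are available, the displacement of the tracked points between consecutive frames is just a difference of positions. A minimal numpy sketch (synthetic tracks, not the repo's API):

```python
import numpy as np

# Sketch (not DOT's actual API): assume dense tracks give, for every pixel of
# the first frame, its (x, y) position at each time step.
T, H, W = 4, 2, 3
ys, xs = np.mgrid[0:H, 0:W]
grid = np.stack([xs, ys], axis=-1).astype(np.float32)   # [H, W, 2] initial positions

# Synthetic tracks: every point drifts by (+1, 0) pixels per frame
tracks = np.stack(
    [grid + np.array([t, 0.0], dtype=np.float32) for t in range(T)]
)  # [T, H, W, 2]

# Displacement of the tracked points from frame t to frame t+1
flow_t = tracks[1:] - tracks[:-1]   # [T-1, H, W, 2]
print(flow_t[0, 0, 0])  # [1. 0.]
```

Note this is motion "along tracks" anchored at the first frame, not optical flow expressed in frame t's own pixel grid, which is presumably why a dedicated function would still be needed.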

ernestchu commented 5 months ago

Hi @16lemoing, can you elaborate more on these modes? https://github.com/16lemoing/dot/blob/550e00336198dc143493e415d52720eb9a53ab55/dot/models/dense_optical_tracking.py#L27-L33

Let's say if I want to produce tracks that cover all pixels in a given video, how do I leverage your API? Thanks!

16lemoing commented 5 months ago

Hi @ernestchu, I have added a new mode that does just that. It produces tracks that cover every cell in a video.

https://github.com/16lemoing/dot/blob/e32c6f7de12342460371f2efc4789bd79c4a39a3/dot/models/dense_optical_tracking.py#L34-L35

You can set the size (in pixels) of the cells with the flag --cell_size (default 1: every pixel), and the number of time steps between cells with --cell_time_steps (default 20, to speed up inference).

I have also updated the demo, you can now visualize an overlay with tracks reinitialized every few frames.

https://github.com/16lemoing/dot/assets/32103788/c0581f9a-0508-423b-b4db-9ec1aca2320a

python demo.py --visualization_modes overlay --video_path cartwheel.mp4 --inference_mode tracks_from_every_cell_in_every_frame --cell_time_steps 20

ernestchu commented 5 months ago

Hi @16lemoing, I would expect .round() to be better than .floor() here. What do you think? :)

https://github.com/16lemoing/dot/blob/e32c6f7de12342460371f2efc4789bd79c4a39a3/dot/models/dense_optical_tracking.py#L228-L229

16lemoing commented 5 months ago

Hi, I agree this is a bit misleading. But floor() gives the correct behavior, here is an illustration:

[illustration: floor]
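A small numeric sketch of the point being illustrated: when assigning pixel coordinates to their containing cell, floor() is the correct binning operation, while round() can send a pixel to a cell index that does not exist.

```python
import numpy as np

# With cell_size = 2, pixels 0 and 1 belong to cell 0, pixels 2 and 3 to cell 1.
coords = np.array([0.0, 1.0, 2.0, 3.0])
cell_size = 2

floor_cells = np.floor(coords / cell_size).astype(int)
round_cells = np.round(coords / cell_size).astype(int)

print(floor_cells)  # [0 0 1 1] -- each pixel lands in its containing cell
print(round_cells)  # [0 0 1 2] -- pixel 3 is sent to cell 2, which does not exist
```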

maxboels commented 5 months ago

Hi @16lemoing, could you please explain how to track pixels from an input mask through the whole video using your code? I would like to use one of my own videos. Thank you.

16lemoing commented 5 months ago

Hi @maxboels, if you have a mask on the first frame, you can use this inference mode: https://github.com/16lemoing/dot/blob/e32c6f7de12342460371f2efc4789bd79c4a39a3/dot/models/dense_optical_tracking.py#L32-L33

This will return a dictionary from which you can extract "tracks" with shape [B, T, H, W, 3] (B: batch size, T: time steps, H: height, W: width), corresponding to all the pixels in the first frame.

You can then filter the tracks to only keep those which belong to the mask.
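The filtering step can be done with boolean indexing over the two spatial dimensions. A minimal numpy sketch using the [B, T, H, W, 3] shape described above (the mask region here is hypothetical):

```python
import numpy as np

# "tracks" has shape [B, T, H, W, 3] (x, y, visibility), one track per
# first-frame pixel; the mask is a boolean [H, W] array on the first frame.
B, T, H, W = 1, 5, 4, 6
tracks = np.random.rand(B, T, H, W, 3).astype(np.float32)

mask = np.zeros((H, W), dtype=bool)
mask[1:3, 2:5] = True   # hypothetical object region on the first frame

# Boolean indexing over the two spatial dims keeps only the masked pixels
masked_tracks = tracks[:, :, mask]   # [B, T, N, 3] with N = mask.sum()
print(masked_tracks.shape)  # (1, 5, 6, 3)
```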

ernestchu commented 4 months ago

Hi @16lemoing, just wanted to check this with you.

I want every pixel to be covered by at least one track (after rounding or flooring it).

Somehow, if I replace these two lines https://github.com/16lemoing/dot/blob/e32c6f7de12342460371f2efc4789bd79c4a39a3/dot/models/dense_optical_tracking.py#L228-L229 with

visited_x = (visited_x * ((cw - 1) / (w - 1))).round().long()
visited_y = (visited_y * ((ch - 1) / (h - 1))).round().long()

then it gets me what I want, with rounded tracks.

Note: I set cw = w, ch = h, and ct = 1 in order to cover all pixels. In addition, I think the variable names are misleading: cw is actually the number of cells along w, not the "cell width". ct has the opposite meaning, though.
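For what it's worth, the proposed rescaling maps the pixel range [0, w-1] onto the cell-index range [0, cw-1] with both endpoints preserved, so round() stays in bounds for any coordinate inside the image. A small numeric check (the values are illustrative):

```python
import numpy as np

# Endpoint-preserving rescale from pixel coordinates in [0, w-1]
# to cell indices in [0, cw-1], comparing round() and floor().
w, cw = 8, 4
visited_x = np.array([0.0, 3.5, 7.0])

rounded = np.round(visited_x * ((cw - 1) / (w - 1))).astype(np.int64)
floored = np.floor(visited_x * ((cw - 1) / (w - 1))).astype(np.int64)

print(rounded)  # [0 2 3] -- nearest cell, endpoints map exactly to 0 and cw-1
print(floored)  # [0 1 3] -- always snaps down, so mid-range coords shift left
```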