AssafSinger94 / dino-tracker

Official Pytorch Implementation for “DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video”
MIT License

How to train dino-tracker on customized videos? #6

Closed willipwk closed 3 months ago

willipwk commented 4 months ago

Thanks for your great work!

I'm trying to train dino-tracker on my own custom video. The original resolution is $640\times 480$. I also modified video_resh and video_resw in the config files. The input looks like this:

00000 00000

But the output looks weird.

https://github.com/AssafSinger94/dino-tracker/assets/58909139/991cfed1-3a15-44bd-b95b-10713b143c68

Then I guessed that the resolution should be divisible by 14, since DINO splits the image into $14\times 14$ patches. So I slightly cropped and zero-padded the original input to $644\times 476$ and modified the config file. But the output is not correct either: the dots do not exactly match the shape of the tie.
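For readers who want to reproduce the crop-and-pad step described above, here is a minimal sketch, not code from the dino-tracker repository. It assumes frames are NumPy arrays of shape (H, W, C); the helper names `nearest_multiple` and `fit_to_patch_grid` are hypothetical, and the centered crop/pad is just one possible choice.

```python
import numpy as np

PATCH = 14  # patch size mentioned in the comment above


def nearest_multiple(size, patch=PATCH):
    """Round a dimension to the nearest multiple of the patch size (hypothetical helper)."""
    return max(patch, int(round(size / patch)) * patch)


def fit_to_patch_grid(frame, patch=PATCH):
    """Center-crop or zero-pad an (H, W, C) frame so H and W are multiples of `patch`."""
    h, w = frame.shape[:2]
    th, tw = nearest_multiple(h, patch), nearest_multiple(w, patch)

    # Crop any dimension that is larger than its target.
    top = max((h - th) // 2, 0)
    left = max((w - tw) // 2, 0)
    cropped = frame[top:top + min(h, th), left:left + min(w, tw)]

    # Zero-pad any dimension that is smaller than its target.
    out = np.zeros((th, tw) + frame.shape[2:], dtype=frame.dtype)
    ch, cw = cropped.shape[:2]
    pad_top, pad_left = (th - ch) // 2, (tw - cw) // 2
    out[pad_top:pad_top + ch, pad_left:pad_left + cw] = cropped
    return out


frame = np.ones((480, 640, 3), dtype=np.uint8)  # a 640x480 frame, stored as (H, W, C)
fitted = fit_to_patch_grid(frame)
print(fitted.shape)
```

For a $640\times 480$ frame this yields width 644 (padded) and height 476 (cropped), the same size used in the comment above.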

https://github.com/AssafSinger94/dino-tracker/assets/58909139/571a71ff-407d-492b-973c-208e126e978a

Finally, I checked the input resolution of the horsejump example, which is $854\times 480$. So I again cropped and zero-padded the original images to this size, and this time I got the correct result.

https://github.com/AssafSinger94/dino-tracker/assets/58909139/0569fba0-eb1a-4502-a362-7e723399920b

So I wonder how to train dino-tracker on videos with a resolution other than $854\times 480$. Or does the current implementation only support this input resolution?

tnarek commented 4 months ago

Hi @willipwk, thanks for your question. As currently implemented, DINO-Tracker can be trained on a video of any resolution, but the video is resized to the standard 854x480 resolution.
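To illustrate why points can look misplaced when frames are resized, here is a small sketch of mapping (x, y) coordinates between two resolutions. It is not taken from the repository; `rescale_points` is a hypothetical helper, and it assumes a plain resize with no cropping or padding.

```python
def rescale_points(points, src_hw, dst_hw):
    """Map (x, y) points from a source (h, w) resolution to a destination (h, w).

    Hypothetical helper: assumes the frame was resized directly, without crop or pad.
    """
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    # Multiply before dividing to keep exact values where possible.
    return [(x * dst_w / src_w, y * dst_h / src_h) for x, y in points]


# The center of a 640x480 frame, mapped into an 854x480 frame.
center = rescale_points([(320, 240)], src_hw=(480, 640), dst_hw=(480, 854))
print(center)
```

If tracks computed at one resolution are drawn on frames of another without this kind of scaling, the dots drift away from the object, which is consistent with the mismatch shown in the videos above.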

AtaAtasoy commented 4 months ago

Hi, I also had a similar problem when training with custom video resolutions: the visualization looked off. Then I realized that I had to modify these lines in visualize_rainbow.py:

parser.add_argument("--infer-res-size", type=int, nargs=2, default=(476, 854), help="Inference resolution size, (h, w). --NOTE-- change according to values in train.yaml.")

parser.add_argument("--of-res-size", type=int, nargs=2, default=(476, 854), help="Optical flow resolution size, (h, w). --NOTE-- change according to values in preprocess.yaml.")

I know this is a bit too trivial (laughed at myself when I realized this 😄 ) but could this be your problem?
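As a sketch of how these two options behave, the snippet below rebuilds only the two arguments quoted above in a standalone parser and overrides them from a command-line style list. Everything else is an assumption: `build_parser` and `parse_resolution_args` are hypothetical names, the help strings are shortened, and the real visualize_rainbow.py defines further arguments that are not shown in this thread.

```python
import argparse


def build_parser():
    """Standalone parser containing only the two arguments quoted above."""
    parser = argparse.ArgumentParser(description="Resolution arguments only (illustrative).")
    parser.add_argument("--infer-res-size", type=int, nargs=2, default=(476, 854),
                        help="Inference resolution size, (h, w).")
    parser.add_argument("--of-res-size", type=int, nargs=2, default=(476, 854),
                        help="Optical flow resolution size, (h, w).")
    return parser


def parse_resolution_args(argv):
    """Return both resolutions as (h, w) tuples."""
    args = build_parser().parse_args(argv)
    # With nargs=2, argparse returns a list when the flag is given; normalize to tuples.
    return tuple(args.infer_res_size), tuple(args.of_res_size)


# Height comes first, then width: a 644x476 video is passed as "476 644".
infer_res, of_res = parse_resolution_args(
    ["--infer-res-size", "476", "644", "--of-res-size", "476", "644"]
)
print(infer_res, of_res)
```

Passing the values on the command line, alongside whatever other arguments the script expects, avoids editing the defaults in the file. Note the (h, w) order stated in the help text.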

willipwk commented 4 months ago

Ohhhhhh I get it. Indeed, after I specified these two arguments for my case, the output looks very nice. Thanks for both of your comments, @AtaAtasoy and @tnarek.

tnarek commented 3 months ago

Thanks @AtaAtasoy for your input! I'm closing this issue now.