16lemoing / dot

Dense Optical Tracking: Connecting the Dots
https://16lemoing.github.io/dot
MIT License

Can I add some datasets to train? #7

Closed rizentan closed 10 months ago

rizentan commented 10 months ago

Hi, could I add some datasets for training? If so, what should I do? Thanks.

16lemoing commented 10 months ago

Hi @rizentan !

To train on custom data you should implement your own dataloader. You can take inspiration from movi_f_dataset.py. In particular, your dataset should output a dictionary with the following elements:

https://github.com/16lemoing/dot/blob/550e00336198dc143493e415d52720eb9a53ab55/dot/data/movi_f_dataset.py#L107-L114

If you want to add support for a public dataset that could be useful to other users, I can help you preprocess it and add it to the repo.
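A minimal sketch of such a dataloader is below. The dictionary keys and tensor shapes here are assumptions for illustration only; check the linked lines of `movi_f_dataset.py` for the actual contract your dataset must satisfy.

```python
# Hypothetical sketch of a custom dataset for DOT training.
# Keys ("video", "tracks") and shapes are ASSUMPTIONS -- verify
# against movi_f_dataset.py (the lines linked above).
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    def __init__(self, num_samples=4, num_frames=24, height=256, width=256, num_tracks=2048):
        self.num_samples = num_samples
        self.num_frames = num_frames
        self.height = height
        self.width = width
        self.num_tracks = num_tracks

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # In a real dataloader these would be loaded from disk
        # (video frames + the precomputed ".npy" track files).
        video = torch.zeros(self.num_frames, 3, self.height, self.width)
        # (x, y, visibility) per track per frame -- assumed layout.
        tracks = torch.zeros(self.num_frames, self.num_tracks, 3)
        return {"video": video, "tracks": tracks}


loader = DataLoader(MyDataset(), batch_size=1)
batch = next(iter(loader))
print(sorted(batch.keys()))  # -> ['tracks', 'video']
```

The default collate function handles dictionaries of tensors, so no custom `collate_fn` should be needed as long as every sample has the same shapes.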

rizentan commented 10 months ago

① If I use "movi_f_dataset.py" to process custom datasets, do I still need to call it from "preprocess.py" like "movi_f_tf_dataset.py", or do I just execute "movi_f_dataset.py" separately during preprocessing?

② Is it necessary to install torch3D? I encountered an out-of-memory error while executing "demo.py".

16lemoing commented 10 months ago

> ① If I use "movi_f_dataset.py" to process custom datasets, do I still need to call it from "preprocess.py" like "movi_f_tf_dataset.py", or do I just execute "movi_f_dataset.py" separately during preprocessing?

"preprocess.py" is used to convert data (from "movi_f_tf_dataset.py") in TF format to a new format which is easier to load with PyTorch (from "movi_f_dataset.py"). If you want to use your own data, the preprocessing steps are likely to be different. What is the current form of your data?

> ② Is it necessary to install torch3D? I encountered an out-of-memory error while executing "demo.py".

Installing torch3D is optional, but it may help with OOM errors. Did you try with the provided videos? What is the max memory of your GPU? It is likely that your input video is too long / the resolution is too high for your system. Can you try with a smaller video?
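As a rough back-of-envelope check (not DOT's actual footprint), the raw float32 video tensor alone grows linearly with frame count and resolution, and the model's activations are typically several times larger still:

```python
# Back-of-envelope memory estimate for a raw float32 video tensor.
# This counts ONLY the input tensor -- activations during inference
# (the part that usually triggers OOM) are several times larger.
def video_tensor_gb(num_frames, height, width, channels=3, bytes_per_el=4):
    return num_frames * height * width * channels * bytes_per_el / 1024**3

# e.g. a 20 s clip at 24 fps (480 frames) in 480x856 resolution:
print(round(video_tensor_gb(480, 480, 856), 2))  # -> 2.2
```

So halving the resolution or trimming the clip cuts memory roughly proportionally, which is why a shorter / smaller video is the first thing to try.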

rizentan commented 10 months ago

> ① "preprocess.py" is used to convert data (from "movi_f_tf_dataset.py") in TF format to a new format which is easier to load with PyTorch (from "movi_f_dataset.py"). If you want to use your own data, the preprocessing steps are likely to be different. What is the current form of your data?

I have a video in MP4 format, but I don't know how to generate ".npy" files like the ones in "datasets\kubric\movi_f". Is this the result after processing with the "preprocess.py" file?

> ② Installing torch3D is optional, but it may help with OOM errors. Did you try with the provided videos? What is the max memory of your GPU? It is likely that your input video is too long / the resolution is too high for your system. Can you try with a smaller video?

My computer has 6GB of GPU memory, and I'm trying with the provided video. How can I determine whether the installation was successful? When I execute `cd dot/utils/torch3d/ && python setup.py build && cd ../../..`, a "build" folder appears under my torch3d folder, but it does not take effect in the project. Renaming it to "install" also has no effect. How can I make torch3d take effect?

I apologize for any inconvenience; I am a beginner in deep learning.


16lemoing commented 10 months ago

> I have a video in MP4 format, but I don't know how to generate ".npy" files like the ones in "datasets\kubric\movi_f". Is this the result after processing with the "preprocess.py" file?

For training, ideally, you also need ground truth tracks (which is what we store as ".npy" files). If you don't have ground truth but still want to train, you can use CoTracker to generate pseudo-labels for your data:

python preprocess.py --save_tracks --data_root datasets/YOUR_DATASET_NAME

Before that, make sure your videos are in datasets/YOUR_DATASET_NAME/videos in png format, following this naming convention:

```
0/  # first video
   000.png  # pad with as many zeros as needed
   001.png
   ...
1/  # second video
   000.png
   001.png
   ...
2/
...
```
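The zero-padded layout above can be sketched with a small helper; `frame_path` is a hypothetical function (not part of the repo), and frames can be extracted from an MP4 beforehand with e.g. `ffmpeg -i clip.mp4 -start_number 0 0/%03d.png`:

```python
# Hypothetical helper producing the layout described above:
# <root>/videos/<video_idx>/<frame_idx>.png, zero-padded to at
# least 3 digits (more if the video has 1000+ frames).
import os

def frame_path(root, video_idx, frame_idx, num_frames):
    width = max(3, len(str(num_frames - 1)))
    return os.path.join(root, "videos", str(video_idx), f"{frame_idx:0{width}d}.png")

print(frame_path("datasets/my_data", 0, 7, 120))   # 120-frame video -> .../0/007.png
print(frame_path("datasets/my_data", 1, 7, 2000))  # 2000-frame video -> .../1/0007.png
```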

Once preprocessing is done, for training you should use this modified command line

python train.py --data_root datasets/YOUR_DATASET_NAME --out_track_name cotracker

This will use pre-computed cotracker tracks as supervision instead of ground truth.

> My computer has 6GB of GPU memory, and I'm trying with the provided video.

Installing Torch3D will not help you much in this case. The demo requires ~15GB (and so do the preprocessing / training steps above, depending on the resolution of your videos).

rizentan commented 10 months ago

I have added two videos in "datasets/MY_DATASET_NAME/video", and then in "preprocessoptions.py" I changed the parameter "num_videos" to "2", but the execution did not go smoothly. Do I need to modify the parameters "download_path" and "data_dir"? I have sent you the detailed errors by email. Could you please help me analyze the cause?

16lemoing commented 10 months ago

Hi! Please follow the steps I sent you by email. You can ignore "download_path" in your case since you use custom data, and you have to set "data_root" to the correct path (where your "video" folder is).

rizentan commented 10 months ago

I have successfully executed "preprocess.py", and generated ".npy" files in the "cotracker" folder. However, when I run "train.py", it prompts that it cannot find the ".npy" files in the "ground_truth" folder. What should I do in this situation?

16lemoing commented 10 months ago

Hi, as explained above, you should specify that supervision should be done with CoTracker, since you do not have any ground truth for your data (I suppose).

Once preprocessing is done, for training you should use this modified command line

python train.py --data_root datasets/YOUR_DATASET_NAME --out_track_name cotracker

In your case, since both the input and supervision during training come from the same source, I recommend setting the number of tracks while executing "preprocess.py" to a larger number than the default (2048), e.g., --num_tracks 8192. It will allow input and supervision to be different and may lead to better results. Another option is to use fewer tracks during training.
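The reasoning above can be illustrated with a toy sampler (not DOT's actual sampling code; the track counts are just the numbers mentioned): with 8192 precomputed tracks, an input subset and a supervision subset can be drawn without any overlap.

```python
# Toy illustration of disjoint input/supervision track sampling.
# split_tracks is a hypothetical helper; DOT's real sampling logic
# lives in the training code, not here.
import random

def split_tracks(total, n_input, n_supervision, seed=0):
    rng = random.Random(seed)
    indices = rng.sample(range(total), n_input + n_supervision)
    return indices[:n_input], indices[n_input:]

inp, sup = split_tracks(total=8192, n_input=2048, n_supervision=2048)
print(len(inp), len(sup), len(set(inp) & set(sup)))  # -> 2048 2048 0
```

With only 2048 tracks total, the two subsets would be forced to coincide, and the model would be supervised on the very tracks it receives as input.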

rizentan commented 10 months ago

Thank you very much for your help.