ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
https://arxiv.org/abs/2104.08860
MIT License

What is the meaning of --num_thread_reader=0 in MSR-VTT training configuration? #51

Closed thinh276 closed 2 years ago

thinh276 commented 2 years ago

Can you explain the --num_thread_reader setting in the training configuration? Can I adjust this value to decrease my training time? (Why is --num_thread_reader=0 used for MSR-VTT while --num_thread_reader=2 is used for the other datasets?) Thank you so much!

ArrowLuo commented 2 years ago

Hi @thinh276, --num_thread_reader is used in the dataloaders and can speed up data reading. --num_thread_reader=0 for MSR-VTT can be regarded as a typo; feel free to adjust its value.
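
For context, this flag is forwarded to PyTorch's `DataLoader` as `num_workers`, roughly as in the simplified sketch below (not the repository's exact code; `args` and `train_dataset` are hypothetical stand-ins):

```python
from torch.utils.data import DataLoader

# num_thread_reader maps to DataLoader's num_workers:
# 0 loads batches in the main process; N > 0 spawns N worker
# processes that prefetch batches in parallel with training.
dataloader = DataLoader(
    train_dataset,                       # hypothetical dataset object
    batch_size=args.batch_size,
    num_workers=args.num_thread_reader,  # 0 = main-process loading
    shuffle=True,
    pin_memory=True,
    drop_last=True,
)
```

With `num_workers=0` the GPU sits idle while each batch is decoded, which is why raising the value shortens wall-clock training time without changing what the model sees.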

thinh276 commented 2 years ago

> Hi @thinh276, --num_thread_reader is used in the dataloaders and can speed up data reading. --num_thread_reader=0 for MSR-VTT can be regarded as a typo; feel free to adjust its value.

Thank you so much! My workstation is running slowly right now, so I will test some values of --num_thread_reader. Does this value affect the accuracy? I used your code (with --num_thread_reader=0) and tested on 2 computers; the results have a gap:

If --num_thread_reader=0 affects the training time only, are my training results normal? Thank you!

ArrowLuo commented 2 years ago

Hi @thinh276, interesting results, but I do not think --num_thread_reader=0 will affect the performance. The difference may be caused by the number of GPUs or by other factors (I am not sure), e.g., CUDA's nondeterministic behavior. The links below are for your information (a minimal seeding sketch follows them):

  1. https://github.com/ArrowLuo/CLIP4Clip/issues/25
  2. https://github.com/openai/CLIP/issues/114
  3. https://pytorch.org/docs/stable/notes/randomness.html
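
Following the PyTorch randomness notes (link 3), a minimal seeding setup looks roughly like this; note that even with these settings, some CUDA ops remain nondeterministic, and results can still differ across GPU counts or hardware:

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    # Seed all RNGs the training pipeline may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic, trading some speed for reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```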

Thanks.

thinh276 commented 2 years ago

I tested --num_thread_reader=2 and the training time decreased from 33 hours to 10 hours. (Great!) Thank you for the informative links; I will read them. I would like to share my detailed results with you and others:

| Batch size / No. of GPUs | Method | R@1 | R@5 | R@10 | MdR | MnR |
| -- | -- | -- | -- | -- | -- | -- |
| 128 / 4 GPUs | -meanP | 42.9 | 70.7 | 80.0 | 2.0 | 17.0 |
| 128 / 4 GPUs | -seqLSTM | 42.2 | 69.7 | 80.1 | 2.0 | 17.2 |
| 128 / 4 GPUs | -seqTransf | 43.0 | 70.2 | 81.2 | 2.0 | 16.1 |
| 128 / 4 GPUs | -seqTransf | 40.8 | 72.0 | 81.8 | 2.0 | 14.4 |

| Batch size / No. of GPUs | Method | R@1 | R@5 | R@10 | MdR | MnR |
| -- | -- | -- | -- | -- | -- | -- |
| 64 / 2 GPUs | -meanP | 43.4 | 71.3 | 80.8 | 2.0 | 16.6 |
| 64 / 2 GPUs | -seqTransf | 43.5 | 72.5 | 80.5 | 2.0 | 14.8 |
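
(For readers unfamiliar with the metrics: R@K is the percentage of queries whose ground-truth video ranks in the top K, MdR is the median rank, and MnR is the mean rank. A minimal sketch of how they can be computed from a text-to-video similarity matrix, assuming one ground-truth video per query at the matching index; this is not the repository's exact evaluation code:)

```python
import numpy as np

def retrieval_metrics(sim: np.ndarray) -> dict:
    """sim[i, j] = similarity between text query i and video j.
    Assumes the ground-truth video for query i sits at index i."""
    # Rank videos per query by decreasing similarity.
    order = np.argsort(-sim, axis=1)
    # 1-based rank of the ground-truth video for each query.
    ranks = np.where(order == np.arange(len(sim))[:, None])[1] + 1
    return {
        "R@1": float(np.mean(ranks <= 1) * 100),
        "R@5": float(np.mean(ranks <= 5) * 100),
        "R@10": float(np.mean(ranks <= 10) * 100),
        "MdR": float(np.median(ranks)),
        "MnR": float(np.mean(ranks)),
    }
```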

ArrowLuo commented 2 years ago

Hi @thinh276, thank you for your kind sharing. I am not sure whether the gap is normal, or whether the situations mentioned in the links above cause such a problem. If you want to compare against the paper, one option is to report both the paper's results and yours for a fair comparison.

thinh276 commented 2 years ago

It's not for comparison. As I understand it, there are many sources of randomness, so we cannot reach exactly the same result. But the differences among similar calculation methods are clearly visible. I consider my attempt to reproduce the experiments a success. Thank you for your code and for your kind reply!

deepalchemist commented 7 months ago

@thinh276 @ArrowLuo Hello, I trained the CLIP4Clip model with --sim_header seqTransf, but it seems that --num_thread_reader=8 results in worse accuracy than --num_thread_reader=0. Do you know why?