facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0
13.09k stars 2.37k forks

.cpu() operation as bottleneck in training #562

Open tadbeer opened 1 year ago

tadbeer commented 1 year ago

The .cpu() operation used in Hungarian matching, which brings the cost tensor to the CPU for linear_sum_assignment, takes a significant amount of time compared to the entire forward pass. Is there a recommended way of using it that reduces its time cost? I am using Hungarian matching in one of my projects, and the .cpu() operation has significantly increased the training time.
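
One thing that may be worth checking before attributing the cost to the copy itself: CUDA kernels are launched asynchronously, and .cpu() blocks until all queued GPU work has finished, so a naive timer around .cpu() can end up measuring the preceding forward pass as well. The sketch below is a minimal way to separate the two, not the repository's own code. The function name `match_with_timing` and the cost shape `[batch, num_queries, total_targets]` are assumptions made for illustration, and whether this explains the slowdown reported here is not confirmed by the issue.

```python
import time

import torch
from scipy.optimize import linear_sum_assignment


def match_with_timing(cost, sizes):
    """Hypothetical helper: run Hungarian matching and time its two stages.

    cost  : tensor of shape [batch, num_queries, total_targets] (assumed layout)
    sizes : number of targets per image; must sum to total_targets
    """
    if cost.is_cuda:
        # Wait for already-queued kernels so they are not billed to .cpu().
        torch.cuda.synchronize()

    t0 = time.perf_counter()
    cost_cpu = cost.detach().cpu()  # one transfer for the whole batch
    t1 = time.perf_counter()

    # Solve one assignment problem per image on its own slice of targets.
    indices = [
        linear_sum_assignment(chunk[i].numpy())
        for i, chunk in enumerate(cost_cpu.split(sizes, -1))
    ]
    t2 = time.perf_counter()
    return indices, t1 - t0, t2 - t1


device = "cuda" if torch.cuda.is_available() else "cpu"
cost = torch.rand(2, 5, 5, device=device)  # 2 images, 5 queries, 2 + 3 targets
indices, transfer_s, solve_s = match_with_timing(cost, [2, 3])
print(f"transfer: {transfer_s:.6f}s, assignment: {solve_s:.6f}s")
```

If the transfer time stays small once the synchronize call is in place, the time seen around .cpu() was most likely pending GPU work rather than the copy. If it stays large, the copy or the CPU-side solve is the real cost.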