The .cpu() operation used in Hungarian matching, which moves the tensor to the CPU for linear_sum_assignment, takes a significant amount of time compared to the entire forward pass. Is there a specific way of using it that (possibly) reduces its time consumption?
I am using Hungarian matching in one of my projects, and the .cpu() operation has significantly increased the training time.