The memory reduction factor of the cost matrix is sum(#target objects) / max(#target objects).
This is achieved by no longer computing and storing matching costs between predictions and targets at different positions inside the batch. More precisely, the original matrix of shape [batch_size * queries, sum(#target objects)] is shrunk to a tensor of shape [batch_size, queries, max(#target objects)].
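The shape change and the resulting memory reduction factor can be sketched as follows. This is an illustrative calculation only (the hypothetical helper `padded_cost_shapes` is not part of this change, and NumPy-style shapes stand in for the actual PyTorch tensors):

```python
def padded_cost_shapes(num_targets_per_image, queries):
    """Compare the old flattened cost matrix with the new padded per-image tensor.

    num_targets_per_image: number of target objects for each image in the batch.
    Returns (old_shape, new_shape, measured_reduction, predicted_reduction).
    """
    batch_size = len(num_targets_per_image)
    total = sum(num_targets_per_image)
    largest = max(num_targets_per_image)
    # Old layout: every prediction is matched against every target in the batch.
    old_shape = (batch_size * queries, total)
    # New layout: each image only matches against its own targets, padded to the max.
    new_shape = (batch_size, queries, largest)
    measured = (old_shape[0] * old_shape[1]) / (new_shape[0] * new_shape[1] * new_shape[2])
    predicted = total / largest  # the stated reduction factor
    return old_shape, new_shape, measured, predicted

# Example: a batch of 4 tables with varying object counts, 125 queries as in TATR.
old, new, measured, predicted = padded_cost_shapes([10, 40, 25, 5], queries=125)
print(old)        # (500, 80)
print(new)        # (4, 125, 40)
print(measured)   # 2.0
print(predicted)  # 2.0
```

The measured element-count ratio equals sum(#target objects) / max(#target objects), since batch_size * queries cancels out; the more uneven the per-image object counts, the larger the saving.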
Besides allowing much larger batch sizes, this change also yields speedups. Tested on the table structure recognition task using the Table Transformer (TATR) (125 queries, 7 classes) with PubMed data, it results in a) a small but meaningful speedup on CUDA at all batch sizes and on CPU at small batch sizes, and b) much higher speedups on CPU at larger batch sizes.
The processing time reduction, computed as (1 - new_time / old_time), is shown below for various configurations: