This seems to be a floating-point math issue. I get a similar range of differences when running on CPU, but on GPU the outputs are identical to the 10th decimal place. There is some discussion in this PyTorch thread: https://github.com/pytorch/pytorch/issues/4914 (although that one concerns floating-point issues on CUDA rather than CPU).
on CUDA:
tensor([-0.0130963037, 0.0021208122, 0.0833869055, 0.0168007165,
-0.0006483230], device='cuda:0')
tensor([-0.0130963037, 0.0021208122, 0.0833869055, 0.0168007165,
-0.0006483230], device='cuda:0')
tensor([-0.0130963037, 0.0021208122, 0.0833869055, 0.0168007165,
-0.0006483230], device='cuda:0')
on CPU:
tensor([-0.0130964424, 0.0021210182, 0.0833871067, 0.0168008748,
-0.0006483837])
tensor([-0.0130964424, 0.0021210182, 0.0833871067, 0.0168008748,
-0.0006483837])
tensor([-0.0130964629, 0.0021210436, 0.0833871216, 0.0168008823,
-0.0006483909])
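The CPU runs above differ only around the 7th significant digit, which is within normal float32 precision. A quick check, as a minimal sketch using the first and third CPU outputs above:

import torch

# First and third CPU runs from above (same input, different batch sizes).
a = torch.tensor([-0.0130964424, 0.0021210182, 0.0833871067, 0.0168008748, -0.0006483837])
b = torch.tensor([-0.0130964629, 0.0021210436, 0.0833871216, 0.0168008823, -0.0006483909])

# Largest absolute difference is on the order of 1e-08.
print((a - b).abs().max())

# With the default tolerances (rtol=1e-05, atol=1e-08) the two runs are "equal".
print(torch.allclose(a, b))  # True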
🐛 Bug
When using XLM-R the representations change depending on the batch size.
Code sample
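A minimal sketch of the comparison, assuming the HuggingFace transformers xlm-roberta-base checkpoint; the sentences and the choice of first-token embeddings are illustrative, not the exact original setup:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base").eval()

sentences = ["Hello world.", "This is a longer second sentence.", "And a third one."]

def encode(batch):
    # Tokenize with padding and return the first-token embedding for each sentence.
    inputs = tokenizer(batch, return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden[:, 0, :]

one_by_one = torch.cat([encode([s]) for s in sentences])  # batch size 1
batched = encode(sentences)                                # batch size 3

# The same sentence gets slightly different embeddings depending on batch size.
print(one_by_one[0, :5])
print(batched[0, :5])
print((one_by_one - batched).abs().max())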
Expected behavior
The embeddings for a given sentence should not depend on the batch size.
Additional context
If I average pool over all token embeddings, or if I max pool, these differences are even bigger.
Am I doing something wrong? Is this behaviour expected?
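For context, masked mean and max pooling over the token embeddings look roughly like the sketch below (hidden is the model's last_hidden_state and mask the attention mask); the pooled vector depends on every token embedding, each of which carries its own small numeric deviation:

import torch

def mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Average token embeddings, ignoring padded positions.
    mask = mask.unsqueeze(-1).float()  # (batch, seq, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def max_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Max over tokens, with padded positions masked out.
    hidden = hidden.masked_fill(mask.unsqueeze(-1) == 0, float("-inf"))
    return hidden.max(dim=1).values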