Thanks @subercui, this error arises from the PyTorch Metric Learning library. I opened an issue on Apex here but got no response :( Maybe you can open an issue on PyTorch Metric Learning?
Thanks! I'll have a look
I found a manual solution that works. Install PyTorch Metric Learning from source and, in `NTXentLoss`, change
`torch.max(neg_pairs, dim=1, keepdim=True)[0])`
to
`torch.max(neg_pairs, dim=1, keepdim=True)[0].half())`.
Still, I think it makes sense to raise this issue on the PyTorch Metric Learning GitHub.
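For context, the surrounding computation is roughly the standard log-sum-exp stabilization, so the `.half()` cast keeps the subtracted maximum in the same fp16 dtype as the pair similarities. This is only a sketch, not the library's exact source; `pos_pairs` and the shapes here are assumed:

```python
import torch

# Rough sketch of the affected region of NTXentLoss (approximate, not the
# library's exact code). pos_pairs / neg_pairs are assumed names for the
# positive / negative pair similarities.
pos_pairs = torch.randn(4, 1).half().cuda()
neg_pairs = torch.randn(4, 8).half().cuda()

max_val = torch.max(neg_pairs, dim=1, keepdim=True)[0].half()  # the manual fix: cast to fp16
numerator = torch.exp(pos_pairs - max_val)
denominator = torch.sum(torch.exp(neg_pairs - max_val), dim=1, keepdim=True) + numerator
loss = -torch.log(numerator / denominator)
```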
I think this happens because I create infinity values using Python's `float('inf')`. I could add an optional `half_precision` flag for all loss functions, and if it's True, cast all numbers made with `float()` to PyTorch's `half()`.
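Something along these lines, maybe (just a sketch to show the idea; the names here are made up, not the actual API):

```python
import torch

# Hypothetical sketch of the half_precision idea (made-up names, not the real API):
# constants built from Python's float() go through one helper that can cast to fp16.
class SomeLossFunction(torch.nn.Module):
    def __init__(self, half_precision=False):
        super().__init__()
        self.half_precision = half_precision

    def as_tensor(self, value):
        # value is a plain Python float, e.g. float('-inf')
        t = torch.tensor(value)
        return t.half() if self.half_precision else t
```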
Ah, I think you are right. There's a discussion on this HF Transformers PR where they end up writing an `assert` for a similar scenario:

```python
masked_bias = self.masked_bias.to(w.dtype)
assert masked_bias.item() != -float("inf"), "Make sure `self.masked_bias` is not `-inf` in fp16 mode"
w = torch.where(mask, w, masked_bias)
```
What about replacing `float('inf')` with a very large value instead (see here)? That way, amp can handle it automatically and there's no need for the user to specify `half_precision` (update: upon closer inspection of that issue, I am not sure if this will actually work).
At least for `NTXentLoss`, setting it to a large negative value (instead of `float('-inf')`) would be fine, because the purpose is to make particular entries 0 when passed to `torch.exp`. I'll have to check if it makes sense for the other places where I use `float`.
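As a quick sanity check of that idea (illustrative, not from the library): a sufficiently large negative fill value already underflows to 0 under `torch.exp`, and unlike `-inf` it stays a finite fp16 number:

```python
import torch

# -65504 is the most negative finite fp16 value (an assumed choice of "large value").
fill = -65504.0
print(torch.exp(torch.tensor(fill)))  # tensor(0.) -- underflows, same effect as -inf
print(torch.tensor(fill).half())      # tensor(-65504., dtype=torch.float16) -- still finite in fp16
```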
Awesome, thanks for weighing in!
v0.9.90.dev0 supports half precision:

```
pip install pytorch-metric-learning==0.9.90.dev0
```
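A quick fp16 smoke test along these lines should work with the dev release (rough example; assumes a CUDA device, and the shapes and labels are arbitrary):

```python
import torch
from pytorch_metric_learning.losses import NTXentLoss

# Minimal half-precision check (illustrative only).
loss_fn = NTXentLoss()
embeddings = torch.randn(8, 128, device="cuda").half()
labels = torch.arange(4).repeat(2).to("cuda")  # two samples per class
print(loss_fn(embeddings, labels))
```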
@KevinMusgrave Awesome! Thanks a lot.
Running apex with

```
allennlp train configs/contrastive.jsonnet -s tmp --include-package t2t -o "{"trainer": {"opt_level": 'O1'}}"
```

returns exceptions like the following: