pchankh opened this issue 4 years ago
Very cool that you are trying to get it to work on a TPU. I am curious to see how this will go.
About your error: it looks like you are trying to run the custom CUDA kernel on a TPU, which expectedly won't work. We added an implementation that doesn't require the custom CUDA kernel, and you need to switch to that. Install the latest version of the code:
pip install --upgrade git+https://github.com/allenai/longformer.git
then follow the updated example in the readme.
Thanks. The readme now says: "New April 27th, 2020: A PyTorch implementation of the sliding window attention.
We added a PyTorch implementation of the sliding window attention that doesn't require the custom CUDA kernel. It is limited in functionality but more convenient to use for finetuning on downstream tasks."
For the above, how do we choose the PyTorch implementation? Do we still use config.attention_mode = 'sliding_chunks'?
Many thanks.
Yes, you still use config.attention_mode = 'sliding_chunks'.
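For concreteness, here is a minimal sketch of that setup, modeled on the readme example (the checkpoint path 'longformer-base-4096/' is a placeholder for wherever the released weights were extracted; class names assume the allenai/longformer package layout at the time of this thread):

from longformer.longformer import Longformer, LongformerConfig

config = LongformerConfig.from_pretrained('longformer-base-4096/')
config.attention_mode = 'sliding_chunks'  # pure-PyTorch sliding window attention, no TVM/CUDA kernel
model = Longformer.from_pretrained('longformer-base-4096/', config=config)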
The as_strided trick is not supported in pytorch/xla, and it has to be replaced with torch.unfold. Pytorch/xla doesn't have a lowering for torch.unfold yet. Relevant issue: https://github.com/pytorch/xla/issues/2239
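To make the unfold replacement concrete, here is an illustrative sketch (not the actual longformer code) of building overlapping chunks of length 2*w with stride w over the sequence dimension both ways; shapes and names are made up for the example:

import torch

B, L, D, w = 2, 8, 4, 2  # batch, sequence length, hidden size, one-sided window
x = torch.randn(B, L, D)
n_chunks = L // w - 1  # number of overlapping chunks of length 2*w with stride w

# as_strided: zero-copy view, but pytorch/xla cannot lower it for TPUs
chunks_strided = x.as_strided(
    size=(B, n_chunks, 2 * w, D),
    stride=(x.stride(0), w * x.stride(1), x.stride(1), x.stride(2)),
)

# Tensor.unfold: same windows, but the window length lands in the last
# dimension, so a transpose is needed to match the as_strided layout
# (and pytorch/xla lacked a lowering for unfold as well, per the issue above)
chunks_unfold = x.unfold(1, 2 * w, w).transpose(2, 3)

assert torch.equal(chunks_strided, chunks_unfold)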
In case you are still interested, we have a working version in this branch https://github.com/allenai/longformer/tree/trainer. We are going to clean it up and merge it into master soon, but it is usable as is.
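If it helps, that branch can be installed directly with pip (assuming it installs the same way as master):
pip install --upgrade git+https://github.com/allenai/longformer.git@trainer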
We tried running a wrapped longformer model under a Colab TPU and got the following error:
Tvm binary not found. Compiling ...
Exception in device=TPU:0: cannot import name 'nvcc'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 119, in _start_fn
    fn(gindex, *args)
  File "", line 66, in _mp_fn
    fitter.fit(train_loader, validation_loader)
  File "", line 47, in fit
    losses, final_scores = self.train_one_epoch(para_loader.per_device_loader(self.device))
  File "", line 120, in train_one_epoch
    outputs = self.model(inputs, attention_masks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 558, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 26, in forward
    seqx, = self.backbone(input_ids=input_ids, attention_mask=attention_masks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 558, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 790, in forward
  ...
Any way to work around this error would be appreciated. Thanks.