Thanks @okpatil4u for offering to help. Getting it to work on CPU is pretty straightforward, but making it fast is more involved. Here are the steps to get the CPU code to work:
1. Set the device to 'cpu' where the TVM kernel is compiled.
2. Check the longformer/lib/ directory to confirm the binaries were generated.

Now the more involved part: parallelizing the computation and making it fast. TVM has this nice tutorial that explains the TVM syntax for splitting a CPU computation into multiple smaller parallel jobs. I think the schedule that TVM implemented for batched_matmul here might work well for our kernel, but it will require a few modifications to work (it needs to support a different input format). So between the tutorial and the batched_matmul schedule, you can write something that is fast enough.
As I said, the second part is more involved, so let's start with the first part first and leave speeding it up to another PR.
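For reference, here is a minimal sketch of the split/parallel pattern from that TVM tutorial, applied to a toy elementwise computation rather than our kernel (the shapes, names, and use of the tvm.te API are illustrative assumptions, not the actual Longformer code):

```python
# Toy example of TVM's split + parallel scheduling on CPU.
# This is NOT the Longformer kernel; it only illustrates the pattern
# from the TVM CPU-optimization tutorial.
import tvm
from tvm import te

n, m = 1024, 1024
A = te.placeholder((n, m), name="A")
B = te.placeholder((n, m), name="B")
C = te.compute((n, m), lambda i, j: A[i, j] + B[i, j], name="C")

s = te.create_schedule(C.op)
# Split the rows into chunks of 64 and run the chunks on multiple CPU threads.
outer, inner = s[C].split(C.op.axis[0], factor=64)
s[C].parallel(outer)
# Vectorize the inner (column) loop so each thread uses SIMD instructions.
s[C].vectorize(C.op.axis[1])

# Build for a CPU target ("llvm") instead of "cuda".
func = tvm.build(s, [A, B, C], target="llvm", name="parallel_add")
```

The real kernel would need the same split/parallel/vectorize treatment applied to its own loops, which is where the batched_matmul schedule should help.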
Thank you. I will give it a try.
I'll be rooting for you @okpatil4u 🙏
@okpatil4u @bratao, we just added a PyTorch implementation of the sliding window attention that doesn't need the custom CUDA kernel (https://github.com/allenai/longformer/pull/27). Please give it a try and let me know if we still need this.
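For anyone who wants to try it on CPU, here is a rough sketch (the module paths, class names, attention_mode string, and checkpoint directory below are my assumptions about the repo around the time of that PR; adjust them to whatever your checkout actually exposes):

```python
# Hedged sketch: running the pure-PyTorch sliding-window attention on CPU.
import torch
from longformer.longformer import Longformer, LongformerConfig  # assumed module path

config = LongformerConfig.from_pretrained('longformer-base-4096/')  # assumed local checkpoint dir
config.attention_mode = 'sliding_chunks'   # PyTorch implementation; no TVM/CUDA kernel needed
model = Longformer.from_pretrained('longformer-base-4096/', config=config)
model.eval()                               # stays on CPU: no .cuda() or .half() calls

# Dummy input; 4096 tokens keeps the length a multiple of the attention window,
# which the sliding_chunks implementation expects (the repo also has a padding helper).
input_ids = torch.tensor([[0] + [100] * 4094 + [2]])
attention_mask = torch.ones_like(input_ids)    # 1 = local sliding-window attention
with torch.no_grad():
    output = model(input_ids, attention_mask=attention_mask)[0]
print(output.shape)   # (1, 4096, hidden_size)
```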
@ibeltagy, will these lines 1, 2, 3 from the triviaqa script create issues while running on CPU?
Yes, it won't work as-is, but it is just a config in the script. Please try:
trainer = pl.Trainer(gpus=None, distributed_backend=None, ...)
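For completeness, a minimal sketch of the CPU-only configuration (`model` stands in for the LightningModule that triviaqa.py builds, and the remaining Trainer arguments from the script are omitted; argument names follow the pytorch-lightning version used here):

```python
# Hedged sketch: pytorch-lightning Trainer configured for CPU-only runs.
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=None,                  # request no GPUs
    distributed_backend=None,   # disable dp/ddp
)
trainer.test(model)             # `model` is a placeholder for the script's LightningModule
# trainer.fit(model)            # same idea for training
```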
I tried that already, but then I ran into the error below:
INFO:root:model and trainer restored from checkpoint: /content/longformer/triviaqa-longformer-large/checkpoints/_ckpt_epoch_4_v2.ckpt
Testing: 0% 0/1 [00:00<?, ?batch/s]THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=38 : no CUDA-capable device is detected
I am not sure why it is running a THCudaCheck at all.
@Akshayextreme, I updated the script to use CPU. Try the command-line params --gpus "" --fp32. It seems to work fine and I didn't get the THCudaCheck error you mentioned. Can you post the full error log?
Here is the complete error log. I have used the updated scripts.
Query : python -m triviaqa --save_dir /content/longformer --train_dataset /content/longformer/try-test-wikipedia.json --dev_dataset /content/longformer/try-test-wikipedia.json --gpus "" --num_workers 4 --max_seq_len 4096 --doc_stride -1 --save_prefix triviaqa-longformer-large --model_path /content/longformer/longformer-large-4096 --test --fp32
Logs :
2020-04-30 05:55:33.836144: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1 INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-vocab.json from cache at /root/.cache/torch/transformers/d0c5776499adc1ded22493fae699da0971c1ee4c2587111707a4d177d20257a2.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-merges.txt from cache at /root/.cache/torch/transformers/b35e7cd126cd4229a746b5d5c29a749e8e84438b14bcdb575950584fe33207e8.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda INFO:transformers.configuration_utils:loading configuration file /content/longformer/longformer-large-4096/config.json INFO:transformers.configuration_utils:Model config { "attention_dilation": [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ], "attention_mode": "tvm", "attention_probs_dropout_prob": 0.1, "attention_window": [ 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256 ], "autoregressive": false, "finetuning_task": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "ignore_attention_mask": false, "initializer_range": 0.02, "intermediate_size": 4096, "layer_norm_eps": 1e-05, "max_position_embeddings": 4098, "num_attention_heads": 16, "num_hidden_layers": 24, "num_labels": 2, "output_attentions": false, "output_hidden_states": false, "pruned_heads": {}, "torchscript": false, "type_vocab_size": 1, "use_bfloat16": false, "vocab_size": 50265 }
INFO:transformers.modeling_utils:loading weights file /content/longformer/longformer-large-4096/pytorch_model.bin Loaded model with config: { "attention_dilation": [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ], "attention_mode": "tvm", "attention_probs_dropout_prob": 0.1, "attention_window": [ 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256 ], "autoregressive": false, "finetuning_task": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "ignore_attention_mask": false, "initializer_range": 0.02, "intermediate_size": 4096, "layer_norm_eps": 1e-05, "max_position_embeddings": 4098, "num_attention_heads": 16, "num_hidden_layers": 24, "num_labels": 2, "output_attentions": false, "output_hidden_states": false, "pruned_heads": {}, "torchscript": false, "type_vocab_size": 1, "use_bfloat16": false, "vocab_size": 50265 }
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/callbacks/pt_callbacks.py:224: UserWarning: Checkpoint directory /content/longformer/triviaqa-longformer-large/checkpoints exists and is not empty with save_top_k != 0.All files in this directory will be deleted when a checkpoint is saved! f"Checkpoint directory {filepath} exists and is not empty with save_top_k != 0." Namespace(attention_mode='sliding_chunks', attention_window=256, batch_size=8, dev_dataset='/content/longformer/try-test-wikipedia.json', disable_checkpointing=False, doc_stride=-1, epochs=30, fp32=True, gpus=None, ignore_seq_with_no_answers=False, lr=0.0001, max_answer_length=30, max_doc_len=4096, max_num_answers=64, max_question_len=55, max_seq_len=4096, model_path='/content/longformer/longformer-large-4096', n_best_size=20, no_progress_bar=False, num_workers=4, regular_softmax_loss=False, save_dir='/content/longformer', save_prefix='triviaqa-longformer-large', seed=1234, test=True, train_dataset='/content/longformer/try-test-wikipedia.json', val_every=0.2, val_percent_check=1.0, warmup=200)
steps: 414930.0, #epochs: 30, batch_size: 8 <<<<<<<
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
reading file: /content/longformer/try-test-wikipedia.json
done reading file: /content/longformer/try-test-wikipedia.json
reading file: /content/longformer/try-test-wikipedia.json
done reading file: /content/longformer/try-test-wikipedia.json
reading file: /content/longformer/try-test-wikipedia.json
done reading file: /content/longformer/try-test-wikipedia.json
INFO:root: Name ... Params
0 model ... 434 M
1 model.embeddings ... 55 M
2 model.embeddings.word_embeddings ... 51 M
3 model.embeddings.position_embeddings ... 4 M
4 model.embeddings.token_type_embeddings ... 1 K
.. ... ... ...
464 model.encoder.layer.23.output.dropout ... 0
465 model.pooler ... 1 M
466 model.pooler.dense ... 1 M
467 model.pooler.activation ... 0
468 qa_outputs ... 2 K
[469 rows x 3 columns]
INFO:root:model and trainer restored from checkpoint: /content/longformer/triviaqa-longformer-large/checkpoints/_ckpt_epoch_4_v2.ckpt
Testing: 0% 0/1 [00:00<?, ?batch/s]THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/longformer/triviaqa.py", line 704, in
Fixed. Can you check again?
It worked! Thanks!
I suggest updating the cheatsheet to show how to run the pretrained TriviaQA large model on CPU.
Hi,
Regarding "3. Run the model" in the https://github.com/allenai/longformer README, and the example given there: how might I run this small test case on CPUs?
Many thanks in advance.
@Adrian-1234, maybe I am missing something but the example in the Readme already runs on CPU.
Hi,
I get no output from the test (apart from the warnings below):
$ python3 y.py (y.py is exactly as per the example given)
/home/adrian/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/adrian/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/adrian/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/adrian/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/adrian/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/adrian/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. np_resource = np.dtype([("resource", np.ubyte, 1)])
$
I was expecting it to print "Hello world!" once? I have pip-installed the requirements.
Printing output and attention_mask I get:
tensor([[[-0.0487, -0.0083, 0.0357, ..., -0.0348, -0.0800, -0.0212],
[-0.1541, 0.2812, 0.2079, ..., 0.3218, 0.0356, 0.0424],
[-0.0806, 0.0276, 0.1017, ..., -0.3952, -0.0781, 0.3135],
...,
[-0.0236, 0.0741, -0.0145, ..., -0.0990, -0.0409, -0.0745],
[-0.0236, 0.0741, -0.0145, ..., -0.0990, -0.0409, -0.0745],
[-0.0236, 0.0741, -0.0145, ..., -0.0990, -0.0409, -0.0745]]],
grad_fn=
Thanks.
I haven't seen the warning before but it looks like a known issue and it is discussed here.
The output and attention_mask are tensors of numbers like the ones you got, not the string "Hello world!".
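A quick sanity check, assuming output and attention_mask are the variables from the README example:

```python
# Hypothetical check that the forward pass produced sensible tensors.
print(output.shape)           # (1, sequence_length, hidden_size)
print(attention_mask.shape)   # (1, sequence_length)
```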
Ok, thanks. So the code is working correctly on CPUs then.
Won't fix, now that we have the sliding_chunks implementation working on CPU.
How difficult would it be to add CPU support? I can contribute if you provide guidelines.