allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Longformer encdec fails on TPU with "scalar type not supported" error #122

Open y-rokutan opened 3 years ago

y-rokutan commented 3 years ago

Hi,

I'm trying to train the Longformer encoder-decoder on Cloud TPUs with the following script/summarization.py options:

  --model_path="./longformer-encdec-8192"
  --max_input_len=8192
  --grad_ckpt
  --batch_size=1
  --gpus=0
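
To run on TPU I only swap the device arguments when building the Lightning Trainer, roughly like this (a simplified sketch, not the exact code from script/summarization.py; `make_trainer` is just an illustrative name and `model` stands for the script's LightningModule):

import pytorch_lightning as pl

def make_trainer():
    # Simplified sketch of my Trainer setup; values mirror the Namespace dump below.
    return pl.Trainer(
        tpu_cores=8,                # replaces gpus=0 from the command line
        precision=16,               # log shows "Using native 16bit precision."
        max_epochs=5,               # --epochs 5
        accumulate_grad_batches=1,  # --grad_accum 1
        val_check_interval=1.0,     # --val_every 1.0
    )

# trainer = make_trainer()
# trainer.fit(model)  # model: the LightningModule wrapping LongformerEncoderDecoderForConditionalGeneration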

Passing tpu_cores=8 to pl.Trainer loads the converted model on the TPU with no errors, but training then fails as soon as the validation sanity check starts:

Namespace(adafactor=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, debug=False, disable_checkpointing=False, epochs=5, fp32=False, gpus=0, grad_accum=1, grad_ckpt=True, label_smoothing=0.0, lr=3e-05, max_input_len=16384, max_output_len=256, model_path='./longformer-encdec-16384', no_progress_bar=False, num_workers=0, resume_ckpt=None, save_dir='summarization', save_prefix='test', seed=1234, test=False, tokenizer='facebook/bart-base', val_every=1.0, val_percent_check=1.0, warmup=1000)
GPU available: False, used: False
TPU available: True, using: 8 TPU cores
Using native 16bit precision.
training on 8 TPU cores
INIT TPU local core: 0, global rank: 0
INIT TPU local core: 1, global rank: 1
INIT TPU local core: 2, global rank: 2
INIT TPU local core: 3, global rank: 3
INIT TPU local core: 4, global rank: 4
INIT TPU local core: 5, global rank: 5
INIT TPU local core: 6, global rank: 6
INIT TPU local core: 7, global rank: 7

  | Name  | Type                                             | Params
---------------------------------------------------------------------------
0 | model | LongformerEncoderDecoderForConditionalGeneration | 459 M
/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:25: UserWarning: Your val_dataloader has `shuffle=True`, it is best practice to turn this off for validation and test dataloaders.
  warnings.warn(*args, **kwargs)
/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:25: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 32 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]2020-10-05 02:34:02.456683: E     657 tensorflow/compiler/xla/xla_client/tf_logging.cc:11] Check failed: scalar_value.isIntegral()

Note: I'll paste the full stack trace just after this post.

I have no idea how to debug this error; it seems like it could come from XLA multiprocessing or from running inside Docker...?
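
The deepest frames are all at::constant_pad_nd reaching torch_xla::XlaHelpers::ScalarValue and failing the isIntegral() check, so my untested guess is that somewhere a pad is applied to an integer tensor with a non-integer fill value. A hypothetical minimal snippet that should hit the same check on a TPU (not verified against this torch_xla build):

# Hypothetical reproduction, inferred only from the stack trace; untested.
# Needs a TPU VM with torch_xla installed.
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

device = xm.xla_device()
input_ids = torch.ones(1, 10, dtype=torch.long, device=device)  # integer tensor, e.g. token ids

# A float fill value on an int64 XLA tensor should trip
# "Check failed: scalar_value.isIntegral()" -> "Scalar type not supported".
padded = F.pad(input_ids, (0, 6), value=0.5)
print(padded.shape)

If that really is the cause, casting the fill value to an int (or the padded tensor to float) before the pad should sidestep the check.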

Any comments and suggestions are appreciated.

y-rokutan commented 3 years ago

Full stack trace (it's very long because of the 8-process distribution):


*** Begin stack trace ***
        tensorflow::CurrentStackTrace()
        torch_xla::XlaHelpers::ScalarValue(c10::Scalar, xla::PrimitiveType, xla::XlaBuilder*)

        torch_xla::ir::ops::InferOutputShape(absl::lts_2020_02_25::Span<xla::Shape const>, std::function<xla::XlaOp (absl::lts_2020_02_25::Span<xla::XlaOp const>)> const&)

        torch_xla::ir::Node::GetOpShape(std::function<xla::Shape ()> const&) const
        torch_xla::ir::Node::Node(torch_xla::ir::OpKind, absl::lts_2020_02_25::Span<torch_xla::ir::Value const>, std::function<xla::Shape ()> const&, unsigned long, absl::lts_2020_02_25::uint128)
        torch_xla::ir::ops::ConstantPadNd::ConstantPadNd(torch_xla::ir::Value const&, std::vector<long long, std::allocator<long long> >, c10::Scalar)
        void __gnu_cxx::new_allocator<torch_xla::ir::ops::ConstantPadNd>::construct<torch_xla::ir::ops::ConstantPadNd, torch_xla::ir::Value, std::vector<long long, std::allocator<long long> >&, c10::Scalar&>(torch_xla::ir::ops::ConstantPadNd*, torch_xla::ir::Value&&, std::vector<long long, std::allocator<long long> >&, c10::Scalar&)
        torch_xla::XLATensor::constant_pad_nd(torch_xla::XLATensor const&, absl::lts_2020_02_25::Span<long long const>, c10::Scalar)
        torch_xla::AtenXlaType::constant_pad_nd(at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)
        c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(at::Tensor const&, c10::ArrayRef<long>, c10::Scalar), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::ArrayRef<long>, c10::Scalar> >, at::Tensor (at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)>::call(c10::OperatorKernel*, at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)
        at::constant_pad_nd(at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)

        at::constant_pad_nd(at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)

        _PyCFunction_FastCallDict

        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault
        _PyFunction_FastCallDict
        _PyObject_FastCallDict
        _PyObject_Call_Prepend
        PyObject_Call
        _PyEval_EvalFrameDefault
        _PyFunction_FastCallDict
        _PyObject_FastCallDict
        _PyObject_Call_Prepend
        PyObject_Call
        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault
        _PyFunction_FastCallDict
        _PyObject_FastCallDict
        _PyObject_Call_Prepend
        PyObject_Call
        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault
        PyEval_EvalCodeEx

        PyObject_Call
        _PyEval_EvalFrameDefault
        PyEval_EvalCodeEx

        PyObject_Call
        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault

        _PyEval_EvalFrameDefault
        PyEval_EvalCodeEx
        PyEval_EvalCode

        PyRun_StringFlags
        PyRun_SimpleStringFlags
        Py_Main
        main
        __libc_start_main

*** End stack trace ***
Scalar type not supported
[... the same "Check failed: scalar_value.isIntegral()" stack trace is printed by each of the remaining worker processes, and every device then raises "Exception in device=TPU:<n>: /pytorch/xla/torch_xla/csrc/helpers.h:99 : Check failed: scalar_value.isIntegral() ... Scalar type not supported" (all of TPU:0 through TPU:7); the duplicate traces are omitted here ...]

Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 235, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 229, in _start_fn
    fn(gindex, *args)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 222, in tpu_train
    self.run_pretrain_routine(model)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1196, in run_pretrain_routine
    False)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 293, in _evaluate
    output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 470, in evaluation_forward
    output = model.validation_step(*args)
  File "/root/longformer/scripts/summarization.py", line 156, in validation_step
    outputs = self.forward(*batch)
  File "/root/longformer/scripts/summarization.py", line 119, in forward
    input_ids, attention_mask = self._prepare_input(input_ids)
  File "/root/longformer/scripts/summarization.py", line 115, in _prepare_input
    input_ids, attention_mask, half_padding_mod, self.tokenizer.pad_token_id)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/longformer/sliding_chunks.py", line 132, in pad_to_window_size
    attention_mask = F.pad(attention_mask, (0, padding_len), value=False)  # no attention on the padding tokens
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 3552, in _pad
    return _VF.constant_pad_nd(input, pad, value)
RuntimeError: /pytorch/xla/torch_xla/csrc/helpers.h:99 : Check failed: scalar_value.isIntegral()
(the same traceback, RuntimeError, and XLA stack trace are printed by each of the 8 TPU processes)
Traceback (most recent call last):
  File "scripts/summarization.py", line 348, in <module>
    main(args)
  File "scripts/summarization.py", line 340, in main
    trainer.fit(model)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in fit
    xmp.spawn(self.tpu_train, args=(model,), nprocs=self.tpu_cores, start_method=start_method)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 300, in spawn
    start_method=start_method)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 113, in join
    (error_index, exitcode)
Exception: process 5 terminated with exit code 17
```
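Every worker fails at the same frame: `F.pad(attention_mask, (0, padding_len), value=False)` in `longformer/sliding_chunks.py` (line 132), which lowers to `constant_pad_nd` and trips the `scalar_value.isIntegral()` check, presumably because the fill value is a Python bool. A minimal, untested sketch of the kind of local workaround this points to (the helper name is hypothetical) is to pad the mask with an integer `0`, which keeps the "no attention on the padding tokens" semantics:

```python
import torch
import torch.nn.functional as F

def pad_mask_for_xla(attention_mask: torch.Tensor, padding_len: int) -> torch.Tensor:
    # Hypothetical workaround: pad with the integer 0 rather than the Python
    # bool False so that constant_pad_nd receives an integral scalar on XLA.
    # Semantics are unchanged: padded positions receive no attention.
    return F.pad(attention_mask, (0, padding_len), value=0)
```

I haven't verified this on the TPU, so treat it as a sketch rather than a fix.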
ibeltagy commented 3 years ago

We haven't tried to run longformer-encoder-decoder on pytorch-xla, but here are a few suggestions:

aditya-malte commented 3 years ago

@ibeltagy Strangely, I’m also facing the same issue on longformer-4096 (not encdec). Has there been any major change? (P.S. I’m using Longformer through Hugging Face Transformers.)

aditya-malte commented 3 years ago

I’m fine-tuning as a sequence classification task (regression, but transformers uses the same class name). Is that unsupported on TPUs? I ask because I’m able to train DistilRoberta.
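For reference, the setup is essentially the stock Transformers sequence-classification head with a single label; the snippet below is only a rough sketch of that configuration (the model name, input text, and target value are placeholders, not my actual data):

```python
import torch
from transformers import LongformerForSequenceClassification, LongformerTokenizer

# With num_labels=1 the sequence-classification head is trained as a
# regressor (MSE loss), which is the "same class name" situation above.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096", num_labels=1
)

inputs = tokenizer("placeholder document text", return_tensors="pt")
labels = torch.tensor([[0.5]])  # placeholder regression target
loss = model(**inputs, labels=labels)[0]
```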