seg fault when running benchmark.py

Hi, I am using CUDA 11 with a Quadro RTX 8000 GPU. I changed line 24 in setup.py from 70 70 to 75 75, but when I run the benchmark I get the following error:

~/FBTT-Embedding$ python tt_embeddings_benchmark.py 
INFO:root:Creating TTEmbeddingBag tt_p_shapes: [200, 220, 250], tt_q_shapes: [4, 4, 4], tt_ranks: [1, 32, 32, 1], sparse: True, optimizer: sgd, learning_rate: 0.1, eps: 1e-10use_cache: True, cache_size: 0, hashtbl_size: 0
INFO:root:sparse: True, optimizer: sgd
INFO:root:p_shapes: [200, 220, 250], q_shapes: [4, 4, 4], ranks: [32, 32]
INFO:root:B: 512, E: 11000000, D: 64, nnz: 10240
Traceback (most recent call last):
  File "tt_embeddings_benchmark.py", line 216, in <module>
    main()
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "tt_embeddings_benchmark.py", line 187, in main
    lambda indices, offsets, _: tt_emb(indices, offsets).backward(grad_output),
  File "tt_embeddings_benchmark.py", line 100, in benchmark_requests
    f(indices, offsets, weights)
  File "tt_embeddings_benchmark.py", line 187, in <lambda>
    lambda indices, offsets, _: tt_emb(indices, offsets).backward(grad_output),
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xiaoyunw/FBTT-Embedding/tt_embeddings_ops.py", line 822, in forward
    *(self.tt_cores),
  File "/home/xiaoyunw/FBTT-Embedding/tt_embeddings_ops.py", line 186, in forward
    list(ctx.tt_cores),
RuntimeError: CUDA error: too many resources requested for launch
Segmentation fault (core dumped)
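
For context, "too many resources requested for launch" usually means a kernel was launched with more threads per block than its register usage allows. The sketch below is only a rough back-of-the-envelope check, not something taken from FBTT-Embedding: it assumes the limits for compute capability 7.0 and 7.5 (64K 32-bit registers and 1024 threads per block), ignores register allocation granularity, and the figure of 80 registers per thread is purely hypothetical. The real per-kernel count can be read from the nvcc output when compiling with -Xptxas -v.

```python
# Rough check of whether a launch configuration fits the per-block register
# budget. The limits are those of compute capability 7.0 and 7.5; the function
# names and the example register count are illustrative assumptions only.

REGS_PER_BLOCK = 65536        # 64K 32-bit registers available to one block
MAX_THREADS_PER_BLOCK = 1024  # hardware limit on threads per block
WARP_SIZE = 32


def launch_fits(threads_per_block, regs_per_thread,
                regs_per_block=REGS_PER_BLOCK,
                max_threads=MAX_THREADS_PER_BLOCK):
    """Return True if the block stays within the thread and register limits."""
    if threads_per_block > max_threads:
        return False
    return threads_per_block * regs_per_thread <= regs_per_block


def max_threads_for(regs_per_thread,
                    regs_per_block=REGS_PER_BLOCK,
                    max_threads=MAX_THREADS_PER_BLOCK,
                    warp_size=WARP_SIZE):
    """Largest block size (rounded down to whole warps) that fits the budget."""
    limit = min(max_threads, regs_per_block // regs_per_thread)
    return (limit // warp_size) * warp_size


if __name__ == "__main__":
    regs = 80  # hypothetical register count for one kernel
    print("1024 threads fit:", launch_fits(1024, regs))
    print("largest block that fits:", max_threads_for(regs))
```

If a kernel really does need that many registers, the usual options are a smaller block size or capping register use at compile time (for example with nvcc's -maxrregcount option); whether either applies here depends on the actual kernels.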

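Since the setup.py edit above hard-codes the GPU architecture, it may be worth confirming that the value matches what the device reports. A minimal sketch, assuming PyTorch is installed and that the build uses the standard nvcc -gencode flag (the helper names are made up for illustration, and the exact flag format on line 24 of setup.py may differ):

```python
# Compare the GPU's reported compute capability with the arch used at build
# time. gencode_flag and detect_capability are hypothetical helper names.


def gencode_flag(major, minor):
    """Build the nvcc -gencode flag for a compute capability such as (7, 5)."""
    arch = f"{major}{minor}"
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


def detect_capability():
    """Return (major, minor) for GPU 0, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch")
    else:
        print("device capability:", cap)
        print("matching flag:", gencode_flag(*cap))
```

A Quadro RTX 8000 is a Turing card, so it should report (7, 5), which corresponds to the 75 75 setting.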
facebookresearch / FBTT-Embedding, issue #9