facebookresearch / FBTT-Embedding

This is a Tensor Train (TT) based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce total model size by up to 100x on Facebook's open-sourced DLRM model while achieving the same model quality. Our implementation is faster than state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they provide no memory reduction at training time. Our library decompresses only the requested rows and can therefore reduce the memory footprint by up to 10,000x per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup and processing.
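A minimal usage sketch of TTEmbeddingBag follows. The argument names are inferred from the benchmark output in the issue below and from the forward call in the traceback; they are assumptions, not the verified signature in tt_embeddings_ops.py.

```python
import torch
from tt_embeddings_ops import TTEmbeddingBag

# Shapes mirror the benchmark log: E = 11,000,000 rows, D = 64 dims,
# p_shapes factor the row dimension (200*220*250 = 11,000,000),
# q_shapes factor the embedding dimension (4*4*4 = 64).
tt_emb = TTEmbeddingBag(
    num_embeddings=11_000_000,
    embedding_dim=64,
    tt_ranks=[32, 32],            # internal TT ranks; boundary ranks of 1 appear to be added internally
    tt_p_shapes=[200, 220, 250],
    tt_q_shapes=[4, 4, 4],
    sparse=True,
    optimizer="sgd",
    learning_rate=0.1,
    use_cache=True,               # software cache of decompressed rows
).cuda()                          # moving the module to GPU is assumed here

# Lookup follows the nn.EmbeddingBag convention visible in the traceback:
# flat indices plus bag offsets (a trailing boundary is assumed below).
indices = torch.randint(0, 11_000_000, (10_240,), dtype=torch.long, device="cuda")
offsets = torch.arange(0, 10_241, 20, dtype=torch.long, device="cuda")  # 512 bags of 20
output = tt_emb(indices, offsets)  # pooled embeddings, shape (512, 64)
```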
MIT License

seg fault when running benchmark.py #9

Closed wangxiaoyunNV closed 3 years ago

wangxiaoyunNV commented 3 years ago

Hi, I am using CUDA 11 with a Quadro RTX 8000 GPU. I have changed line 24 in setup.py from 70 70 to 75 75 (roughly the change sketched below). But when I run the code, it gives me the error shown after the snippet.
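The edit presumably targets the nvcc -gencode flags in setup.py; this is a paraphrase of what the edited line might look like, not an exact copy of line 24 from the repo:

```python
# Hypothetical nvcc arguments in setup.py: "compute_75,code=sm_75" is the edit
# described above, replacing the original "compute_70,code=sm_70" (V100) target
# so the extension is built for the Quadro RTX 8000 (compute capability 7.5).
nvcc_flags = [
    "-O3",
    "-gencode",
    "arch=compute_75,code=sm_75",
]
```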

~/FBTT-Embedding$ python tt_embeddings_benchmark.py 
INFO:root:Creating TTEmbeddingBag tt_p_shapes: [200, 220, 250], tt_q_shapes: [4, 4, 4], tt_ranks: [1, 32, 32, 1], sparse: True, optimizer: sgd, learning_rate: 0.1, eps: 1e-10use_cache: True, cache_size: 0, hashtbl_size: 0
INFO:root:sparse: True, optimizer: sgd
INFO:root:p_shapes: [200, 220, 250], q_shapes: [4, 4, 4], ranks: [32, 32]
INFO:root:B: 512, E: 11000000, D: 64, nnz: 10240
Traceback (most recent call last):
  File "tt_embeddings_benchmark.py", line 216, in <module>
    main()
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "tt_embeddings_benchmark.py", line 187, in main
    lambda indices, offsets, _: tt_emb(indices, offsets).backward(grad_output),
  File "tt_embeddings_benchmark.py", line 100, in benchmark_requests
    f(indices, offsets, weights)
  File "tt_embeddings_benchmark.py", line 187, in <lambda>
    lambda indices, offsets, _: tt_emb(indices, offsets).backward(grad_output),
  File "/home/xiaoyunw/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xiaoyunw/FBTT-Embedding/tt_embeddings_ops.py", line 822, in forward
    *(self.tt_cores),
  File "/home/xiaoyunw/FBTT-Embedding/tt_embeddings_ops.py", line 186, in forward
    list(ctx.tt_cores),
RuntimeError: CUDA error: too many resources requested for launch
Segmentation fault (core dumped)
wangxiaoyunNV commented 3 years ago

Solved after switching to a V100 machine and using the NVIDIA PyTorch Docker image.
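For reference, a quick way to confirm which compute capability PyTorch reports for a given GPU (device index 0 here is an assumption), which is what the -gencode edit above needs to match:

```python
import torch

# Print the name and compute capability of GPU 0, e.g. (7, 0) for V100
# or (7, 5) for Quadro RTX 8000; the extension's -gencode flags in setup.py
# should target this capability.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))
```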