This is a Tensor Train (TT) based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing systems. We showed that this library can reduce the total model size of Facebook's open-sourced DLRM model by up to 100x while achieving the same model quality. Our implementation is faster than state-of-the-art implementations. Existing state-of-the-art libraries also decompress whole embedding tables on the fly, so they do not reduce memory use during training. Our library decompresses only the requested rows and can therefore reduce the memory footprint of an embedding table by up to 10,000x. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup and processing.
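As a rough illustration of why decompressing only the requested rows keeps the memory footprint small (a self-contained sketch, not the library's API; the shapes, ranks, and the `tt_row` helper below are made up for illustration), a single row of a TT-compressed table can be assembled from small slices of the TT cores without ever materializing the full table:

```python
import torch

# Illustrative sizes (not taken from the library): a 10,000,000-row table
# with 64-dim embeddings, factored as 200*200*250 rows and 4*4*4 columns.
n = (200, 200, 250)   # factorization of the number of rows
d = (4, 4, 4)         # factorization of the embedding dimension
r = (1, 16, 16, 1)    # TT ranks (r[0] = r[-1] = 1)

# TT cores: core k has shape (r[k], n[k], d[k], r[k + 1]).
cores = [0.1 * torch.randn(r[k], n[k], d[k], r[k + 1]) for k in range(3)]

def tt_row(cores, idx):
    """Reconstruct one embedding row from the TT cores.

    Only the slices selected by `idx` are read, so memory use is
    proportional to the cores, not to the full dense table.
    """
    # Mixed-radix digits of the row index: idx = (i0 * n[1] + i1) * n[2] + i2
    i2 = idx % n[2]
    i1 = (idx // n[2]) % n[1]
    i0 = idx // (n[2] * n[1])

    # Chain the per-digit slices along the TT ranks.
    m = cores[0][:, i0]                                      # (1, d0, r1)
    m = torch.einsum('xar,rbs->xabs', m, cores[1][:, i1])    # (1, d0, d1, r2)
    m = torch.einsum('xabr,rcs->xabcs', m, cores[2][:, i2])  # (1, d0, d1, d2, 1)
    return m.reshape(-1)                                     # 64-dim row

row = tt_row(cores, 1_234_567)
# Storage: the three cores hold ~234K values versus 640M for the dense
# table, which is where the large memory-footprint reduction comes from.
```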
cudaErrorIllegalAddress occurs when using TTEmbeddingBag and nn.EmbeddingBag at same time #16
As the title says, a cudaErrorIllegalAddress occurs when using TTEmbeddingBag and nn.EmbeddingBag at the same time. I've added `self.cache_populate()` to the `TableBatchedTTEmbeddingBag` forward method, just after `self.update_cache(indices)`.
Combinations I tried and the errors I got:

- `self.cache_populate()` in `TableBatchedTTEmbeddingBag` forward, with `nn.EmbeddingBag`: `cudaErrorIllegalAddress`
- `cache_populate` after backward done, with `nn.EmbeddingBag`: `RuntimeError: CUDA error: invalid device ordinal`
- `self.cache_populate()` in `TableBatchedTTEmbeddingBag` forward, with `nn.Embedding`
- `cache_populate` after backward done, with `nn.Embedding`
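For context, a minimal sketch of the kind of mixed setup described above. Only the class and method names (`TTEmbeddingBag`, `nn.EmbeddingBag`, `cache_populate`) come from the issue text; the import path, constructor arguments, and forward signature used for the TT table are assumptions for illustration, not the library's documented API.

```python
import torch
import torch.nn as nn

# Assumed import path; constructor arguments are placeholders for however
# TTEmbeddingBag is actually configured in this repo.
from tt_embeddings_ops import TTEmbeddingBag  # assumption

class MixedModel(nn.Module):
    """One TT-compressed table and one regular nn.EmbeddingBag side by side."""

    def __init__(self):
        super().__init__()
        # Hypothetical arguments: a 1M-row, 64-dim TT-compressed table.
        self.tt_emb = TTEmbeddingBag(
            num_embeddings=1_000_000,
            embedding_dim=64,
            tt_ranks=[32, 32],
            use_cache=True,
        )
        self.dense_emb = nn.EmbeddingBag(10_000, 64, mode="sum")
        self.fc = nn.Linear(128, 1)

    def forward(self, tt_idx, tt_off, dense_idx, dense_off):
        # Assuming an EmbeddingBag-style (indices, offsets) forward for the TT table.
        a = self.tt_emb(tt_idx, tt_off)
        b = self.dense_emb(dense_idx, dense_off)
        return self.fc(torch.cat([a, b], dim=1))
```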
Snippets
Env
I use docker image pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel.
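Given the `invalid device ordinal` message above, a small sanity check of what the container actually exposes can help rule out device-indexing problems (standard PyTorch/CUDA calls only; nothing here is specific to this library):

```python
import os
import torch

# What the 1.9.0 / CUDA 11.1 container reports.
print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))

# 'invalid device ordinal' usually means some code addressed a GPU index
# >= device_count(); listing the visible devices makes that easy to spot.
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# For the cudaErrorIllegalAddress, setting CUDA_LAUNCH_BLOCKING=1 before
# launching makes the failing kernel surface at the real call site.
```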