facebookresearch / FBTT-Embedding

This is a Tensor Train (TT) based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size of Facebook's open-sourced DLRM model by up to 100x while achieving the same model quality, and that our implementation is faster than state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they provide no memory reduction during training. Our library decompresses only the requested rows and can therefore reduce the memory footprint per embedding table by up to 10,000x. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup.
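
To make the "decompress only the requested rows" idea concrete, here is a minimal conceptual sketch in plain PyTorch. It is not the library's actual kernels or API; the shapes, TT ranks, and the `lookup` helper are illustrative assumptions.

```python
import torch

# Conceptual sketch (not FBTT-Embedding's implementation): a table of
# N = p1*p2*p3 rows and D = q1*q2*q3 columns stored as three TT cores,
# with only the requested rows ever materialized.
p, q, r = (8, 8, 8), (4, 4, 4), (16, 16)  # illustrative factorization and ranks
cores = [
    torch.randn(p[0], 1, q[0], r[0]),
    torch.randn(p[1], r[0], q[1], r[1]),
    torch.randn(p[2], r[1], q[2], 1),
]

def lookup(rows: torch.Tensor) -> torch.Tensor:
    """Decompress only the rows listed in `rows` (1-D LongTensor of indices)."""
    out = []
    for idx in rows.tolist():
        # Map the flat row index to one slice per TT core.
        i1, rem = divmod(idx, p[1] * p[2])
        i2, i3 = divmod(rem, p[2])
        a = cores[0][i1]  # (1, q1, r1)
        b = cores[1][i2]  # (r1, q2, r2)
        c = cores[2][i3]  # (r2, q3, 1)
        # Contract along the TT ranks to rebuild this row's D-dim embedding.
        ab = torch.einsum("aqr,rsk->aqsk", a, b)      # (1, q1, q2, r2)
        abc = torch.einsum("aqsk,kul->aqsul", ab, c)  # (1, q1, q2, q3, 1)
        out.append(abc.reshape(-1))
    return torch.stack(out)

emb = lookup(torch.tensor([0, 42, 511]))  # only 3 of 512 rows are materialized
print(emb.shape)  # torch.Size([3, 64])
```

A per-row Python loop like this would be slow in practice; the library instead batches these contractions in fused CUDA kernels and backs them with the software cache mentioned above.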
MIT License

When calling cache_populate, CUDA Error: invalid device ordinal #20

Closed SeungsuBaek closed 2 years ago

SeungsuBaek commented 2 years ago

Code snippets

  File "/workspace/FBTT-Embedding/tt_embeddings_ops.py", line 792, in cache_populate
    self.cache_weight,
RuntimeError: CUDA error: invalid device ordinal
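
One debugging aid worth noting (my suggestion, not part of the original report): PyTorch can report CUDA errors asynchronously, so a traceback like the one above may point at a later call than the kernel that actually failed. Forcing synchronous launches before CUDA is initialized makes the traceback land on the offending call:

```python
import os

# Must be set before torch initializes CUDA (e.g. at the very top of the script).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after the env var so kernels launch synchronously
```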

Environment

RTX 3090
CUDA 11.3
PyTorch v1.9.0

The CUDA error occurred when cache_populate was called. It has also been reported in previous issues (#17, #16).

In my case, there was no error with a random dataset, but the error occurred when using the Terabyte dataset. The cause of the error may be the CUDA version, given the CUB fix linked below.

https://github.com/NVIDIA/cub/pull/259

I installed CUDA 11.4, which solved the problem.
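
As a quick sanity check after upgrading (my addition, assuming the relevant CUB fix ships with the CUDA 11.4 toolkit as the linked PR suggests), you can confirm which toolkit the extension will be built against and which CUDA runtime PyTorch itself uses:

```python
import subprocess
import torch

# The FBTT extension is compiled with the local toolkit, so nvcc should now report 11.4.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
print("PyTorch built with CUDA:", torch.version.cuda)  # CUDA runtime PyTorch ships with
print("GPU:", torch.cuda.get_device_name(0))
```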