@sanxchep , currently we don't have native multi-GPU support. It's the user's responsibility to manage multiple GPUs, split the model, and run the sub-graphs in a pipeline. TensorRT is no different from other CUDA applications in this environment.
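To illustrate what that manual split looks like, here is a minimal sketch that runs one pre-built engine per GPU and pipelines them by staging the intermediate activation through host memory. The file names, tensor shapes, and the one-input/one-output binding layout are all assumptions for illustration, not part of any demo code:

```python
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

cuda.init()
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(path, device_id):
    # Deserialize the engine under the CUDA context of its target GPU.
    ctx = cuda.Device(device_id).make_context()
    try:
        with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            engine = runtime.deserialize_cuda_engine(f.read())
    finally:
        ctx.pop()
    return engine, ctx

def run(engine, ctx, host_in, out_shape, out_dtype=np.float32):
    # Execute one engine on its own GPU, staging I/O through host memory.
    # Simplification: assumes binding 0 is the input and binding 1 the output.
    ctx.push()
    try:
        context = engine.create_execution_context()
        d_in = cuda.mem_alloc(host_in.nbytes)
        host_out = np.empty(out_shape, dtype=out_dtype)
        d_out = cuda.mem_alloc(host_out.nbytes)
        cuda.memcpy_htod(d_in, host_in)
        context.execute_v2(bindings=[int(d_in), int(d_out)])
        cuda.memcpy_dtoh(host_out, d_out)
    finally:
        ctx.pop()
    return host_out

# One sub-graph engine per GPU (hypothetical file names).
enc_engine, ctx0 = load_engine("encoder.engine", 0)
dec_engine, ctx1 = load_engine("decoder.engine", 1)

# Pipeline: encoder on GPU 0, its output fed to the decoder on GPU 1.
input_ids = np.zeros((1, 128), dtype=np.int32)             # hypothetical shape
hidden = run(enc_engine, ctx0, input_ids, (1, 128, 1024))  # hypothetical shape
logits = run(dec_engine, ctx1, hidden, (1, 128, 32128))    # hypothetical shape
```

The host round-trip is the simplest hand-off; overlapping the two stages with streams or peer-to-peer copies is an optimization on top of the same idea.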
@ttyio If this is the case, can you direct me towards helpful documentation on how to do that? I've noticed the TRT docs aren't structured properly (an opinion, or maybe I don't have enough technical knowledge), so any links would be helpful.
Or are there any other implementations under TensorRT that have loaded a large transformer model?
@sanxchep yes, we are working on improving the documentation ;-(
Currently we have a single-GPU T5 demo at https://github.com/NVIDIA/TensorRT/tree/main/demo/HuggingFace/T5, but sorry, there is no demo showing how to pipeline- or tensor-parallelize the model yet.
Native multi-node, multi-GPU support is on the roadmap; until it lands, it may be worth trying https://github.com/NVIDIA/FasterTransformer, which supports multiple GPUs. Thanks!
@ttyio I understand, thanks for the help! Let me see if I can churn something up!
I have an environment running Python 3.8 in NVIDIA's official Docker image, and I have converted the initial macaw-11b model to ONNX format. But when I try to load it and convert it to a TRT engine (code below):
```python
# Build TensorRT engines from the exported ONNX encoder/decoder graphs.
t5_trt_encoder_engine = T5EncoderONNXFile(
    os.path.join(onnx_model_path, encoder_onnx_model_fpath), metadata
).as_trt_engine(os.path.join(tensorrt_model_path, encoder_onnx_model_fpath) + ".engine")

t5_trt_decoder_engine = T5DecoderONNXFile(
    os.path.join(onnx_model_path, decoder_onnx_model_fpath), metadata
).as_trt_engine(os.path.join(tensorrt_model_path, decoder_onnx_model_fpath) + ".engine")
```
It shows the error:

```
[04/19/2022-08:26:38] [TRT] [W] TensorRT was linked against cuBLAS/cuBLASLt 11.6.5 but loaded cuBLAS/cuBLASLt 11.6.1
[04/19/2022-08:38:03] [TRT] [W] Skipping tactic 0 due to Myelin error: CUDA error 2 for 1468006400-byte allocation.
[04/19/2022-08:38:03] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 13) [Constant] + (Unnamed Layer* 14) [Shuffle]...Mul_1395]}.)
[04/19/2022-08:38:03] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
```
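For what it's worth, CUDA error 2 is cudaErrorMemoryAllocation: the builder ran out of device memory while timing that tactic, which is why no implementation could be found for the node and the engine came back null. Below is a minimal sketch of a standalone ONNX-to-TRT build that caps the builder workspace and opts into FP16 only where the hardware supports it; the path argument and the 8 GB default are placeholder assumptions, and the T5 demo's `as_trt_engine()` wrapper may not expose these knobs directly:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, workspace_gb=8, try_fp16=True):
    # Parse an ONNX file and build a serialized engine with a capped workspace.
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse " + onnx_path)
    config = builder.create_builder_config()
    config.max_workspace_size = workspace_gb << 30   # cap builder scratch memory
    if try_fp16 and builder.platform_has_fast_fp16:  # likely False on K80s (Kepler)
        config.set_flag(trt.BuilderFlag.FP16)        # halves weight/activation memory
    return builder.build_serialized_network(network, config)  # None on failure
```

Note the workspace cap only bounds the builder's scratch memory; the weights of an 11B-parameter graph still have to fit on the single GPU the build runs on.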
We are running on a 4-GPU instance (K80s) with 64 GB of total GPU memory. But when we checked the usage, only one GPU (id 0) was at 87% memory utilization, with no usage or memory consumption on the rest. Is there a way to properly parallelize it across multiple GPUs, the way the normal torch imports allow?
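TensorRT itself won't spread one engine across GPUs the way torch.nn.DataParallel spreads a module, but each sub-graph build can at least be pinned to its own GPU. A minimal sketch using one process per GPU and the hypothetical `build_engine()` from the sketch above; setting `CUDA_VISIBLE_DEVICES` must happen before CUDA initializes in each child process:

```python
import multiprocessing as mp
import os

def build_one(onnx_path, engine_path, gpu_id):
    # Pin this child process to a single GPU before CUDA is initialized.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    plan = build_engine(onnx_path)  # hypothetical helper from the sketch above
    with open(engine_path, "wb") as f:
        f.write(plan)

if __name__ == "__main__":
    jobs = [
        mp.Process(target=build_one, args=("encoder.onnx", "encoder.engine", 0)),
        mp.Process(target=build_one, args=("decoder.onnx", "decoder.engine", 1)),
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
```

This only distributes the builds across devices; each sub-graph still has to fit in a single GPU's memory, which is why FasterTransformer's tensor parallelism is the better fit for an 11B model on these cards.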