InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0
4.64k stars 427 forks source link

api_server 方式部署有概率卡住 #2691

Open LiYtao opened 2 weeks ago

LiYtao commented 2 weeks ago

Checklist

Describe the bug

使用api_server 方式部署internvl2-40B时有概率卡住,使用了4张A100,debug日志的最后一行是 [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start

Reproduction

CUDA_VISIBLE_DEVICES=0,3,4,7 NCCL_P2P_DIRECT_DISABLE=1 lmdeploy serve api_server /var/aigc/model/InternVL2-40B --tp 4 --server-port 23333 --log-level DEBUG

Environment

root@71dde6280c86:/# lmdeploy check_env
sys.platform: linux
Python: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-PCIE-40GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.4.0+cu118
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.19.0+cu118
LMDeploy: 0.6.2+
transformers: 4.46.1
gradio: Not Found
fastapi: 0.115.4
pydantic: 2.9.2
triton: 3.0.0
NVIDIA Topology:
    GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity
GPU0     X  PIX PIX PIX SYS SYS SYS SYS 0-23,48-71  0
GPU1    PIX  X  PIX PIX SYS SYS SYS SYS 0-23,48-71  0
GPU2    PIX PIX  X  PIX SYS SYS SYS SYS 0-23,48-71  0
GPU3    PIX PIX PIX  X  SYS SYS SYS SYS 0-23,48-71  0
GPU4    SYS SYS SYS SYS  X  PIX PIX PIX 24-47,72-95 1
GPU5    SYS SYS SYS SYS PIX  X  PIX PIX 24-47,72-95 1
GPU6    SYS SYS SYS SYS PIX PIX  X  PIX 24-47,72-95 1
GPU7    SYS SYS SYS SYS PIX PIX PIX  X  24-47,72-95 1

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x311fd20200 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x3120520200 with size 12288
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x3120523200 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x9abd20200 with size 8388608
[TM][DEBUG] malloc buffer 0x3120d23200 with size 12288
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x9ac520200 with size 12288
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x9ac523200 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x44d9d20200 with size 8388608
[TM][DEBUG] malloc buffer 0x9acd23200 with size 12288
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x1d65d20200 with size 8388608
[TM][DEBUG] malloc buffer 0x44da520200 with size 12288
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x44da523200 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x1d66520200 with size 12288
[TM][DEBUG] malloc buffer 0x44dad23200 with size 12288
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x1d66523200 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = curandStateXORWOW; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x1d66d23200 with size 12288
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef650c00000 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef651400000 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef41ec00000 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef41f400000 with size 8388608
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] malloc buffer 0x7f0a8d826400 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c26400 with size 1024
[TM][DEBUG] malloc buffer 0x7f0a8d826800 with size 1056
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69026400 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = long unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a75826400 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a8d826e00 with size 262144
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a75826800 with size 1056
[TM][DEBUG] malloc buffer 0x7f0a8d866e00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = long unsigned int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a8d867200 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] malloc buffer 0x7f0a8d867600 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c26800 with size 1056
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = long unsigned int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a69026800 with size 1056
[TM][DEBUG] malloc buffer 0x7f0a8d867800 with size 1024
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] malloc buffer 0x7f0a75826e00 with size 262144
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a8d867c00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = long unsigned int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a75866e00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a8d868000 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] malloc buffer 0x7f0a75867200 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c26e00 with size 262144
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a75867600 with size 512
[TM][DEBUG] malloc buffer 0x7f0a8d868400 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a69026e00 with size 262144
[TM][DEBUG] malloc buffer 0x7f08b1c66e00 with size 1024
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] malloc buffer 0x7f0a8d868600 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a8d868a00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69066e00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a75867800 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a8d868e00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69067200 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a8d869200 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c67200 with size 1024
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a8d869400 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a75867c00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] malloc buffer 0x7f0a8d869800 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69067600 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a75868000 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c67600 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef6a0c00000 with size 8388608
[TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) start
[TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) stop
[TM][DEBUG] malloc buffer 0x7f0a69067800 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c67800 with size 1024
[TM][DEBUG] malloc buffer 0x7f0a75868400 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a69067c00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] malloc buffer 0x7f0a69068000 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69068400 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a75868600 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a69068600 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c67c00 with size 1024
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69068a00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a75868a00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69068e00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a75868e00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] malloc buffer 0x7f0a75869200 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a75869400 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69069200 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f0a75869800 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c68000 with size 1024
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69069400 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c68400 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef6a1400000 with size 8388608
[TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) start
[TM][DEBUG] malloc buffer 0x7f08b1c68600 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) stop
[TM][DEBUG] malloc buffer 0x7f0a69069800 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c68a00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c68e00 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = bool; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef67cc00000 with size 8388608
[TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) start
[TM][DEBUG] malloc buffer 0x7f08b1c69200 with size 512
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) stop
[TM][DEBUG] malloc buffer 0x7f08b1c69400 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c69800 with size 1024
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef67d400000 with size 8388608
[TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) start
[TM][DEBUG] void turbomind::ftNcclStreamSynchronize(turbomind::NcclParam, turbomind::NcclParam, cudaStream_t) stop
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = float; size_t = long unsigned int]
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a8d869c00 with size 1048576
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f08b1c69c00 with size 1048576
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69069c00 with size 1048576
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a75869c00 with size 1048576
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef651c00000 with size 1048576
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef651e00000 with size 1048576
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7ef41fc00000 with size 1048576
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] malloc buffer 0x7f0a69169c00 with size 1024
[TM][INFO] LlamaBatch<T>::Start()
[TM][DEBUG] malloc buffer 0x7ef41fe00000 with size 1048576
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = unsigned int; size_t = long unsigned int]
[TM][DEBUG] Cannot find buffer (nil), mallocing new one.
[TM][DEBUG] virtual void* turbomind::Allocator<turbomind::AllocatorType::CUDA>::malloc(size_t, bool, bool)
[TM][DEBUG] void turbomind::LlamaV2<T>::forwardUnified(T*, T*, T*, void**, const int*, const int*, const int*, const int*, const float*, const bool*, size_t, int, int, int*, const turbomind::Sequence**) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] malloc buffer 0x7f08b1d69c00 with size 1024
[TM][INFO] LlamaBatch<T>::Start()
[TM][INFO] [Gemm2] Tuning sequence: 8, 16, 32, 48, 64, 96, 128, 192, 256, 384, 512, 768, 1024, 1536, 2048, 3072, 4096, 6144, 8192
[TM][INFO] [Gemm2] 8
[TM][DEBUG] void turbomind::LlamaV2<T>::forwardUnified(T*, T*, T*, void**, const int*, const int*, const int*, const int*, const float*, const bool*, size_t, int, int, int*, const turbomind::Sequence**) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] void turbomind::ftNcclAllGather(const T*, T*, int, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] start
[TM][DEBUG] ncclDataType_t turbomind::getNcclDataType() [with T = __nv_bfloat16] start
[TM][DEBUG] void turbomind::ftNcclAllGather(const T*, T*, int, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] start
[TM][DEBUG] ncclDataType_t turbomind::getNcclDataType() [with T = __nv_bfloat16] start
[TM][DEBUG] void turbomind::ftNcclAllGather(const T*, T*, int, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] stop
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: output_norm_weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: rope_theta
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_block_counts
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_output
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: block_ptrs
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: last_token_hidden_units
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_output
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: last_token_hidden_units
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] void turbomind::ftNcclAllGather(const T*, T*, int, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] stop
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: output_norm_weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: rope_theta
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_block_counts
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_output
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: block_ptrs
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: last_token_hidden_units
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] malloc buffer 0x7f0a8d969c00 with size 1024
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][INFO] LlamaBatch<T>::Start()
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_output
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: last_token_hidden_units
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] void turbomind::LlamaV2<T>::forwardUnified(T*, T*, T*, void**, const int*, const int*, const int*, const int*, const float*, const bool*, size_t, int, int, int*, const turbomind::Sequence**) [with T = __nv_bfloat16; size_t = long unsigned int]
[TM][DEBUG] void turbomind::ftNcclAllGather(const T*, T*, int, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] start
[TM][DEBUG] ncclDataType_t turbomind::getNcclDataType() [with T = __nv_bfloat16] start
[TM][DEBUG] void turbomind::ftNcclAllGather(const T*, T*, int, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] stop
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: output_norm_weight
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: rope_theta
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_block_counts
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_output
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: block_ptrs
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: last_token_hidden_units
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_input
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: decoder_output
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: last_token_hidden_units
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
lzhangzz commented 1 week ago

Maybe related to #2706

github-actions[bot] commented 3 days ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

DefTruth commented 2 days ago

遇到了相同的问题,50个实例开TP=2,有几个会卡死,其中一个卡GPU利用率为0,另一个为100%