Athena-I commented 1 year ago

I tried to run inference on an A30, but got an error: RuntimeError: CUDA out of memory. How can I run inference across multiple GPUs?

Stanislas0 commented 1 year ago

Hi! We have released the code for model-parallel inference. We also suggest using the quantized version, which requires significantly less memory. Run whichever of the following scripts fits your hardware:

# On a single GPU (needs more than 27GB of GPU memory)
bash ./scripts/test_inference.sh <GPU_ID> ./tests/test_prompt.txt

# With quantization (needs more than 15GB of GPU memory)
bash ./scripts/test_inference_quantized.sh <GPU_ID> ./tests/test_prompt.txt

# On multiple GPUs (needs more than 6GB of GPU memory; first convert the checkpoint into MP_SIZE partitions)
bash ./scripts/convert_ckpt_parallel.sh <LOAD_CKPT_PATH> <SAVE_CKPT_PATH> <MP_SIZE>
bash ./scripts/test_inference_parallel.sh <MP_SIZE> ./tests/test_prompt.txt
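As a rough illustration of how the three options above relate to available GPU memory, here is a minimal Python sketch. The function name `pick_inference_script` and its parameters are hypothetical and not part of the CodeGeeX repository. The thresholds are the ones quoted in the reply, and treating the 6GB figure as a per-GPU requirement is an assumption.

```python
def pick_inference_script(free_gb_per_gpu: float, num_gpus: int = 1) -> str:
    """Suggest one of the scripts named in the reply (hypothetical helper).

    Thresholds come from the reply: >27GB for full precision, >15GB for the
    quantized model, >6GB for model parallelism. Reading the 6GB figure as
    "per GPU" is an assumption, not something the thread states.
    """
    if free_gb_per_gpu > 27:
        return "./scripts/test_inference.sh"
    if free_gb_per_gpu > 15:
        return "./scripts/test_inference_quantized.sh"
    if num_gpus > 1 and free_gb_per_gpu > 6:
        # The checkpoint must first be split with convert_ckpt_parallel.sh.
        return "./scripts/test_inference_parallel.sh"
    raise ValueError("not enough GPU memory for any of the listed options")


if __name__ == "__main__":
    # An A30 has 24GB of memory, which is below the 27GB single-GPU threshold.
    print(pick_inference_script(24))
```

Under these assumptions, a single A30 (24GB) falls short of the full-precision requirement, which is consistent with the out-of-memory error in the question. It would fit the quantized script, or the parallel script if several cards are available.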

wujianqiangwjq commented 1 year ago

Full error output:

WARNING:torch.distributed.run:

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

libibverbs not available, ibv_fork_init skipped
libibverbs not available, ibv_fork_init skipped
W20230407 00:33:39.174242 1141 rpc_client.cpp:190] LoadServer 127.0.0.1 Failed at 0 times error_code 14 error_message Connection reset by peer
E0407 00:33:39.190535757 1140 server_chttp2.cc:40] {"created":"@1680827619.190465338","description":"No address added out of total 1 resolved","file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":395,"referenced_errors":[{"created":"@1680827619.190463829","description":"Failed to add any wildcard listeners","file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/lib/iomgr/tcp_server_posix.cc","file_line":342,"referenced_errors":[{"created":"@1680827619.190445769","description":"Address family not supported by protocol","errno":97,"file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":420,"os_error":"Address family not supported by protocol","syscall":"socket","target_address":"[::]:29500"},{"created":"@1680827619.190463308","description":"Unable to configure socket","fd":18,"file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":216,"referenced_errors":[{"created":"@1680827619.190459051","description":"Address already in use","errno":98,"file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":189,"os_error":"Address already in use","syscall":"bind"}]}]}]}
F20230407 00:33:39.190577 1140 rank_info_bootstrap_server.cpp:46] Check failed: p == port() (29500 vs. 0) Port 29500 is unavailable
Check failure stack trace:
@ 0x7fb9794709ca google::LogMessage::Fail()
@ 0x7fb979470cb2 google::LogMessage::SendToLog()
@ 0x7fb979470537 google::LogMessage::Flush()
@ 0x7fb9794730a9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fb96ed95d40 oneflow::RankInfoBootstrapServer::RankInfoBootstrapServer()
@ 0x7fb96ed7fd15 oneflow::RankInfoCtrlBootstrap::RankInfoCtrlBootstrap()
@ 0x7fb97345f05b oneflow::GrpcRpcManager::Bootstrap()
@ 0x7fb972bf91c1 oneflow::EnvGlobalObjectsScope::Init()
@ 0x7fb972bfbc94 oneflow::EnvGlobalObjectsScope::EnvGlobalObjectsScope()
@ 0x7fba1b873af9 (unknown)
@ 0x7fba1b80fe0d (unknown)
@ 0x5649a0e70e14 cfunction_call
@ 0x5649a0e2acaf _PyObject_MakeTpCall
@ 0x5649a0da505b method_vectorcall.cold.2469
@ 0x5649a0e34a7a _PyObject_Call
@ 0x5649a0d9aad9 slot_tp_init.cold.2212
@ 0x5649a0e45e9b type_call
@ 0x7fbac4339bf9 pybind11_meta_call
@ 0x5649a0e2acaf _PyObject_MakeTpCall
@ 0x5649a0ec8d89 _PyEval_EvalFrameDefault
@ 0x5649a0e86284 _PyFunction_Vectorcall
@ 0x5649a0dee755 _PyEval_EvalFrameDefault.cold.2984
@ 0x5649a0e86284 _PyFunction_Vectorcall
@ 0x5649a0decae6 _PyEval_EvalFrameDefault.cold.2984
@ 0x5649a0e86284 _PyFunction_Vectorcall
@ 0x5649a0e70eca _PyObject_FastCallDictTstate
@ 0x5649a0e7ab79 slot_tp_init
@ 0x5649a0e2ad5f _PyObject_MakeTpCall
@ 0x5649a0ec480a _PyEval_EvalFrameDefault
@ 0x5649a0e86284 _PyFunction_Vectorcall
@ 0x5649a0dee755 _PyEval_EvalFrameDefault.cold.2984
@ 0x5649a0e85663 _PyEval_EvalCode
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1141 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 1140) of binary: /usr/local/miniconda/bin/python
Traceback (most recent call last):
  File "/usr/local/miniconda/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/CodeGeeX/tests/test_inference_megatron.py FAILED

Failures:

-----------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2023-04-07_00:33:41
  host : 2c00174f74e6
  rank : 0 (local_rank: 0)
  exitcode : -6 (pid: 1140)
  error_file:
  traceback : Signal 6 (SIGABRT) received by PID 1140
=====================================================
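The log above reports that `bind` on port 29500 failed with "Address already in use", followed by the fatal check "Port 29500 is unavailable". As a diagnostic aid only, the following Python sketch checks whether a TCP port can currently be bound on the local machine and asks the OS for an unused one. The helper names `port_is_free` and `find_free_port` are hypothetical. The sketch does not identify which process holds the port and is not a confirmed fix for this failure.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Covers EADDRINUSE (errno 98 on Linux), as seen in the log.
            return False
        return True


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    # 29500 is the port named in the error message.
    print("29500 free:", port_is_free(29500))
    print("an unused port:", find_free_port())
```

Both checks are inherently racy: another process may take the port between the check and the actual launch, so the result is a hint rather than a guarantee.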

THUDM / CodeGeeX

How to inference on multi-gpu? #12
