alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Apache License 2.0

[ROCm] refine quantization related code #99

Closed: feifei14119 closed this 3 months ago

feifei14119 commented 3 months ago

Qwen2-7B-Instruct-GPTQ-Int4 test result:

["Hello! I'm Qwen, a state-of-the-art language model developed by Alibaba Cloud. My main purpose is to generate human-like text and assist with various types of questions and tasks. How can I assist you today?"]
{
    "id": "chat-",
    "object": "chat.completion",
    "created": 1723028790,
    "model": "AsyncModel",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
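                "content": "I am a very-large-scale language model developed by Alibaba Cloud, and my name is Tongyi Qianwen. As an assistant, my goal is to help users obtain accurate, useful information and to solve their problems or complete their tasks. I can answer all kinds of questions, provide code implementations, brainstorm creative work, and even make small talk. If there is anything you need help with, just tell me and I will do my best to support you.",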
                "function_call": null,
                "tool_calls": null
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 22,
        "total_tokens": 91,
        "completion_tokens": 69
    },
    "debug_info": null,
    "aux_info": null
}
[FT][WARNING][RANK 0][139788025853504][24-08-07 11:06:30] max_top_k: 1, max_top_p: 1.000000
[FT][WARNING][RANK 0][139788025853504][24-08-07 11:06:30] topk_ws_size: 608320, topp_ws_size: 2433280
[FT][WARNING][RANK 0][139788025853504][24-08-07 11:06:30] max_top_k: 1, max_top_p: 1.000000
[FT][WARNING][RANK 0][139788025853504][24-08-07 11:06:30] topk_ws_size: 608320, topp_ws_size: 2433280
[FT][INFO][RANK 0][139830970417600][24-08-07 11:06:30] destory normal engine
[FT][INFO][RANK 0][139830970417600][24-08-07 11:06:30] stop normal engine
[FT][INFO][RANK 0][139830970417600][24-08-07 11:06:30] stop FIFOScheduler
[FT][INFO][RANK 0][139830970417600][24-08-07 11:06:30] stop FIFOScheduler
[FT][INFO][RANK 0][139830970417600][24-08-07 11:06:30] destory FIFOScheduler
2024-08-07 11:06:31.766029 INFO kmonitor.MetricsSystem : remove source name [_1_reporter], current source size [1]
2024-08-07 11:06:31.766129 INFO kmonitor.KMonitor : release kmonitor [_1_reporter].
================================================================================
Target //example:test up-to-date:
  bazel-bin/example/test
INFO: Elapsed time: 64.608s, Critical Path: 63.52s
INFO: 2 processes: 2 local.
INFO: Build completed successfully, 2 total actions
//example:test
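
The chat completion response in the test output above can be sanity-checked with a small script. This is a minimal sketch, not part of rtp-llm: `check_chat_completion` is a hypothetical helper, and the sample payload is a trimmed copy of the response with the message content abbreviated. It only verifies that the token counts add up and that generation ended normally.

```python
import json


def check_chat_completion(payload: str) -> dict:
    """Validate an OpenAI-style chat.completion payload (hypothetical helper)."""
    resp = json.loads(payload)
    usage = resp["usage"]
    # total_tokens is expected to equal prompt_tokens + completion_tokens.
    expected_total = usage["prompt_tokens"] + usage["completion_tokens"]
    if usage["total_tokens"] != expected_total:
        raise ValueError(
            f"usage mismatch: total_tokens={usage['total_tokens']}, "
            f"expected {expected_total}"
        )
    choice = resp["choices"][0]
    if choice["message"]["role"] != "assistant":
        raise ValueError("first choice is not an assistant message")
    if not choice["message"]["content"]:
        raise ValueError("assistant message is empty")
    if choice["finish_reason"] != "stop":
        raise ValueError(f"unexpected finish_reason: {choice['finish_reason']}")
    return usage


# Trimmed copy of the response above; the content string is abbreviated.
sample = """
{
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "(abbreviated)"},
            "finish_reason": "stop"
        }
    ],
    "usage": {"prompt_tokens": 22, "total_tokens": 91, "completion_tokens": 69}
}
"""

if __name__ == "__main__":
    print(check_chat_completion(sample))
```

For the run above, 22 prompt tokens plus 69 completion tokens gives the reported total of 91, so the check passes.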

feifei14119 commented 3 months ago

rocm gemm_op_test result:

[----------] Global test environment tear-down
[==========] 6 tests from 1 test suite ran. (8194 ms total)
[  PASSED  ] 6 tests.
================================================================================
Target //src/fastertransformer/devices/rocm_impl/test:gemm_op_test up-to-date:
  bazel-bin/src/fastertransformer/devices/rocm_impl/test/gemm_op_test
INFO: Elapsed time: 67.519s, Critical Path: 66.29s
INFO: 12 processes: 5 internal, 7 local.
INFO: Build completed successfully, 12 total actions
//src/fastertransformer/devices/rocm_impl/test:gemm_op_test              PASSED in 20.1s

Executed 1 out of 1 test: 1 test passes.