InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] output diff when temperature set zero #1688

Open zhyncs opened 3 months ago

zhyncs commented 3 months ago

Describe the bug

I used the latest LMDeploy code to run the vicuna 13b model. The client sent the same set of requests twice with temperature 0, and some of the outputs differed between the two runs.

Theoretically, when the temperature is 0, repeated requests should yield identical results.
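
To make the expectation concrete, here is a minimal sketch of temperature scaling in plain Python. It is not LMDeploy code: the function name and the logits are made up for illustration. It only shows that as the temperature approaches 0 the distribution collapses onto the highest logit, which is why temperature 0 is usually read as greedy (argmax) decoding. How a given backend handles a literal 0 is implementation-specific.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature), computed in a numerically stable way."""
    if temperature <= 0:
        # A literal 0 would divide by zero; backends must special-case it.
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.5, 0.1]  # hypothetical values
for t in (1.0, 0.5, 0.1, 0.01):
    probs = softmax_with_temperature(logits, t)
    # The probability of the largest logit approaches 1 as t shrinks.
    print(t, [round(p, 4) for p in probs])
```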

Reproduction

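```python3
import hashlib


def text_to_sha256(text):
    # Return the hex SHA-256 digest of the response text, for easy comparison.
    text_bytes = text.encode('utf-8')
    sha256_hash = hashlib.sha256()
    sha256_hash.update(text_bytes)
    hash_digest = sha256_hash.hexdigest()
    return hash_digest
```

```python3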
    def _inference(self, req_queue: Queue, res_queue: Queue, session_id: int,
                   stream_output: bool):

        stats = []
        client = APIClient(self.server_addr, api_key=self.api_key)

        for prompt, input_seqlen, output_seqlen in iter(
                req_queue.get, [None, None, None]):
            timestamps = []
            timestamps.append(time.perf_counter())
            for output in client.chat_completions_v1(
                    model=self.model_name,
                    messages=prompt,
                    temperature=self.temperature,
                    top_p=self.top_p,
                    n=1,
                    max_tokens=output_seqlen,
                    stream=stream_output,
                    session_id=session_id,
                    ignore_eos=True):
                timestamps.append(time.perf_counter())
            print(text_to_sha256(output['choices'][0]['message']['content']))
```

First client run:

```Shell
python3 benchmark/profile_restful_api.py 127.0.0.1:23333 /workdir/vicuna-13b-v1.3 /workdir/ShareGPT_V3_unfiltered_cleaned_split.json --concurrency 128 --num_prompts 128 --temperature 0 | tee first
```

Second client run:

```Shell
python3 benchmark/profile_restful_api.py 127.0.0.1:23333 /workdir/vicuna-13b-v1.3 /workdir/ShareGPT_V3_unfiltered_cleaned_split.json --concurrency 128 --num_prompts 128 --temperature 0 | tee second
```

Keep only the hash results and sort them:

```Shell
sort first -o first_output
sort second -o second_output
```

Compare the two files to find the differences:

```Shell
icdiff first_output second_output
```


### Environment

```Shell
sys.platform: linux
Python: 3.9.16 (main, Aug 15 2023, 19:38:56) [GCC 8.3.1 20190311 (Red Hat 8.3.1-3)]
CUDA available: True
GPU 0,1: NVIDIA A100-SXM4-80GB
GCC: gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11)
PyTorch: 2.3.0+cu118
LMDeploy: 0.4.2+e6468e7
triton: 2.3.0
```

Error traceback

No response

zhyncs commented 3 months ago

diff first_output second_output -y -W 196
031a4481d6cac164b685924ec9a83ce7434f9d0b1596cf15185dab47404d89c7                031a4481d6cac164b685924ec9a83ce7434f9d0b1596cf15185dab47404d89c7
07ef14b39226a8d37b147398eb9ffd18b162a5a5b81602f9722ef502245e2012                07ef14b39226a8d37b147398eb9ffd18b162a5a5b81602f9722ef502245e2012
0abff593983bc1b4f2f8bbe6a895dd7f7726d1c9a666fb6589031b7c0c5c3d27                0abff593983bc1b4f2f8bbe6a895dd7f7726d1c9a666fb6589031b7c0c5c3d27
0ba761db6f0f518a72cf7e2d8bca1c507be1b3f28c2e4db00a9ed0f956755fe0                0ba761db6f0f518a72cf7e2d8bca1c507be1b3f28c2e4db00a9ed0f956755fe0
0cb84eda5915f334a959932320ac06b9cc1d4ed65ee1b7edcf0bbf0a44b26329                0cb84eda5915f334a959932320ac06b9cc1d4ed65ee1b7edcf0bbf0a44b26329
                                                  > 0d01b41b147fc77f14dc5e57d4d1f772bc35be7668bc180fcf0ff43c1df00349
149f0fdc3b1e898c5a6598792a0b2315879b1b724237a8f50d9d2381e92e47e0                149f0fdc3b1e898c5a6598792a0b2315879b1b724237a8f50d9d2381e92e47e0
1b04b7179897553c8a092683a19eae08edc762d1d11816a5de7d909e2ebd560a                1b04b7179897553c8a092683a19eae08edc762d1d11816a5de7d909e2ebd560a
1db3da6956a88ff3a7adc029d08cc2855773c7fb8430389d29664bb285e0497f                  <
2048ee609e15ed69cad004d03371405a673599b64b29cc628564a5dc38d50227                2048ee609e15ed69cad004d03371405a673599b64b29cc628564a5dc38d50227
20fde8ec4db734a77110c988e5673cf773abdeb07b784d39aecdd86dc54a5101                  <
230fcac0c49b0051a4a37afb867d0683650b04ce737fb95dc2b6fae338c2a6ea                230fcac0c49b0051a4a37afb867d0683650b04ce737fb95dc2b6fae338c2a6ea
234906550527527b0956205333ed82aab24a5b10beb1c98ea3c7a612dd717a92                234906550527527b0956205333ed82aab24a5b10beb1c98ea3c7a612dd717a92
25857ddbf8aae4b395bc6c131ef7bd1a4ea119007537a89ac7995b00ffb4656a                25857ddbf8aae4b395bc6c131ef7bd1a4ea119007537a89ac7995b00ffb4656a
26b4b5b395c56db3975d4fdb53f9bf8e2258e6a157878286a9a96a52588a5bf0                26b4b5b395c56db3975d4fdb53f9bf8e2258e6a157878286a9a96a52588a5bf0
27191762fbe72e5eef3a8d8c9462329eb0bd4f5799168b1556852d95f3d85382                27191762fbe72e5eef3a8d8c9462329eb0bd4f5799168b1556852d95f3d85382
28088e0e8d1e2e50602c8a09aced7eb8525794a11ec7fcb62a214afbe2772b7d                28088e0e8d1e2e50602c8a09aced7eb8525794a11ec7fcb62a214afbe2772b7d
28b445aa51d40d69a6ad3523efd59a5add3438c9c5c1ebf88162530e74d87df5                28b445aa51d40d69a6ad3523efd59a5add3438c9c5c1ebf88162530e74d87df5
2a57b9202c02e03d4a58e0e87f1b1a5690afd48f06798687c04a45ff791dfce6                2a57b9202c02e03d4a58e0e87f1b1a5690afd48f06798687c04a45ff791dfce6
2ba30d617df39b92aefc59e34c046a505f417d192b8652a408b7200c1891e97f                2ba30d617df39b92aefc59e34c046a505f417d192b8652a408b7200c1891e97f
2bec9d38b2b52085ddd85b1e0305644e067acd5edfd4ced3db599303666fcbc2                2bec9d38b2b52085ddd85b1e0305644e067acd5edfd4ced3db599303666fcbc2
2cf7081ad598b91b7f51ec4f07e4601624774d7adae671361fe14aa2c667c3e3                2cf7081ad598b91b7f51ec4f07e4601624774d7adae671361fe14aa2c667c3e3
301c1cb3c4c40f89c7f00cd990e0f08c53a0766d87aac4ff460db5934b091cf0                301c1cb3c4c40f89c7f00cd990e0f08c53a0766d87aac4ff460db5934b091cf0
316f41e12813cd8744dc1e7d5704fe9fe8e8a83b8f77619ec710bba50bdbaf9f                316f41e12813cd8744dc1e7d5704fe9fe8e8a83b8f77619ec710bba50bdbaf9f
377828d8a85cb60f2688816786cdf3ebda2c10f6886838ad3023b0a93da695fd                377828d8a85cb60f2688816786cdf3ebda2c10f6886838ad3023b0a93da695fd
38439161677a63490a1e93b5d61150a7504375cbac4fc528d10acaac87de042f                38439161677a63490a1e93b5d61150a7504375cbac4fc528d10acaac87de042f
3c0c9c04132619f0e6217bc0f4064f88a001ce3b8ba18181b0f8fc4bd9468322                3c0c9c04132619f0e6217bc0f4064f88a001ce3b8ba18181b0f8fc4bd9468322
3d2341b11b6bcb75cb20896af06036768b87887404597ac1a6d1e164f0d9e6b5                  <
3effb4544508a673f6729a25463e8bbd99f72111070619c5d86b7d7187b034cc                3effb4544508a673f6729a25463e8bbd99f72111070619c5d86b7d7187b034cc
40aa32495beefd79b444227fe259c1a5ea72adebc84be01fbbd7ef74068343b9                40aa32495beefd79b444227fe259c1a5ea72adebc84be01fbbd7ef74068343b9
4284bb3d5751309680dd6304d7fa54d2a1f4380f6dbb6523298f67dfd05fcf95                4284bb3d5751309680dd6304d7fa54d2a1f4380f6dbb6523298f67dfd05fcf95
43d1894d8c9cff0259a93d8124f2f43640abe1166dff7e28ff12e1443c7b4a24                43d1894d8c9cff0259a93d8124f2f43640abe1166dff7e28ff12e1443c7b4a24
447864d1ffc54ce7a41056df0d32310c88066cf28c760599f50e5bb0dde0f576                447864d1ffc54ce7a41056df0d32310c88066cf28c760599f50e5bb0dde0f576
468f8f800ef08a899a828bb8f2fe7784b4d37f43335096823c1168fd26995fe8                468f8f800ef08a899a828bb8f2fe7784b4d37f43335096823c1168fd26995fe8
46983e97c22b26deb9c37e6ca5bf373496b0d97339430b94c7cdc8c5218a2f96                46983e97c22b26deb9c37e6ca5bf373496b0d97339430b94c7cdc8c5218a2f96
46a0363a5ac2b79beeffead21c0cb8a1a270c9cef1b88ec61b1359f00ede629d                46a0363a5ac2b79beeffead21c0cb8a1a270c9cef1b88ec61b1359f00ede629d
47bc080d06b13c9ba37939bafce80e62104dbbe904c95655f98a0b758b32c4d0                47bc080d06b13c9ba37939bafce80e62104dbbe904c95655f98a0b758b32c4d0
49078fcbee4d2616e81d345d873ef815fdc6369b1663e72d7019083a530ccad4                49078fcbee4d2616e81d345d873ef815fdc6369b1663e72d7019083a530ccad4
4a82caf344afdb3f663bbfee990d0b60135b0ccfb8775e130f943017d7a38c3b                4a82caf344afdb3f663bbfee990d0b60135b0ccfb8775e130f943017d7a38c3b
                                                  > 4c7fd937fd29d36aeaa9aa4bdaaa6b0c41635ad46eed937f5215096862f8a78b
4d3e0be8d5cb0e9c641160bb382eb5c4785cd2c28363ba6c6515a308e56ccd20                4d3e0be8d5cb0e9c641160bb382eb5c4785cd2c28363ba6c6515a308e56ccd20
4df7a05e500dda30683a8e8a79d6a2d34530b283bd0186f8472076673a254535                4df7a05e500dda30683a8e8a79d6a2d34530b283bd0186f8472076673a254535
4e7a13d645d3d3a495b1dc25a58b4fa8c818dd4bf785a99007bad54275182b7e                  <
516c9949eb61f3d2e653641476e662ccecf4d308f11f97abc2552a9582ad5f90                516c9949eb61f3d2e653641476e662ccecf4d308f11f97abc2552a9582ad5f90
5292e987b0c297f3e3455cd7fb718fc68ed0d2b995de43990cc3666f7ee8e082                5292e987b0c297f3e3455cd7fb718fc68ed0d2b995de43990cc3666f7ee8e082
543a1fff0da8b16251d1e09bc27feee81efb6d9bb9ad45ca09efce9fd59f0263                543a1fff0da8b16251d1e09bc27feee81efb6d9bb9ad45ca09efce9fd59f0263
54f6ce1fd762b2f5ab1078b39cdc346c261eb6379c5e7a8a23e155f14d3c4ef2                54f6ce1fd762b2f5ab1078b39cdc346c261eb6379c5e7a8a23e155f14d3c4ef2
5939aa03df2bf97bb195e6614fce571abed3e0350c034333e7b2188d3824331c                5939aa03df2bf97bb195e6614fce571abed3e0350c034333e7b2188d3824331c
59a81a893fe130d1755e5155a4ab5cf51fe1a9f6d9feeffadb85e6f06ba4fa1a                59a81a893fe130d1755e5155a4ab5cf51fe1a9f6d9feeffadb85e6f06ba4fa1a
5ac2685bcfc239cfaeaa2be8a35a0794c7f565458241d9c4f57e2c5ef4fd0c21                5ac2685bcfc239cfaeaa2be8a35a0794c7f565458241d9c4f57e2c5ef4fd0c21
5c9c859c79c9ef489a4c1602bb9d17520e21fc03c9054a4d9ea7bdbfe703f853                5c9c859c79c9ef489a4c1602bb9d17520e21fc03c9054a4d9ea7bdbfe703f853
5ce65626c6c07b2c92e1ab4b01725fb6e9f6657952bfb11c002214bebf20b019                5ce65626c6c07b2c92e1ab4b01725fb6e9f6657952bfb11c002214bebf20b019
5d333cf8acafa13a99e6c3b5e1eaea0b5dd0519b4d9c2239165161f7f7e994da                5d333cf8acafa13a99e6c3b5e1eaea0b5dd0519b4d9c2239165161f7f7e994da
62504579d424a363462413f220d901bd4408870641d8042016de77bf5d08ab82                62504579d424a363462413f220d901bd4408870641d8042016de77bf5d08ab82
6303b04ed92d83084718996619c713e80590e5ceb78352670b8191a203828dbe                6303b04ed92d83084718996619c713e80590e5ceb78352670b8191a203828dbe
63513da98ab64df83c56bd5c08cb4bc8d21c657e2a488cdaf46cf27e8504b577                63513da98ab64df83c56bd5c08cb4bc8d21c657e2a488cdaf46cf27e8504b577
69debef3c26a21e7f6d144a7e5f4450754a7d72361e21ca0ffdf482560b4d8ff                69debef3c26a21e7f6d144a7e5f4450754a7d72361e21ca0ffdf482560b4d8ff
6a553809eac82a354e9c6d9c19f0b9ec9a70d06c7a6ffe8fca808d45b9e17866                6a553809eac82a354e9c6d9c19f0b9ec9a70d06c7a6ffe8fca808d45b9e17866
6aa92921317dc06ea93402b6c210825bb7eeda13120fe82cedaaeff2718da393                6aa92921317dc06ea93402b6c210825bb7eeda13120fe82cedaaeff2718da393
6c247cabec7cfee009f217c888741c16a9481f6711a18d83d1816cc71d09c8d6                6c247cabec7cfee009f217c888741c16a9481f6711a18d83d1816cc71d09c8d6
7028b5d4998d4640bbdc849eddb11c0a060057e22597fe31e0a5db19dc244263                7028b5d4998d4640bbdc849eddb11c0a060057e22597fe31e0a5db19dc244263
713d3b6ada69a2564f2bafe86c4ac1dde54068364301e776180c3062fcec813f                713d3b6ada69a2564f2bafe86c4ac1dde54068364301e776180c3062fcec813f
7311ddb901cb86b388c03f3c300a0d757151089688eb21df4dfb513fd187fa55                7311ddb901cb86b388c03f3c300a0d757151089688eb21df4dfb513fd187fa55
754a5ef75e09b0423ce54cad6228bf4c98b5bbb61044ef614451594fd5a2bc80                754a5ef75e09b0423ce54cad6228bf4c98b5bbb61044ef614451594fd5a2bc80
758a37bad4ae44d5618bec8e5bf4b339ed6c048a289b512f6f82abf98184cd26                758a37bad4ae44d5618bec8e5bf4b339ed6c048a289b512f6f82abf98184cd26
7902f8d25307efc1622d98b66de0021d9a2e6c2501482af8bfafb5c68c909221                7902f8d25307efc1622d98b66de0021d9a2e6c2501482af8bfafb5c68c909221
79323d579cc360af3ffcd220edbf8689f06bc8e0c01b70e72854ba16b67784bc                79323d579cc360af3ffcd220edbf8689f06bc8e0c01b70e72854ba16b67784bc
7a1b8e41e5317e0035febe747687c3bb45e30eb5fdcf6731ef87eed5f25691f2                7a1b8e41e5317e0035febe747687c3bb45e30eb5fdcf6731ef87eed5f25691f2
7c3b5f5ebf4acd8f2376d4cff5abc210eceee8f54764ec4b5539b9e2a0f086db                7c3b5f5ebf4acd8f2376d4cff5abc210eceee8f54764ec4b5539b9e2a0f086db
7f10157fee9c4726f808acb613a7d5804d4a05591f9a11a72d1f4fcef3e824a9                7f10157fee9c4726f808acb613a7d5804d4a05591f9a11a72d1f4fcef3e824a9
7f27f05baa137865b7eddcc6989fae7381607480f1a312c609f16ce86f4dcc6b                7f27f05baa137865b7eddcc6989fae7381607480f1a312c609f16ce86f4dcc6b
7f7d6b054ba4f7e694630247c8a8fcd81879528c80bfdaa7ea7a8c5c7bbb0db4                7f7d6b054ba4f7e694630247c8a8fcd81879528c80bfdaa7ea7a8c5c7bbb0db4
8025565b08129645b99a2fa1d57a957e7ae90e30907ce358025bf4fc3ae76c1e                8025565b08129645b99a2fa1d57a957e7ae90e30907ce358025bf4fc3ae76c1e
805879cbe8fe80f267a7f6088ae6d1e8204341e226e87d848001808d7b30f677                805879cbe8fe80f267a7f6088ae6d1e8204341e226e87d848001808d7b30f677
827a04c7254512fdd7a91e3d019c192dbbfa54ea63225188f6d7868f4063fe36                827a04c7254512fdd7a91e3d019c192dbbfa54ea63225188f6d7868f4063fe36
849cd35b6f8dd84ba76253d946633cf0a40db1483f61462c1e75bbcf7f5df747                849cd35b6f8dd84ba76253d946633cf0a40db1483f61462c1e75bbcf7f5df747
84d900fab212b78fbfeed3cbe6f16fcdf04200ae503a53929c5fd6eff6c40e0d                84d900fab212b78fbfeed3cbe6f16fcdf04200ae503a53929c5fd6eff6c40e0d
87aecfdf18dc0a7e5e4fd037addb00dd071997661c85cd8a91c473d1cf54d500                87aecfdf18dc0a7e5e4fd037addb00dd071997661c85cd8a91c473d1cf54d500
                                                  > 88ac8e9198540d095080ebddb90af8e73aa0d8eca4c6619c10ac418d6e3d5c4f
89127b213b8cd2d557cdde875972678ee93cf704f748791eea944c2349db9c1d                89127b213b8cd2d557cdde875972678ee93cf704f748791eea944c2349db9c1d
8ce684393607ce522c68c1ff9a5de786d8d2b2348741c242f1b6d329da15ea62                8ce684393607ce522c68c1ff9a5de786d8d2b2348741c242f1b6d329da15ea62
8d9fca0dad4b5b2f95c5b7c1aed53d9e3faf8ec5020b7fe3afa34df6011a8f65                8d9fca0dad4b5b2f95c5b7c1aed53d9e3faf8ec5020b7fe3afa34df6011a8f65
9222a7bdf6ba61dec2984e01f58b09eab58785135f096ad5ed47343c511f0c95                9222a7bdf6ba61dec2984e01f58b09eab58785135f096ad5ed47343c511f0c95
92efb3dc92544d31b3b82bdcd205c0ae4012dda5a22d945d3ffa4651e4291582                92efb3dc92544d31b3b82bdcd205c0ae4012dda5a22d945d3ffa4651e4291582
95b6f04ecb4da42c596f63fcec04b931161362777c770bf15b584ead8eeb06a4                95b6f04ecb4da42c596f63fcec04b931161362777c770bf15b584ead8eeb06a4
                                                  > 95dceb6be84f4ea60606b94084cc43e31c28b99539d976531ba696fc3e994756
97e89a88cb11bda32d4b8e4268fd04816faf7e9917c633b3c6412ffbfb1e39da                97e89a88cb11bda32d4b8e4268fd04816faf7e9917c633b3c6412ffbfb1e39da
97ff6e5b2a6428e8b91ec770b1c9b6c78dc6589dffcb15775d5d5f817e8fbbb5                97ff6e5b2a6428e8b91ec770b1c9b6c78dc6589dffcb15775d5d5f817e8fbbb5
99fe6fedf316e4a28cc11c72906f31cdae49b3f45d9c111e9e810d0050e3d21c                99fe6fedf316e4a28cc11c72906f31cdae49b3f45d9c111e9e810d0050e3d21c
                                                  > 9c50befb2432d441e46b0e9ac2d9982e8cf58bf594b87e2ecaa3231c6c3b8263
9ce82d2e004c52879c77b2e32beb616e7f49f990fcc1f58b420093b63b14c1b4                9ce82d2e004c52879c77b2e32beb616e7f49f990fcc1f58b420093b63b14c1b4
a168f91a9657d6eb84928e57028cb21ba71d0c30a26424874a4dcfa04b168c1d                a168f91a9657d6eb84928e57028cb21ba71d0c30a26424874a4dcfa04b168c1d
a2e7c97758900dc3894331c88bc9cea81c44ec762275d7624d315d18346438b5                a2e7c97758900dc3894331c88bc9cea81c44ec762275d7624d315d18346438b5
a567eb0be778b9f4a690c23ffe02d9a012eb0f1fd20e7a19274dddc3691333dc                a567eb0be778b9f4a690c23ffe02d9a012eb0f1fd20e7a19274dddc3691333dc
a9051835620674a58010c23e3b001d37d0577ea8a286855892a529b9a800e286                a9051835620674a58010c23e3b001d37d0577ea8a286855892a529b9a800e286
ab7104f28f35f88500886c292779af8c17a72e0f9bf73f7b74f2ca0c6e841f82                ab7104f28f35f88500886c292779af8c17a72e0f9bf73f7b74f2ca0c6e841f82
ab9dbd897cf0e28b27e267d84ff7095ff7184f65a1a86546364708008be1f268                ab9dbd897cf0e28b27e267d84ff7095ff7184f65a1a86546364708008be1f268
ac080d92c8cb457cf3b917e1723360fbadc7b57e1a6fedcd85b5b04cf2b64fb6                ac080d92c8cb457cf3b917e1723360fbadc7b57e1a6fedcd85b5b04cf2b64fb6
ad93f62ef5c5927dff18f8df561f60c3d98fa081bab080876c55314a07e74aa0                ad93f62ef5c5927dff18f8df561f60c3d98fa081bab080876c55314a07e74aa0
b0739c148b593a76aace9f1634b7ae72cc395e936b4012147d23564d20a4e66a                  | b1483c104937f1755d1da37490021ce4d36ef83b62d2052b470cfbf0a16e9949
b107afa03a7da448d931612e7973c01a3352f4d83d53dba715e302b276b2101f                  <
b1b595bd6223639909b77d50dae7ab2978a6d604a70e4197fa224cc9dc127f31                b1b595bd6223639909b77d50dae7ab2978a6d604a70e4197fa224cc9dc127f31
b26c4db7a04d67622df866ba97db082f76afb44298dd0f17480b0c6701aa27bf                b26c4db7a04d67622df866ba97db082f76afb44298dd0f17480b0c6701aa27bf
b3ef658409b306e413f51731790668ded4833759be7beddc1ebf2c94ad4d87e4                b3ef658409b306e413f51731790668ded4833759be7beddc1ebf2c94ad4d87e4
b5e7199146ce76509bf55140a29dd41dc735edac2243aec7ceeff96341106ce5                b5e7199146ce76509bf55140a29dd41dc735edac2243aec7ceeff96341106ce5
b6674febebdb881e2cab0fb0d0bd72ebc8b3b611d2732e89738a59766ed38abc                b6674febebdb881e2cab0fb0d0bd72ebc8b3b611d2732e89738a59766ed38abc
b71ec0a4e9a64bf366a13670510878fea2d5bf45f66ebd0f2fd107baa7e3097f                b71ec0a4e9a64bf366a13670510878fea2d5bf45f66ebd0f2fd107baa7e3097f
b73baafeac64a3c0ee4c78847c5921660ccc572b3b0d9238eda528a4a2adcf7a                b73baafeac64a3c0ee4c78847c5921660ccc572b3b0d9238eda528a4a2adcf7a
b74656ed3e66a375c0d10b8fcf9b80a12251356cdcffd132c8d71623a3b71d06                  <
ba33a08b44c5afd413e5960c8456e249eaa3fd0d1c93bb46bb6eef29781f8389                ba33a08b44c5afd413e5960c8456e249eaa3fd0d1c93bb46bb6eef29781f8389
bcbc723a6b873bd3041c87315cc16024f83784fb5b1ce71e42c32d836676f3c8                  | bc45ee2a04e6db497ffb2a7a97519e2e498e59cf8b54f1b4e50da2a9dcec6569
be4060601a59ec9bdada5c8dc2de381a79ee3f5a2dc1fca613fbfa9bf2fa5ff4                be4060601a59ec9bdada5c8dc2de381a79ee3f5a2dc1fca613fbfa9bf2fa5ff4
                                                  > c4017455a394358e90456e987711aa6fffa7332e0014781d317d068a940ade8b
c49cc4e7c4a673c453a03d28583029c67187be7437679a06649046fce13aeb17                c49cc4e7c4a673c453a03d28583029c67187be7437679a06649046fce13aeb17
c549fc1dfb90b1bfeb308e92f34d270a1e3850bb71162476cfd44bae5479d323                c549fc1dfb90b1bfeb308e92f34d270a1e3850bb71162476cfd44bae5479d323
ca054b34ba3c6cc6d01c378a2affc636fa3c8c82afd597189e2dfba29dbcd363                ca054b34ba3c6cc6d01c378a2affc636fa3c8c82afd597189e2dfba29dbcd363
ca5ce29a0aa547f556080555f582663d7d068487c0f720f846c530a49eb5c852                  | cdded1e64f935f7d8ad6a1f1514dabea849e35692933b7a91ca22e799b209310
cfec02b6f4224de18287276c3e42337c9241de55f59d62354693d7b1063fdaf7                cfec02b6f4224de18287276c3e42337c9241de55f59d62354693d7b1063fdaf7
d104b0b059fd06cb05b61181c1379e5af6b88e005e23ec290ecd6e548475b9c1                d104b0b059fd06cb05b61181c1379e5af6b88e005e23ec290ecd6e548475b9c1
d40e93849bbf910a49195ed31f7efa08be71ef2319c60945ee93b00eb8556414                d40e93849bbf910a49195ed31f7efa08be71ef2319c60945ee93b00eb8556414
dd5e3b451f7d5a2e50517afdcf50630b55578cdb2288c48e6edb25a929b68a79                dd5e3b451f7d5a2e50517afdcf50630b55578cdb2288c48e6edb25a929b68a79
df7443e73d7ff0c59e972cda1ed6788279a9b0f44a93779ff8924e5ea3ced452                df7443e73d7ff0c59e972cda1ed6788279a9b0f44a93779ff8924e5ea3ced452
e26f33a7d3f3b441489221b9d6ded1355fba6425b0fa35420749f587102aec25                e26f33a7d3f3b441489221b9d6ded1355fba6425b0fa35420749f587102aec25
ea794bbe2ae81d4adf29bbee1d5abbb338c1b380ecff68b4d35596d79bbf46d3                ea794bbe2ae81d4adf29bbee1d5abbb338c1b380ecff68b4d35596d79bbf46d3
eb6a6189de98809590644ec8095114f02050b7cf060e079be5965a0041f692d9                eb6a6189de98809590644ec8095114f02050b7cf060e079be5965a0041f692d9
                                                  > ec7e1be14ac79489a035a02049fed61b622ad54f6fe2b397edbb1f48d85adcf3
efa36b890504d72dfc7b5ba3540478dcc4743cf745a88849760864f7bb4ea3c2                efa36b890504d72dfc7b5ba3540478dcc4743cf745a88849760864f7bb4ea3c2
f2476ebacedaec854c28a735a8b1931e250c6c1b02728d956a9877d722c09c6c                f2476ebacedaec854c28a735a8b1931e250c6c1b02728d956a9877d722c09c6c
f3588a811cc01078085c10dc9ac30955175d21b13a7c8ab2fc79ee8a50dc9e47                f3588a811cc01078085c10dc9ac30955175d21b13a7c8ab2fc79ee8a50dc9e47
f7d26687048c1ed6ada05f36f3669749391845df44fc45660dc935c571b1ca55                  <
f7eeb01730b30149ce94b407de7f03857d1b9cf0adb11d6811e7f4ff8f0bab5f                f7eeb01730b30149ce94b407de7f03857d1b9cf0adb11d6811e7f4ff8f0bab5f
f949703ee62662a6723ed4b87cd84b6e9a77065e09ae1012af85b0e7d5b7dfe3                f949703ee62662a6723ed4b87cd84b6e9a77065e09ae1012af85b0e7d5b7dfe3
fb48f3c943ecc10ca814d582b38b282180e6cd2a2628c800056e46b3dd925380                fb48f3c943ecc10ca814d582b38b282180e6cd2a2628c800056e46b3dd925380
fcb7b3056ebff9a845af9dc9f9637fb1e993be01860b5d70bc63062bac0f8cc6                fcb7b3056ebff9a845af9dc9f9637fb1e993be01860b5d70bc63062bac0f8cc6
ff53090f3dd1870c51829040b206c97eb0ab517d76e6fc03d08ecd77899af57b                ff53090f3dd1870c51829040b206c97eb0ab517d76e6fc03d08ecd77899af57b

diff rate: 10/128=7.8%
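
The same figure can be computed without reading the side-by-side diff by eye. The sketch below assumes each file holds one SHA-256 hash per line, as produced by the sort step above; the helper names `read_hashes` and `diff_rate` are hypothetical and not part of the benchmark script. It counts hashes from the first run that have no matching hash in the second run, treating the files as multisets so that duplicate outputs are handled correctly.

```python
from collections import Counter


def read_hashes(path):
    """Read one hash per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def diff_rate(first_hashes, second_hashes):
    """Fraction of first-run outputs with no identical output in the second run."""
    if not first_hashes:
        raise ValueError("first run is empty")
    # Counter subtraction keeps only the hashes left unmatched in the first run.
    unmatched = Counter(first_hashes) - Counter(second_hashes)
    return sum(unmatched.values()) / len(first_hashes)


# Usage, assuming the files from the reproduction steps exist:
# rate = diff_rate(read_hashes("first_output"), read_hashes("second_output"))
# print(f"diff rate: {rate:.1%}")
```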

lzhangzz commented 3 months ago

You may need to set top_k = 1 and a fixed random seed to stop sampling because we don't have an alternative argmax kernel to bypass sampling.

Also, because split-kv takes effect automatically, the batch size and sequence length, which vary at runtime, may result in a different split-kv factor. This leads to a different accumulation order and thus a different outcome.
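
The accumulation-order effect can be reproduced with ordinary floating-point addition. This is only a sketch of the general principle, not the split-kv kernel: `chunked_sum` is a hypothetical helper that reduces a list in a given number of contiguous chunks and then combines the partial sums, loosely mimicking a reduction whose split factor changes at runtime. The values are chosen to make the rounding visible in float64.

```python
def sequential_sum(values):
    """Add values strictly left to right (no compensated summation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, num_splits):
    """Reduce contiguous chunks first, then add up the partial sums."""
    chunk = -(-len(values) // num_splits)  # ceiling division
    partials = [
        sequential_sum(values[i:i + chunk])
        for i in range(0, len(values), chunk)
    ]
    return sequential_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]  # made-up values that expose rounding
print(chunked_sum(values, 1))  # single pass over all four values
print(chunked_sum(values, 2))  # two partial sums combined afterwards
```

The two calls add exactly the same numbers and still return different totals. When two candidate tokens have nearly equal logits, a difference of this kind can be enough to change which one is the maximum, after which the generated sequences diverge.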

zhyncs commented 3 months ago

> You may need to set top_k = 1 and a fixed random seed to stop sampling because we don't have an alternative argmax kernel to bypass sampling.

In benchmark/profile_restful_api.py, the seed defaults to 0 and top_p defaults to 1. https://github.com/InternLM/lmdeploy/blob/21be189628d064931edc5f166ee00f61c8fb1412/benchmark/profile_restful_api.py#L226 https://github.com/InternLM/lmdeploy/blob/21be189628d064931edc5f166ee00f61c8fb1412/benchmark/profile_restful_api.py#L67

Do I still need to set top_k = 1?
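
For reference, a toy sampler shows how these parameters differ in general. It is not TurboMind's sampling kernel: `sample_token`, the probabilities, and the convention that top_k = 0 disables the top-k filter are all assumptions made for illustration. With top_p = 1 the whole distribution stays eligible, so the seed decides the token; with top_k = 1 only the most likely token survives, so the seed no longer matters.

```python
import random


def sample_token(probs, top_k, top_p, seed):
    """Toy sampler: apply top-k, then top-p, then draw one token index."""
    rng = random.Random(seed)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:  # assumption: 0 means "top-k disabled"
        ranked = ranked[:top_k]
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:  # smallest prefix whose mass reaches top_p
            break
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


probs = [0.5, 0.3, 0.2]  # hypothetical next-token distribution

# top_k = 1: only the argmax token is eligible, whatever the seed.
print({sample_token(probs, top_k=1, top_p=1.0, seed=s) for s in range(100)})

# top_k disabled, top_p = 1: every token is eligible, so the result depends on the seed.
print({sample_token(probs, top_k=0, top_p=1.0, seed=s) for s in range(100)})
```

A fixed seed makes this toy sampler repeatable for a single call, but that alone says nothing about how a batched server consumes random numbers across concurrent requests.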

> Also, because split-kv takes effect automatically, the batch size and sequence length, which vary at runtime, may result in a different split-kv factor. This leads to a different accumulation order and thus a different outcome.

In terms of design and implementation, should we guarantee that, when temperature is 0 and requests are served by batch inference, two identical requests produce exactly the same result?

zhyncs commented 1 month ago

TurboMind will support the do_sample sampling parameter; see https://github.com/InternLM/lmdeploy/pull/1966