eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Is model.score_sync not deterministic? #357

Open 4mbroise opened 3 months ago

4mbroise commented 3 months ago

I need a consistent prediction score for generated tokens, but I'm running into issues with the model.score(...) function: repeated calls with identical inputs return slightly different probabilities. The model.score_sync(...) function behaves the same way.

Am I missing something? The first session below runs on CPU, the second with cuda=True.

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lmql
>>> model = lmql.model("local:mistralai/Mistral-7B-v0.1")
>>> model.score_sync("Hello", ["World", "Apples", "Oranges"]).probs()
mistralai/Mistral-7B-v0.1
[Loading mistralai/Mistral-7B-v0.1 with AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")]]
Loading checkpoint shards: 100%|████████| 2/2 [00:29<00:00, 14.82s/it]
[mistralai/Mistral-7B-v0.1 ready on device cpu]
CompletedProcess(args=['pip', 'show', 'bitsandbytes'], returncode=0, stdout=b'Name: bitsandbytes\nVersion: 0.43.1\nSummary: k-bit optimizers and matrix multiplication routines.\nHome-page: https://github.com/TimDettmers/bitsandbytes\nAuthor: Tim Dettmers\nAuthor-email: dettmers@cs.washington.edu\nLicense: MIT\nLocation: /home2/efaugier/lqml/lib/python3.10/site-packages\nRequires: numpy, torch\nRequired-by: \n', stderr=b'')
array([9.99373272e-01, 3.40812384e-04, 2.85915910e-04])
>>> model.score_sync("Hello", ["World", "Apples", "Oranges"]).probs()
array([9.99373270e-01, 3.40813439e-04, 2.85917000e-04])

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lmql
>>> model = lmql.model("local:mistralai/Mistral-7B-v0.1", cuda=True)
>>> model.score_sync("Hello", ["World", "Apples", "Oranges"]).probs()
mistralai/Mistral-7B-v0.1
[Loading mistralai/Mistral-7B-v0.1 with AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map=auto)]]
Loading checkpoint shards: 100%|████████| 2/2 [00:14<00:00,  7.41s/it]
[mistralai/Mistral-7B-v0.1 ready on device cuda:0]
CompletedProcess(args=['pip', 'show', 'bitsandbytes'], returncode=0, stdout=b'Name: bitsandbytes\nVersion: 0.43.1\nSummary: k-bit optimizers and matrix multiplication routines.\nHome-page: https://github.com/TimDettmers/bitsandbytes\nAuthor: Tim Dettmers\nAuthor-email: dettmers@cs.washington.edu\nLicense: MIT\nLocation: /home2/efaugier/lqml/lib/python3.10/site-packages\nRequires: numpy, torch\nRequired-by: \n', stderr=b'')
array([9.99373271e-01, 3.40813521e-04, 2.85915365e-04])
>>> model.score_sync("Hello", ["World", "Apples", "Oranges"]).probs()
array([9.99373272e-01, 3.40812384e-04, 2.85915637e-04])
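
To put a number on the variation, here is a minimal sketch that compares the two CPU results above using only the standard library. The helper names `max_relative_diff` and `scores_match` and the `rel_tol=1e-4` threshold are hypothetical choices for illustration and are not part of LMQL. The values are copied from the transcript.

```python
import math

# Probabilities copied from the two consecutive CPU calls in the transcript above.
run_a = [9.99373272e-01, 3.40812384e-04, 2.85915910e-04]
run_b = [9.99373270e-01, 3.40813439e-04, 2.85917000e-04]


def max_relative_diff(a, b):
    """Largest element-wise relative difference between two score vectors."""
    return max(abs(x - y) / max(abs(x), abs(y)) for x, y in zip(a, b))


def scores_match(a, b, rel_tol=1e-4):
    """True if every pair of scores agrees within rel_tol (an assumed threshold)."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def ranking(scores):
    """Indices of the candidates, ordered from most to least probable."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


print("bitwise equal:     ", run_a == run_b)
print("max relative diff: ", max_relative_diff(run_a, run_b))
print("match within tol:  ", scores_match(run_a, run_b))
print("same ranking:      ", ranking(run_a) == ranking(run_b))
```

The two runs are not bitwise equal, but they differ only by a few parts per million and rank the candidates identically. That would be consistent with floating-point rounding noise rather than a different prediction, although nothing in this thread confirms the cause. If the scores only feed a comparison or ranking, a tolerance check like this may be enough.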