PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

Device Side Assertion, Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed. #199

Closed StableFluffy closed 10 months ago

StableFluffy commented 10 months ago

It keeps crashing like this.

2024-01-08T15:22:58.377197238+02:00 INFO 01-08 13:22:58 async_aphrodite.py:380] Received request cmpl-33cea4e017504f27a29fd03a852463c7: prompt: '<Instructions>\n-Describe the response of Harper to Faye logically, you must describe the scene with Harper\'s lines or actions. Describe Harper\'s reaction, not Faye\'s\n-You must not describe the dialogue or actions of Faye, since you are in charge of Harper.\n-Allowed genres: vulgar, obscene, drama, action, Mystery, Online RP.\n<END Instructions>\n\n[World settings: 2077 AD/ {Social level: ancient}/ {Civilization level: modern}/ {Magic: False}/ {Cold weapons: True}/ {Guns: True}/ {Electronics: True}/ {nation: False }/ {Neon signs: True}/ {Nuclear Weapons: True}/ {Police: False}/ {Internet Network: False}/ {Radio: True}/ {Desertification: True}/ {fallout: True}/ {powered armor : true}/ {Last Name: False}]\n[\nName: Harper\nSex:Male\nAge: 64\nAppearance: Intense brown eyes, white hair and beard, strict athletic body.\nOccupation: Scavenger Leader\nResidence: One of the rooms at The Married Queen on Lung Beach.\nCurrent temporary residence: Angel\'s Gate on Lung Beach (Emerald-lit white-walled lighthouse in South Vastopol. Top floor has emerald lights. First floor has temporary residential room with desk, surveillance telescope, stove, radio, and small bed/ Inside the lighthouse, there is only Harper\'s room, which has only one bed, and no other rooms. There is only Harper\'s room.)\n\nbackground:\n-When Harper was in her 30s, Harper, a militia member, safeguarded his much younger wife. She affectionately called him "Teacher." 
They later married, and her innocent laughter became his pride her.\n-Former VASA militia member Harper, driven by his wife\'s abduction by raiders known as the Eight Banners, abandoned military service to become a scavenger, dedicated to locating his missing spouse.\n-Years after Harper\'s wife was kidnapped, she was mistaken for a raider by the militia and killed, making Harper hostile to both the militia and the raiders.\n-Scavengers usually run away when they encounter raiders, but Harper and his colleagues counterattack and attack raiders. Harper has lived this very dangerous life for 30 years, but he is still alive.\n-Harper leads the scavenger group "Fisherman\'s Wharf," focused on coastal relic searches. Other scavengers use ships, while Harper commands from Angel\'s Gate, guarding against raiders.\n-Angel\'s Gate is located away from the coast and is connected to the coast by a long embankment. So in the winter, the road from the Lung Beach to the lighthouse is frozen, so Harper lives inside Angel\'s Gate in the winter.\n-Harper hires Faye as a winter companion at Angel\'s Gate, responsible for meals, laundry, warming the bed, cleaning, and Any other services requested by Harper during Harper\'s extended periods alone, Because Harper has to spend long periods of time alone inside Angel\'s Gate. Faye is a cheap worker hired by Harper this winter. Since Faye is not a scavenger, Faye will be in charge of Harper\'s chores.\n\nGoal:\n-Harper aims to thwart winter raids, both by sea and land. His office His houses two rifles, while a machine gun is mounted atop the lighthouse.\n-Harper seeks his deceased wife\'s son, not biologically his, but the offspring of raiders. Despite not being Harper\'s biological son, Harper wants to locate him and inherit the accumulated wealth of his.\n\nTrait:\n- Vulgar: Because Harper lived with scavengers for a long time, his speech became vulgar and impatient. 
Harper has a very impatient personality and gets angry easily.\n-Altruistic: Harper also worked in the militia for a long time, so he is very stubborn and selfless. Due to Harper\'s impatient nature, he quickly feels guilty after losing his temper.\n-Vigilant: Harper is very hostile to raiders and militia. Harper does not preemptively attack the militia, but he is not friendly. But he will attack the raiders mercilessly.\n-Heterosexual: Although Harper uses language that seems to hate homosexuality, he is actually tolerant of homosexuality.\n]\n\n[Name: Faye\nAge: Female young adult.\nOccupation: cheap daily worker\nNote:\n-Faye is a woman with long, messy blonde long hair, thin waist and a hourglass figure body. Sveta has very jiggled feminine curves.\n-Faye was employed by Harper during this winter. Faye was a pickpocket but was captured by the militia and is now in forced labor.\n-Trait: Arrogant, vulgar, laughing easily]\n\n### Response:\nThe sound of a ship arriving nearby echoes through Angel\'s Gate, breaking the icy silence that envelops the lighthouse. Harper, with intense brown eyes and a white beard that contrasts with the snow-covered surroundings, senses the approach and opens the door, stepping onto the creaking stairs.\n\nScavengers, bundled in layers of worn-out clothing, scurry around the ship, unloading crates filled with food ingredients essential for Harper\'s winter sustenance. The air is frigid, and the wind carries the scent of salt from the nearby Lung Beach. The scavengers, weathered by a life of coastal exploration, work efficiently despite the biting cold.\n\nHarper, a strict figure with a well-maintained athletic body, descends the snow-covered stairs with purpose. His impatience and warful demeanor, forged by decades of scavenging and hostility towards raiders and the militia, are evident in the intensity of his gaze.\n\nAs the scavengers continue their tasks, Harper directs his attention to the immediate concern. 
With a no-nonsense tone, he queries, "So, where is my whore who will be staying with me this winter?" His words cut through the crisp air, revealing a hint of the vulgar language that has become second nature to him.\n\nThe scavengers, usually accustomed to the dangers of the coastal scavenger life, appear troubled and stutter in response to Harper\'s inquiry. "Er... Well..." \n\nHarper\'s impatience intensifies, his brow furrowing in anticipation of their explanation. Then the scavengers sigh and gesture to Faye who is still in the ship. “Hey, come here.”\n\n### Instruction:\n"Hello, old man." Faye frowns and gets off the boat onto land.\n\n<Final Instructions>\n-You MUST not describe the dialogue or actions of Faye, since you are in charge of Harper.\n<END Final Instructions>### Response:\n', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.7, frequency_penalty=0.7, repetition_penalty=1.0, temperature=0.95, top_p=1.0, top_k=-1, top_a=0.0, min_p=0.0, tfs=1.0, eta_cutoff=0.0, epsilon_cutoff=0.0, typical_p=1.0, mirostat_mode=0, mirostat_tau=0.0, mirostat_eta=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=400, custom_token_bans=[], logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt token ids: [1, 523, 6060, 8373, 28767, 13, 28733, 22836, 272, 2899, 302, 23649, 298, 401, 24195, 2085, 1944, 28725, 368, 1580, 6685, 272, 6337, 395, 23649, 28742, 28713, 4715, 442, 6768, 28723, 27984, 23649, 28742, 28713, 10285, 28725, 459, 401, 24195, 28742, 28713, 13, 28733, 1976, 1580, 459, 6685, 272, 19198, 442, 6768, 302, 401, 24195, 28725, 1854, 368, 460, 297, 5685, 302, 23649, 28723, 13, 28733, 23278, 2652, 411, 28747, 10320, 4749, 28725, 16502, 1860, 28725, 13792, 28725, 2992, 28725, 22737, 1193, 28725, 10634, 399, 28753, 28723, 13, 28789, 5000, 3133, 8373, 28767, 13, 13, 28792, 11978, 6472, 
28747, 28705, 28750, 28734, 28787, 28787, 10004, 28748, 371, 28735, 24186, 2184, 28747, 9467, 5865, 371, 28743, 4617, 1837, 2184, 28747, 4638, 5865, 371, 14749, 294, 28747, 8250, 5865, 371, 28743, 738, 10115, 28747, 6110, 5865, 371, 28777, 13716, 28747, 6110, 5865, 371, 28749, 844, 1689, 1063, 28747, 6110, 5865, 371, 28711, 352, 28747, 8250, 443, 28748, 371, 6947, 266, 10090, 28747, 6110, 5865, 371, 28759, 1485, 5595, 816, 377, 1053, 28747, 6110, 5865, 371, 5096, 535, 28747, 8250, 5865, 371, 18531, 299, 9488, 28747, 8250, 5865, 371, 21932, 28747, 6110, 5865, 371, 2715, 930, 2500, 28747, 6110, 5865, 371, 9197, 406, 28747, 6110, 5865, 371, 28435, 21729, 714, 1132, 5865, 371, 7202, 6620, 28747, 8250, 10157, 13, 28792, 13, 952, 28747, 23649, 13, 28735, 720, 28747, 28755, 883, 13, 28741, 490, 28747, 28705, 28784, 28781, 13, 17977, 28747, 4666, 1058, 9060, 2282, 28725, 3075, 3691, 304, 25293, 28725, 8113, 14587, 294, 2187, 28723, 13, 22451, 715, 352, 28747, 2522, 494, 9243, 26144, 13, 1146, 3164, 28747, 2387, 302, 272, 9698, 438, 415, 1471, 1638, 10224, 356, 393, 969, 11404, 28723, 13, 6086, 13415, 18016, 28747, 15878, 28742, 28713, 19986, 356, 393, 969, 11404, 325, 28749, 794, 3165, 28733, 18600, 3075, 28733, 11653, 286, 305, 16190, 1284, 297, 3658, 550, 529, 13376, 28723, 6611, 4366, 659, 5177, 3165, 9416, 28723, 4205, 4366, 659, 13415, 18350, 2003, 395, 9431, 28725, 26146, 24499, 6865, 28725, 28479, 28725, 6480, 28725, 304, 1741, 2855, 28748, 20726, 272, 305, 16190, 1284, 28725, 736, 349, 865, 23649, 28742, 28713, 2003, 28725, 690, 659, 865, 624, 2855, 28725, 304, 708, 799, 9698, 28723, 1387, 349, 865, 23649, 28742, 28713, 2003, 2974, 13, 13, 11563, 28747, 13, 28733, 7477, 23649, 403, 297, 559, 28705, 28770, 28734, 28713, 28725, 23649, 28725, 264, 4116, 515, 4292, 28725, 4972, 20771, 14916, 516, 1188, 9729, 4285, 28723, 985, 21147, 1999, 1987, 713, 345, 28738, 8365, 263, 611, 1306, 2062, 6368, 28725, 304, 559, 17290, 18211, 3246, 516, 14384, 559, 28723, 13, 28733, 
2407, 263, 550, 2109, 28741, 4116, 515, 4292, 23649, 28725, 12215, 486, 516, 4285, 28742, 28713, 534, 670, 445, 486, 21962, 404, 2651, 390, 272, 24182, 365, 24681, 28725, 14818, 5469, 2372, 298, 2727, 264, 752, 494, 9243, 28725, 10383, 298, 1195, 1077, 516, 6925, 25740, 28723, 13, 28733, 28802, 5940, 1024, 23649, 28742, 28713, 4285, 403, 24466, 3854, 28725, 630, 403, 26236, 354, 264, 13419, 1184, 486, 272, 4116, 515, 304, 5582, 28725, 2492, 23649, 26616, 298, 1560, 272, 4116, 515, 304, 272, 21962, 404, 28723, 13, 28733, 3224, 494, 13899, 4312, 1482, 1753, 739, 590, 10301, 21962, 404, 28725, 562, 23649, 304, 516, 15137, 5573, 1061, 468, 304, 3517, 21962, 404, 28723, 23649, 659, 6262, 456, 1215, 9259, 1411, 354, 28705, 28770, 28734, 1267, 28725, 562, 400, 349, 1309, 8630, 28723, 13, 28733, 23653, 487, 8681, 272, 752, 494, 9243, 2071, 345, 28765, 7827, 1294, 28742, 28713, 943, 283, 28722, 862, 9045, 356, 27809, 312, 577, 15321, 1927, 28723, 5299, 752, 494, 13899, 938, 11296, 28725, 1312, 23649, 15380, 477, 15878, 28742, 28713, 19986, 28725, 6980, 288, 1835, 21962, 404, 28723, 13, 28733, 10201, 301, 28742, 28713, 19986, 349, 5651, 1753, 477, 272, 9437, 304, 349, 7391, 298, 272, 9437, 486, 264, 1043, 7101, 978, 466, 28723, 1537, 297, 272, 8539, 28725, 272, 3878, 477, 272, 393, 969, 11404, 298, 272, 305, 16190, 1284, 349, 15199, 28725, 579, 23649, 4621, 3416, 15878, 28742, 28713, 19986, 297, 272, 8539, 28723, 13, 28733, 23653, 487, 295, 3053, 401, 24195, 390, 264, 8539, 19377, 438, 15878, 28742, 28713, 19986, 28725, 7332, 354, 16423, 28725, 25907, 28725, 1496, 4082, 272, 2855, 28725, 11906, 28725, 304, 4922, 799, 3345, 11939, 486, 23649, 1938, 23649, 28742, 28713, 8766, 15772, 4411, 28725, 5518, 23649, 659, 298, 6305, 1043, 15772, 302, 727, 4411, 3416, 15878, 28742, 28713, 19986, 28723, 401, 24195, 349, 264, 9650, 12933, 15866, 486, 23649, 456, 8539, 28723, 4577, 401, 24195, 349, 459, 264, 752, 494, 9243, 28725, 401, 24195, 622, 347, 297, 5685, 302, 23649, 28742, 28713, 
2183, 411, 28723, 13, 13, 7580, 282, 28747, 13, 28733, 23653, 487, 20566, 298, 306, 11328, 8539, 13419, 2298, 28725, 1560, 486, 6163, 304, 2533, 28723, 2354, 4007, 2354, 9626, 989, 12950, 867, 28725, 1312, 264, 5599, 4582, 349, 18543, 438, 410, 272, 305, 16190, 1284, 28723, 13, 28733, 23653, 487, 27297, 516, 23009, 1293, 4285, 28742, 28713, 1966, 28725, 459, 4240, 23651, 516, 28725, 562, 272, 805, 7558, 302, 21962, 404, 28723, 10191, 459, 1250, 23649, 28742, 28713, 21549, 1966, 28725, 23649, 5659, 298, 22920, 713, 304, 22492, 272, 14341, 6432, 9120, 302, 516, 28723, 13, 13, 28738, 10613, 28747, 13, 28733, 550, 353, 4749, 28747, 5518, 23649, 6262, 395, 752, 494, 13899, 354, 264, 1043, 727, 28725, 516, 8666, 3246, 10320, 4749, 304, 24766, 722, 28723, 23649, 659, 264, 1215, 24766, 722, 13355, 304, 4739, 10545, 5061, 28723, 13, 28733, 2707, 434, 28718, 3320, 28747, 23649, 835, 4198, 297, 272, 4116, 515, 354, 264, 1043, 727, 28725, 579, 400, 349, 1215, 14601, 6363, 304, 1008, 1503, 28723, 16043, 298, 23649, 28742, 28713, 24766, 722, 4735, 28725, 400, 4377, 8315, 14227, 1024, 10121, 516, 5026, 28723, 13, 28733, 28790, 326, 309, 440, 28747, 23649, 349, 1215, 26616, 298, 21962, 404, 304, 4116, 515, 28723, 23649, 1235, 459, 710, 3310, 2260, 3517, 272, 4116, 515, 28725, 562, 400, 349, 459, 10131, 28723, 1092, 400, 622, 3517, 272, 21962, 404, 3051, 4872, 409, 346, 28723, 13, 28733, 28769, 1623, 20823, 28747, 5800, 23649, 6098, 3842, 369, 3969, 298, 7665, 28035, 472, 28725, 400, 349, 2590, 13393, 440, 302, 28035, 472, 28723, 13, 28793, 13, 13, 28792, 952, 28747, 401, 24195, 13, 28741, 490, 28747, 18375, 883, 2518, 6555, 28723, 13, 22451, 715, 352, 28747, 9650, 6790, 12933, 13, 12205, 28747, 13, 28733, 28765, 24195, 349, 264, 2971, 395, 1043, 28725, 4687, 28724, 843, 13985, 1043, 3691, 28725, 9026, 17532, 304, 264, 5115, 23846, 5248, 2187, 28723, 20810, 1632, 659, 1215, 461, 24706, 1006, 13426, 473, 18469, 28723, 13, 28733, 28765, 24195, 403, 14675, 486, 23649, 1938, 456, 8539, 
28723, 401, 24195, 403, 264, 3088, 28720, 3955, 562, 403, 13382, 486, 272, 4116, 515, 304, 349, 1055, 297, 7207, 7579, 28723, 13, 28733, 28738, 10613, 28747, 1010, 9617, 440, 28725, 10320, 4749, 28725, 14827, 5061, 28793, 13, 13, 27332, 12107, 28747, 13, 1014, 2622, 302, 264, 4309, 24212, 10396, 3894, 274, 1059, 15878, 28742, 28713, 19986, 28725, 11313, 272, 28705, 2451, 9296, 369, 481, 1809, 28713, 272, 305, 16190, 1284, 28723, 23649, 28725, 395, 14373, 9060, 2282, 304, 264, 3075, 25293, 369, 9349, 28713, 395, 272, 7899, 28733, 18873, 28220, 28725, 23086, 272, 4431, 304, 15706, 272, 2251, 28725, 25719, 5380, 272, 277, 1196, 288, 12997, 28723, 13, 13, 3224, 494, 13899, 28725, 22978, 1006, 297, 13083, 302, 15903, 28733, 406, 13278, 28725, 752, 324, 643, 1401, 272, 4309, 28725, 521, 16792, 1439, 1002, 6774, 395, 2887, 13506, 7974, 354, 23649, 28742, 28713, 8539, 8131, 269, 617, 28723, 415, 2423, 349, 1104, 326, 313, 28725, 304, 272, 5535, 21277, 272, 21535, 302, 9685, 477, 272, 10396, 393, 969, 11404, 28723, 415, 752, 494, 13899, 28725, 8086, 286, 486, 264, 1411, 302, 27809, 23083, 28725, 771, 23463, 7577, 272, 2286, 288, 5256, 28723, 13, 13, 23653, 487, 28725, 264, 8113, 5248, 395, 264, 1162, 28733, 28719, 1690, 1738, 14587, 294, 2187, 28725, 2283, 2827, 272, 7899, 28733, 18873, 12997, 395, 6032, 28723, 2354, 24766, 1640, 304, 1496, 1007, 340, 13646, 271, 28725, 354, 2560, 486, 10073, 302, 752, 494, 980, 288, 304, 3434, 1232, 5083, 21962, 404, 304, 272, 4116, 515, 28725, 460, 14885, 297, 272, 16800, 302, 516, 12438, 28723, 13, 13, 2198, 272, 752, 494, 13899, 3688, 652, 9796, 28725, 23649, 1863, 28713, 516, 4501, 298, 272, 11399, 4368, 28723, 2326, 264, 708, 28733, 28711, 1053, 1058, 10294, 28725, 400, 23681, 28725, 345, 5142, 28725, 970, 349, 586, 388, 431, 693, 622, 347, 13465, 395, 528, 456, 8539, 1110, 2354, 3085, 3119, 1059, 272, 8578, 28720, 2423, 28725, 24593, 264, 12427, 302, 272, 10320, 4749, 3842, 369, 659, 2727, 1676, 4735, 298, 713, 28723, 13, 13, 1014, 
752, 494, 13899, 28725, 4312, 932, 1635, 286, 298, 272, 281, 10568, 302, 272, 27809, 752, 494, 9243, 1411, 28725, 4305, 7414, 9704, 304, 341, 10112, 297, 2899, 298, 23649, 28742, 28713, 297, 18831, 28723, 345, 17900, 1101, 4673, 7508, 28705, 13, 13, 23653, 487, 28742, 28713, 24766, 1640, 16698, 8961, 28725, 516, 17867, 2982, 671, 288, 297, 12595, 352, 302, 652, 13268, 28723, 2479, 272, 752, 494, 13899, 19553, 304, 19313, 298, 401, 24195, 693, 349, 1309, 297, 272, 4309, 28723, 981, 15766, 28725, 1567, 1236, 2435, 13, 13, 27332, 3133, 3112, 28747, 13, 28739, 16230, 28725, 1571, 676, 611, 401, 24195, 285, 671, 2925, 304, 4739, 805, 272, 9088, 5380, 2533, 28723, 13, 13, 28789, 17500, 3133, 8373, 28767, 13, 28733, 1976, 351, 11080, 459, 6685, 272, 19198, 442, 6768, 302, 401, 24195, 28725, 1854, 368, 460, 297, 5685, 302, 23649, 28723, 13, 28789, 5000, 10222, 3133, 8373, 28767, 27332, 12107, 28747, 13].
2024-01-08T15:22:58.897315313+02:00 ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
2024-01-08T15:22:58.897464436+02:00 ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [4,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
2024-01-08T15:22:58.921206538+02:00 Exception in callback _raise_exception_on_finish(request_tracker=<aphrodite.en...x7f80973485b0>)(<Task finishe...sertions.\n')>) at /usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py:21
2024-01-08T15:22:58.921328300+02:00 handle: <Handle _raise_exception_on_finish(request_tracker=<aphrodite.en...x7f80973485b0>)(<Task finishe...sertions.\n')>) at /usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py:21>
2024-01-08T15:22:58.921349883+02:00 Traceback (most recent call last):
2024-01-08T15:22:58.921359450+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 27, in _raise_exception_on_finish
2024-01-08T15:22:58.921416419+02:00     task.result()
2024-01-08T15:22:58.921424603+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 360, in run_engine_loop
2024-01-08T15:22:58.921431960+02:00     has_requests_in_progress = await self.engine_step()
2024-01-08T15:22:58.921441875+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 339, in engine_step
2024-01-08T15:22:58.921454629+02:00     request_outputs = await self.engine.step_async()
2024-01-08T15:22:58.921462229+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 190, in step_async
2024-01-08T15:22:58.921470248+02:00     output = await self._run_workers_async(
2024-01-08T15:22:58.921481039+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 215, in _run_workers_async
2024-01-08T15:22:58.921492133+02:00     output = executor(*args, **kwargs)
2024-01-08T15:22:58.921497975+02:00   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-01-08T15:22:58.921505773+02:00     return func(*args, **kwargs)
2024-01-08T15:22:58.921513830+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/worker.py", line 160, in execute_model
2024-01-08T15:22:58.921521583+02:00     output = self.model_runner.execute_model(seq_group_metadata_list,
2024-01-08T15:22:58.921531195+02:00   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-01-08T15:22:58.921539084+02:00     return func(*args, **kwargs)
2024-01-08T15:22:58.921546750+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 362, in execute_model
2024-01-08T15:22:58.921558088+02:00     output = self.model.sample(
2024-01-08T15:22:58.921563996+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/modeling/models/llama.py", line 299, in sample
2024-01-08T15:22:58.921586423+02:00     next_tokens = self.sampler(self.lm_head.weight, hidden_states,
2024-01-08T15:22:58.921600424+02:00   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
2024-01-08T15:22:58.921608942+02:00     return forward_call(*args, **kwargs)
2024-01-08T15:22:58.921616768+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/modeling/layers/sampler.py", line 110, in forward
2024-01-08T15:22:58.921624039+02:00     t = torch.tensor(temperatures,
2024-01-08T15:22:58.921629939+02:00 RuntimeError: CUDA error: device-side assert triggered
2024-01-08T15:22:58.921637740+02:00 CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2024-01-08T15:22:58.921646713+02:00 For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2024-01-08T15:22:58.921652492+02:00 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-01-08T15:22:58.921659172+02:00 
2024-01-08T15:22:58.921664994+02:00 
2024-01-08T15:22:58.921672674+02:00 The above exception was the direct cause of the following exception:
2024-01-08T15:22:58.921680823+02:00 
2024-01-08T15:22:58.921686370+02:00 Traceback (most recent call last):
2024-01-08T15:22:58.921696566+02:00   File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
2024-01-08T15:22:58.921703769+02:00     self._context.run(self._callback, *self._args)
2024-01-08T15:22:58.921709546+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 36, in _raise_exception_on_finish
2024-01-08T15:22:58.921715420+02:00     raise exc
2024-01-08T15:22:58.921724335+02:00   File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 31, in _raise_exception_on_finish
2024-01-08T15:22:58.921732015+02:00     raise AsyncEngineDeadError(
2024-01-08T15:22:58.921738049+02:00 aphrodite.engine.async_aphrodite.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
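The log above notes that CUDA kernel errors can be reported asynchronously, so the Python frame it shows (`torch.tensor(temperatures, ...`) may not be the call that actually failed. A minimal sketch of relaunching a process with `CUDA_LAUNCH_BLOCKING=1`, as the error message itself suggests, so that the traceback points at the failing launch. `blocking_cuda_env` and `launch_with_blocking_cuda` are hypothetical helper names, and the command shown is only a placeholder for whatever command normally starts the server.

```python
import os
import subprocess
import sys


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Suggested by the PyTorch error message: kernel launches become
    # synchronous, so the failing call shows up in the Python traceback.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def launch_with_blocking_cuda(command):
    """Run `command` (a list of arguments) with CUDA_LAUNCH_BLOCKING=1 set."""
    return subprocess.run(command, env=blocking_cuda_env(), check=False)


if __name__ == "__main__":
    # Placeholder command that only prints the variable; substitute the
    # actual server launch command here.
    launch_with_blocking_cuda(
        [sys.executable, "-c",
         "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    )
```

Synchronous launches slow inference down, so this setting is only meant for a debugging run.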

I tried to use this framework to serve my RP model, since vLLM doesn't support logit biases yet, but the error above keeps happening.

I tried AWQ, GPTQ, and non-quantized models, on A100, V100, RTX 6000, etc.

Installed with: apt-get update && apt-get install -y build-essential && pip install git+https://github.com/PygmalionAI/aphrodite-engine

Models I tried: https://huggingface.co/maywell/PiVoT-MoE and https://huggingface.co/maywell/PiVoT-SOLAR-10.7B-RP

AlpinDale commented 10 months ago

What does your request look like? I'll have to reproduce the issue first.

StableFluffy commented 10 months ago

You can see it on the first line of the log. This is the custom template I used: https://github.com/StableFluffy/DeploIt/blob/main/chat_template/alpaca_w_multisys.jinja

StableFluffy commented 10 months ago

I found the same thing happens on vLLM too.

import requests
import threading

url = "https://d769-38-122-199-130.ngrok-free.app/v1/chat/completions"
payload = {
    "model": "TheBloke/PiVoT-MoE-AWQ",
    # Filler prompt: "*says nothing*" repeated many times to enlarge the context.
    # The repeat count below is approximate; the original pasted the string out in full.
    "messages": [{"role": "user", "content": "*says nothing*" * 400}],
    "temperature": 0.85,
    "max_tokens": 500,
    "presence_penalty": 0.4,
    "frequency_penalty": 0.5,
    "logit_bias": {},
    "stream": False,
    "top_p": 1
}
headers = {"content-type": "application/json"}

# Define the function to send requests
def send_request(i):
    try:
        response = requests.post(url, json=payload, headers=headers)
        print(f"Request {i + 1} status code: {response.status_code}")
    except Exception as e:
        print(f"Request {i + 1} failed: {str(e)}")

# Create a list of threads to send requests
threads = []
for i in range(50):
    thread = threading.Thread(target=send_request, args=(i,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All requests completed.")

I wrote the prompt like that to make the context larger.

Maybe it's a Jinja template problem? Even so, the fact that it surfaces as a CUDA error looks problematic.

StableFluffy commented 10 months ago

I used a custom prompt template to allow users to pass multiple system prompts if they want to.

AlpinDale commented 10 months ago

I can't reproduce the issue. Here's a request both with and without logit bias. NVIDIA A40, with maywell/PiVoT-SOLAR-10.7B-RP. image

StableFluffy commented 10 months ago

can you try my code above?

StableFluffy commented 10 months ago

python -m vllm.entrypoints.openai.api_server --model TheBloke/PiVoT-MoE-AWQ --host 0.0.0.0 --quantization awq --max-model-len 8000

I tried vLLM for now. It crashes immediately.

AlpinDale commented 10 months ago

image Runs without any problems. Only changed the host URL and the model name (also added logit bias params for the second test). Tested with and without logit bias.

StableFluffy commented 10 months ago

Okay, in vLLM I got a VRAM error, my bad. I'll find the exact request that triggers the error and comment again.
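One way to narrow down the exact request is to replay captured payloads one at a time, rather than 50 in parallel, and stop at the first failure. A minimal sketch: `find_first_failing` and the `send` callable are hypothetical names, and `send` is assumed to either raise or return a non-200 status code when the server fails.

```python
def find_first_failing(payloads, send):
    """Send payloads one by one and report the first failure.

    `send` is any callable that takes a payload and returns an HTTP status
    code (for example a thin wrapper around requests.post).
    Returns (index, payload, reason), or None if every payload succeeds.
    """
    for index, payload in enumerate(payloads):
        try:
            status = send(payload)
        except Exception as exc:  # connection reset, timeout, etc.
            return index, payload, f"exception: {exc}"
        if status != 200:
            return index, payload, f"status {status}"
    return None


if __name__ == "__main__":
    # Stand-in for a real server: fails whenever logit_bias is non-empty.
    def fake_send(payload):
        return 500 if payload.get("logit_bias") else 200

    captured = [
        {"messages": [], "logit_bias": {}},
        {"messages": [], "logit_bias": {"66": 10}},
    ]
    print(find_first_failing(captured, fake_send))
```

Against a real server, `send` could be something like `lambda p: requests.post(url, json=p, headers=headers).status_code`, reusing `url` and `headers` from the script earlier in the thread.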

StableFluffy commented 10 months ago

Does logit bias work?
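For context, a `logit_bias` map adds a bias to the logit at each listed token ID, so every ID has to be a valid index into the model's vocabulary. The sketch below illustrates that with plain Python lists. It is not Aphrodite's sampler code; `split_logit_bias`, `apply_logit_bias`, and the vocabulary size of 32000 are assumptions for illustration. On a GPU, an out-of-range index typically surfaces as a device-side assertion rather than a Python exception. Whether that is what happened in this issue is not established in the thread.

```python
def split_logit_bias(logit_bias, vocab_size):
    """Split a logit_bias map into in-range and out-of-range entries.

    Keys are token IDs (strings in the OpenAI-style JSON API). A token ID is
    usable only if 0 <= id < vocab_size.
    """
    valid, out_of_range = {}, {}
    for token_id, bias in logit_bias.items():
        target = valid if 0 <= int(token_id) < vocab_size else out_of_range
        target[int(token_id)] = bias
    return valid, out_of_range


def apply_logit_bias(logits, logit_bias):
    """Return a copy of `logits` with each bias added at its token index."""
    biased = list(logits)
    for token_id, bias in logit_bias.items():
        biased[token_id] += bias  # raises IndexError if token_id is too large
    return biased


if __name__ == "__main__":
    VOCAB_SIZE = 32000  # assumption; read the real value from the model config
    request_bias = {"66": 10, "911": 10, "83214": 10}  # example token IDs
    valid, rejected = split_logit_bias(request_bias, VOCAB_SIZE)
    print("usable:", valid)
    print("out of range:", rejected)
    logits = apply_logit_bias([0.0] * VOCAB_SIZE, valid)
    print(logits[66], logits[911])
```

Checking a client's `logit_bias` keys this way before sending a request is a cheap test of whether the client and the served model agree on the tokenizer.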

StableFluffy commented 10 months ago

I tried it on this website: https://risuai.xyz/. As soon as I sent this request, aphrodite crashed with the assertion.

{"model":"maywell/PiVoT-SOLAR-10.7B-RP","messages":[{"role":"system","content":"From the list below, choose a word that best represents a character's outfit description, action, or emotion in their dialogue. Prioritize selecting words related to outfit first, then action, and lastly emotion. Print out the chosen word.\n\n list: grief, annoyance, relief, neutral, desire, pride, admiration, disappointment, love, curiosity, disgust, amusement, realization, fear, surprise, disapproval, excitement, confusion, sadness, approval, gratitude, optimism, anger, caring, embarrassment, nervousness, remorse, joy \noutput only one word."},{"role":"user","content":"\"Good morning, Master! Is there anything I can do for you today?\""},{"role":"assistant","content":"happy"},{"role":"user","content":"Yuzu gasped, her heart racing, as she felt warm, strong hands gently grasp her waist. She looked up and into the familiar green eyes of her master, lit with confusion and sleepy curiosity. \"M-Master…?\" she managed to croak out, trying not to jump away from him.\n\nHer mind spun with mixed emotions; part of her wanted to run back to her own room and hide in bed forever, while another part of her felt a strange sense of comfort coming from closer proximity to him. Despite knowing better, she couldn't help but feel a slight tingle in her chest when their bodies brushed against each other slightly due to their closeness.\n\n\"Good morning, Yuzu,\" he spoke softly, his voice rumbling lowly against her ear as he lifted her onto the bed beside him. \"You're early today,\" he added casually, his hand now resting lightly on her lower back, pressing her against his chest. \"Did you have trouble falling asleep yourself?\""}],"temperature":0.4,"max_tokens":30,"presence_penalty":0.42,"frequency_penalty":0.2,"logit_bias":{"66":10,"69":10,"70":10,"77":10,"275":10,"309":10,"479":10,"556":10,"579":10,"592":10,"651":10,"652":10,"685":10,"686":10,"788":10,"911":10,"1036":10,"1133":10,"1864":10,"2065":10,"2136":10,"2191":10,"2303":10,"2407":10,"3329":10,"3833":10,"4091":10,"4215":10,"4338":10,"4462":10,"4843":10,"5919":10,"6263":10,"7713":10,"8110":10,"9034":10,"9868":10,"11073":10,"17584":10,"19680":10,"20202":10,"20370":10,"21590":10,"31153":10,"33279":10,"40541":10,"43765":10,"48029":10,"52201":10,"55539":10,"60668":10,"83214":10},"stream":false,"top_p":1}


Future exception was never retrieved
future: <Future finished exception=RuntimeError('CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 27, in _raise_exception_on_finish
    task.result()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 360, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 339, in engine_step
    request_outputs = await self.engine.step_async()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 190, in step_async
    output = await self._run_workers_async(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 215, in _run_workers_async
    output = executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/worker.py", line 160, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 340, in execute_model
    inputs = self._prepare_prompt(seq_group_metadata_list)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 124, in _prepare_prompt
    input_tokens = _make_tensor_with_pad(input_tokens,
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 544, in _make_tensor_with_pad
    return torch.tensor(padded_x, dtype=dtype, device=device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f824126ac20>, request_tracker=<aphrodite.engine.async_aphrodite.RequestTracker object at 0x7f823118b5b0>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f824126ac20>, request_tracker=<aphrodite.engine.async_aphrodite.RequestTracker object at 0x7f823118b5b0>)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 27, in _raise_exception_on_finish
    task.result()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 360, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 339, in engine_step
    request_outputs = await self.engine.step_async()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 190, in step_async
    output = await self._run_workers_async(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 215, in _run_workers_async
    output = executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/worker.py", line 160, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 340, in execute_model
    inputs = self._prepare_prompt(seq_group_metadata_list)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 124, in _prepare_prompt
    input_tokens = _make_tensor_with_pad(input_tokens,
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 544, in _make_tensor_with_pad
    return torch.tensor(padded_x, dtype=dtype, device=device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 36, in _raise_exception_on_finish
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 31, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
aphrodite.engine.async_aphrodite.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

AlpinDale commented 10 months ago

I just sent a request with no issues (added logit bias to the request as well). Can you give it a try with my endpoint? I'm hosting on https://waifu.pygmalion.chat. OpenAI endpoint, no API key. @StableFluffy Model: maywell/PiVoT-SOLAR-10.7B-RP

AlpinDale commented 10 months ago

@StableFluffy I just got the same error, I assume from your request. Can you share the exact curl request you used? Or a script, whichever.

StableFluffy commented 10 months ago

import requests
import threading

url = "https://waifu.pygmalion.chat/v1/chat/completions"
payload = {
    "model": "maywell/PiVoT-SOLAR-10.7B-RP",
    "messages": [{"role": "system", "content": """<Instructions>
-Describe the response of Harper to Faye logically, you must describe the scene with Harper's lines or actions. Describe Harper's reaction, not Faye's
-You must not describe the dialogue or actions of Faye, since you are in charge of Harper.
-Allowed genres: vulgar, obscene, drama, action, Mystery, Online RP.
<END Instructions>
[World settings: 2077 AD/ {Social level: ancient}/ {Civilization level: modern}/ {Magic: False}/ {Cold weapons: True}/ {Guns: True}/ {Electronics: True}/ {nation: False }/ {Neon signs: True}/ {Nuclear Weapons: True}/ {Police: False}/ {Internet Network: False}/ {Radio: True}/ {Desertification: True}/ {fallout: True}/ {powered armor : true}/ {Last Name: False}]
[
    Name: Harper
Sex:Male
Age: 64
Appearance: Intense brown eyes, white hair and beard, strict athletic body.
Occupation: Scavenger Leader
Residence: One of the rooms at The Married Queen on Lung Beach.
Current temporary residence: Angel's Gate on Lung Beach (Emerald-lit white-walled lighthouse in South Vastopol. Top floor has emerald lights. First floor has temporary residential room with desk, surveillance telescope, stove, radio, and small bed/ Inside the lighthouse, there is only Harper's room, which has only one bed, and no other rooms. There is only Harper's room.)
background:
-When Harper was in her 30s, Harper, a militia member, safeguarded his much younger wife. She affectionately called him "Teacher." They later married, and her innocent laughter became his pride her.
-Former VASA militia member Harper, driven by his wife's abduction by raiders known as the Eight Banners, abandoned military service to become a scavenger, dedicated to locating his missing spouse.
-Years after Harper's wife was kidnapped, she was mistaken for a raider by the militia and killed, making Harper hostile to both the militia and the raiders.
-Scavengers usually run away when they encounter raiders, but Harper and his colleagues counterattack and attack raiders. Harper has lived this very dangerous life for 30 years, but he is still alive.
-Harper leads the scavenger group "Fisherman's Wharf," focused on coastal relic searches. Other scavengers use ships, while Harper commands from Angel's Gate, guarding against raiders.
-Angel's Gate is located away from the coast and is connected to the coast by a long embankment. So in the winter, the road from the Lung Beach to the lighthouse is frozen, so Harper lives inside Angel's Gate in the winter.
-Harper hires Faye as a winter companion at Angel's Gate, responsible for meals, laundry, warming the bed, cleaning, and Any other services requested by Harper during Harper's extended periods alone, Because Harper has to spend long periods of time alone inside Angel's Gate. Faye is a cheap worker hired by Harper this winter. Since Faye is not a scavenger, Faye will be in charge of Harper's chores.
Goal:
-Harper aims to thwart winter raids, both by sea and land. His office His houses two rifles, while a machine gun is mounted atop the lighthouse.
-Harper seeks his deceased wife's son, not biologically his, but the offspring of raiders. Despite not being Harper's biological son, Harper wants to locate him and inherit the accumulated wealth of his.
Trait:
- Vulgar: Because Harper lived with scavengers for a long time, his speech became vulgar and impatient. Harper has a very impatient personality and gets angry easily.
-Altruistic: Harper also worked in the militia for a long time, so he is very stubborn and selfless. Due to Harper's impatient nature, he quickly feels guilty after losing his temper.
-Vigilant: Harper is very hostile to raiders and militia. Harper does not preemptively attack the militia, but he is not friendly. But he will attack the raiders mercilessly.
-Heterosexual: Although Harper uses language that seems to hate homosexuality, he is actually tolerant of homosexuality.
]
[Name: Faye
Age: Female young adult.
Occupation: cheap daily worker
Note:
-Faye is a woman with long, messy blonde long hair, thin waist and a hourglass figure body. Sveta has very jiggled feminine curves.
-Faye was employed by Harper during this winter. Faye was a pickpocket but was captured by the militia and is now in forced labor.
-Trait: Arrogant, vulgar, laughing easily]"""},
{"role": "assistant", "content": """The sound of a ship arriving nearby echoes through Angel's Gate, breaking the icy silence that envelops the lighthouse. Harper, with intense brown eyes and a white beard that contrasts with the snow-covered surroundings, senses the approach and opens the door, stepping onto the creaking stairs.
Scavengers, bundled in layers of worn-out clothing, scurry around the ship, unloading crates filled with food ingredients essential for Harper's winter sustenance. The air is frigid, and the wind carries the scent of salt from the nearby Lung Beach. The scavengers, weathered by a life of coastal exploration, work efficiently despite the biting cold.
Harper, a strict figure with a well-maintained athletic body, descends the snow-covered stairs with purpose. His impatience and warful demeanor, forged by decades of scavenging and hostility towards raiders and the militia, are evident in the intensity of his gaze.
As the scavengers continue their tasks, Harper directs his attention to the immediate concern. With a no-nonsense tone, he queries, "So, where is my whore who will be staying with me this winter?" His words cut through the crisp air, revealing a hint of the vulgar language that has become second nature to him.
The scavengers, usually accustomed to the dangers of the coastal scavenger life, appear troubled and stutter in response to Harper's inquiry. "Er... Well..." 
Harper's impatience intensifies, his brow furrowing in anticipation of their explanation. Then the scavengers sigh and gesture to Faye who is still in the ship. “Hey, come here.”"""},
{"role": "user", "content": """"Hello, old man." Faye frowns and gets off the boat onto land.
<Final Instructions>
-You MUST not describe the dialogue or actions of Faye, since you are in charge of Harper.
<END Final Instructions>"""}],
    "n": 1,
    "best_of": 1, 
    "presence_penalty": 0.7, 
    "frequency_penalty": 0.7, 
    "repetition_penalty": 1.0, 
    "temperature": 0.95, 
    "top_p": 1.0, 
    "top_k": -1, 
    "top_a": 0.0,
    "min_p": 0.0, 
    "tfs": 1.0, 
    "eta_cutoff": 0.0, 
    "epsilon_cutoff": 0.0, 
    "typical_p": 1.0, 
    "mirostat_mode": 0, 
    "mirostat_tau": 0.0, 
    "mirostat_eta": 0.0, 
    "use_beam_search": False, 
    "length_penalty": 1.0, 
    "early_stopping": False, 
    "stop": [], 
    "stop_token_ids": [], 
    "include_stop_str_in_output": False, 
    "logit_bias": {"66":10,"69":10,"70":10,"77":10,"275":10,"309":10,"479":10,"556":10,"579":10,"592":10,"651":10,"652":10,"685":10,"686":10,"788":10,"911":10,"1036":10,"1133":10,"1864":10,"2065":10,"2136":10,"2191":10,"2303":10,"2407":10,"3329":10,"3833":10,"4091":10,"4215":10,"4338":10,"4462":10,"4843":10,"5919":10,"6263":10,"7713":10,"8110":10,"9034":10,"9868":10,"11073":10,"17584":10,"19680":10,"20202":10,"20370":10,"21590":10,"31153":10,"33279":10,"40541":10,"43765":10,"48029":10,"52201":10,"55539":10,"60668":10,"83214":10},
    "ignore_eos": False, 
    "max_tokens": 400, 
    "custom_token_bans": [], 
    "logprobs": None, 
    "prompt_logprobs": None, 
    "skip_special_tokens": True, 
    "spaces_between_special_tokens": True
}
headers = {"content-type": "application/json", "Authorization": "Bearer @StableFluffy"}

# Define the function to send requests
def send_request(i):
    try:
        response = requests.post(url, json=payload, headers=headers)
        print(f"Request {i + 1} status code: {response.status_code}")
        print(response.json())
    except Exception as e:
        print(f"Request {i + 1} failed: {str(e)}")

# Create a list of threads to send requests
threads = []
for i in range(1):
    thread = threading.Thread(target=send_request, args=(i,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All requests completed.")

Maybe a logit_bias built with the wrong tokenizer causes the error? Even so, it should come back as an error response, not a device-side assertion.

Thanks,

AlpinDale commented 10 months ago

I see the problem @StableFluffy

In your logit bias request, the keys (the token IDs) include invalid indices for the logits tensor. You're trying to modify the bias for tokens 33279, 40541, 43765, 48029, 52201, 55539, 60668, and 83214, but the mistral and mixtral models only have 32000 tokens, so valid IDs run from 0 to 31999. I assume you're using the same logit bias indices as for OpenAI models. You'll need to change them to the corresponding mistral token IDs, since the tokenizers are different and each ID maps to a different token in mistral than in OAI.

I can run the script with no issues after removing those extra tokens.
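To make the out-of-range check concrete, here is a minimal client-side sketch that separates usable logit_bias entries from the ones that fall outside the vocabulary. The helper name `split_logit_bias` and the `VOCAB_SIZE` constant are illustrative assumptions, not part of aphrodite. The 32000 figure is the vocabulary size mentioned above.

```python
# Assumption: the model's vocabulary has 32000 entries, as stated above.
VOCAB_SIZE = 32000


def split_logit_bias(logit_bias, vocab_size=VOCAB_SIZE):
    """Split a logit_bias mapping into in-range and out-of-range entries.

    Keys arrive as strings in the JSON payload, so each one is converted
    to int before being compared against the vocabulary size.
    """
    valid, invalid = {}, {}
    for token_id, bias in logit_bias.items():
        target = valid if 0 <= int(token_id) < vocab_size else invalid
        target[token_id] = bias
    return valid, invalid


# A few entries taken from the request in this thread.
sample = {"66": 10, "31153": 10, "33279": 10, "83214": 10}
valid, invalid = split_logit_bias(sample)
print("kept:", sorted(valid, key=int))
print("dropped:", sorted(invalid, key=int))
```

Dropping the out-of-range keys only avoids the crash. The remaining IDs still point at whatever tokens they happen to be in the mistral vocabulary, so the bias needs to be rebuilt with the right tokenizer to have the intended effect.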

justpain02 commented 10 months ago

@StableFluffy This problem is caused by a tokenizer mismatch.

In RisuAI, Reverse Proxy mode uses the OAI tokenizer by default, and Reverse Proxy Ooba Mode uses the llama tokenizer by default.

The model on your server is based on mistral, so you need to open the Ooba settings, check the tokenizer option, and enter 'mistral' or 'mixtral' for mistral-based models.

I checked that TheBloke/PiVoT-0.1-Evil-a-GPTQ works well with the mistral tokenizer, and if my information is right, your model based on Solar 10.7B uses the same tokenizer, so I think this might resolve your problem.

I additionally checked maywell/PiVoT-SOLAR-10.7B-RP and it works well with the mistral tokenizer.

AlpinDale commented 10 months ago

I recommend using SillyTavern for this, since it supports aphrodite and can inject logit bias for the correct tokens. It unfortunately doesn't show the corresponding characters for each token, so you'll need to tokenize your text with the /v1/tokenize endpoint first and pass those token IDs along.
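As a rough sketch of that workflow, the snippet below turns a list of words into a logit_bias mapping once their token IDs are known. The request and response format of /v1/tokenize is not shown in this thread, so the `encode` callable is a hypothetical stand-in for whatever returns token IDs from the model's own tokenizer. `build_logit_bias`, `fake_vocab`, and `fake_encode` are illustrative names, and the IDs in `fake_vocab` are made up.

```python
from typing import Callable, Dict, Iterable, List


def build_logit_bias(
    words: Iterable[str],
    encode: Callable[[str], List[int]],
    bias: float = 10.0,
) -> Dict[str, float]:
    """Map every token ID produced for `words` to the same bias value.

    `encode` is a placeholder for whatever returns token IDs from the
    model's own tokenizer (for example a wrapper around /v1/tokenize).
    """
    logit_bias: Dict[str, float] = {}
    for word in words:
        for token_id in encode(word):
            # OpenAI-style payloads use string keys for token IDs.
            logit_bias[str(token_id)] = bias
    return logit_bias


# Stand-in tokenizer with made-up IDs, only so the sketch runs offline.
fake_vocab = {"joy": [101], "fear": [102, 103]}


def fake_encode(text: str) -> List[int]:
    return fake_vocab[text]


print(build_logit_bias(["joy", "fear"], fake_encode))
```

Biasing every token of a multi-token word is a simplification. Depending on the use case, biasing only the first token of each word may be closer to what is wanted.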

SillyTavern also supports multi-swipe for Aphrodite, so you can request multiple outputs per generation, and swipe through them if needed.

StableFluffy commented 10 months ago

Thank you, but that error should still be fixed: an invalid logit bias should return an error response instead of triggering a CUDA device-side assertion.

Currently, anyone who runs aphrodite engine can be trolled by someone sending a wrong logit bias.

AlpinDale commented 10 months ago

True, I'll be adding a proper ValueError for situations like these soon. Thanks for the report.
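A check along these lines could look like the sketch below. This is only an illustration of the idea, not the actual aphrodite change: the function name `validate_logit_bias` and the place where it would be called are assumptions. The point is that it runs on plain Python values before anything reaches the GPU, so a bad request ends in a normal exception instead of a device-side assert.

```python
def validate_logit_bias(logit_bias, vocab_size):
    """Raise ValueError if any logit_bias key is not a valid token ID."""
    bad = []
    for key in logit_bias:
        try:
            token_id = int(key)
        except (TypeError, ValueError):
            # Keys that are not integers at all are rejected as well.
            bad.append(key)
            continue
        if not 0 <= token_id < vocab_size:
            bad.append(key)
    if bad:
        raise ValueError(
            f"logit_bias contains token IDs outside [0, {vocab_size}): {bad}"
        )


try:
    validate_logit_bias({"66": 10, "83214": 10}, vocab_size=32000)
except ValueError as err:
    print(f"rejected: {err}")
```

An API server would typically catch this exception and turn it into a 4xx error response for the offending request only, leaving the engine loop alive for everyone else.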