Closed wangjiyang closed 1 year ago
Check if this commit can improve user experience in streaming mode. Thanks
The speedup is minimal but it does make it slightly faster
Actually, the chunk sometimes makes it slower
gin.Stream streams it as it arrives while this sends things in chunks. Might be slightly better if it streams by line
In fact, gin.Stream has an internal buffer: it caches the data read from the remote server and only then sends it to the client. My strace output is below. With gin.Stream, multiple reads result in a single write operation: in the excerpt, three reads on fd 8 (1026, 1032, and 1038 bytes) precede one write of up to 4096 bytes on fd 7. This increases throughput but adds latency. If we add a flush operation, each read results in an actual write, which sends the buffer to the client immediately and makes the output smooth.
I have attached my strace file for your reference.
2192667 epoll_pwait(4, <unfinished ...>
2192665 epoll_pwait(4, <unfinished ...>
2192667 <... epoll_pwait resumed>[], 128, 0, NULL, 2) = 0
2192665 <... epoll_pwait resumed>[], 128, 0, NULL, 2) = 0
2192667 epoll_pwait(4, <unfinished ...>
2192665 futex(0xd04d08, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
2192666 <... nanosleep resumed>NULL) = 0
2192666 futex(0xd07d38, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
2192667 <... epoll_pwait resumed>[{events=EPOLLIN|EPOLLOUT, data={u32=2658360824, u64=139756599205368}}], 128, 346616, NULL, 2150252670990740) = 1
2192667 futex(0xd07d38, FUTEX_WAKE_PRIVATE, 1) = 1
2192666 <... futex resumed>) = 0
2192667 read(8, <unfinished ...>
2192666 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...>
2192667 <... read resumed>"\27\3\3\3\375\236:\275\267vQ\250\324\233\330\7\335\6\35MO\20\2\376\4v|p\304f>h\n\305T\256^xp\0c\224-3~\261l+\231[\251\306\25\331\203\583\17\226\307\202\245\343\214\330k\266\343\252\234-\363\211_^0$>\317\347\325Or\355\306~\270&\220\373O\316L\235\231:\214^\322I\303\353t\32\204\202\204\2006\363aM\333\241K\33\35\334\0167\23C^\205T\244/\356\252\211ZO\2565\347\341\360\367\16\345:\213#\353\351{\327\f\26\23\356\264\257\3319.\247\25\362\t\0300p\226\256\243\25m~\2157B\357S\235\1v\37\2\\\274\365\35\36o5\341\4Y\242\351\2#\5V$Ej\220,\262\177rM\335\323N\253\231\334\24\236\357\355\225\205#\207\252\10\272\233\370\31\357^[\nt\316\225s\324\364GUd\10\327\217\226e{\237\7`\200,[\211\342\353\345D\""..., 3156) = 1026
2192667 futex(0xd04d08, FUTEX_WAKE_PRIVATE, 1) = 1
2192666 <... nanosleep resumed>NULL) = 0
2192665 <... futex resumed>) = 0
2192667 read(8, <unfinished ...>
2192665 epoll_pwait(4, <unfinished ...>
2192667 <... read resumed>0xc000428000, 3156) = -1 EAGAIN (Resource temporarily unavailable)
2192665 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
2192666 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...>
2192665 epoll_pwait(4, <unfinished ...>
2192667 epoll_pwait(4, [], 128, 0, NULL, 2) = 0
2192667 futex(0xc000022d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
2192666 <... nanosleep resumed>NULL) = 0
2192666 futex(0xd07d38, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
2192665 <... epoll_pwait resumed>[{events=EPOLLIN|EPOLLOUT, data={u32=2658360824, u64=139756599205368}}], 128, 346540, NULL, 2150252670990740) = 1
2192665 futex(0xd07d38, FUTEX_WAKE_PRIVATE, 1) = 1
2192666 <... futex resumed>) = 0
2192665 read(8, "\27\3\3\4\3\20\372$+Lu@\21a\333\3623\376\354H\370u-\353W\373e\177\226!|R\372\263H\234\371\301\7.j'U|8\362s\224\262L'\315\300q\236\363,X\36\31\221\260#ib\203\33[(\214\203\322D\335\313\320x\3060\3763\6\311\254\20pw\240T\375l\323(lh\367\27mT\373\334\263#\0\270\341e*\26{(\247\211\2102j\27f%G\265\360Rw\327\264'KG\r&%\177\21\10q#\303\352ZNy\354\346<\37\311\246\266|\330\31\t\267%m\313\2143bi'\33\365}\27X\272'>6b\203\16\375\203\230\\\366\3173\256W\351\311\"\23\366N\242\235\200\331\273\312\276\0052\231\"\327!\345\7\253\202>$\311fq\304\352\344\n\217\335\260\355\2065\367DL\0\360\342\252\36\33cC\276\373G\352Y\357\356\353\353\226\16%5\250\371\32\254\373\334\321\232t\v\276"..., 3156) = 1032
2192666 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...>
2192665 futex(0xc000022d48, FUTEX_WAKE_PRIVATE, 1) = 1
2192667 <... futex resumed>) = 0
2192667 epoll_pwait(4, <unfinished ...>
2192665 read(8, <unfinished ...>
2192667 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
2192665 <... read resumed>0xc000428000, 3156) = -1 EAGAIN (Resource temporarily unavailable)
2192667 epoll_pwait(4, <unfinished ...>
2192666 <... nanosleep resumed>NULL) = 0
2192665 epoll_pwait(4, <unfinished ...>
2192667 <... epoll_pwait resumed>[], 128, 0, NULL, 2) = 0
2192665 <... epoll_pwait resumed>[], 128, 0, NULL, 2) = 0
2192667 epoll_pwait(4, <unfinished ...>
2192666 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...>
2192665 futex(0xd04d08, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
2192666 <... nanosleep resumed>NULL) = 0
2192666 futex(0xd07d38, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
2192667 <... epoll_pwait resumed>[{events=EPOLLIN|EPOLLOUT, data={u32=2658360824, u64=139756599205368}}], 128, 346461, NULL, 2150252670990740) = 1
2192667 futex(0xd07d38, FUTEX_WAKE_PRIVATE, 1) = 1
2192667 read(8, <unfinished ...>
2192666 <... futex resumed>) = 0
2192667 <... read resumed>"\27\3\3\4\tf\312\262\333H4\5\250\323\35\223\261tCGg\363\377\244\342\223o\364\336f\263D\254>.r3rN\311\324\3249\304\305\372\\\276\316q_\325\3 Q\347\254\3\4\370I\7\271b\341\333\37J\226\30\355\250T\302b\226\323\354\227%\252\333\240\272N\336\347\215I\362?\336\214\261\331\17v\215X\244\r\23\261\275?a\227\26\"\356\205\36\374XV\222\240a\237\210\2\211\345JS\320ha/U\360\1|\222\232h\371\243\352I\313_\230g-\363\212\267\360\236\3\347\342,\301\2616\361\351\267\4\325\343\333\323\250\3478WL\377\370(\25\262\306\342\231l\251.\341-\331 \307\t!CAz\367\270\fP#8\367\202pA\355\251c\3+\302w0\0321\231\240\237\200\364\59\264\213O\220\324\326~\364;z\1\v]HR\351s7/\365\225\227\236r\177\224\245`\1u\371T(\252\366\253\226\241"..., 3156) = 1038
2192666 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...>
2192667 futex(0xd04d08, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2192665 <... futex resumed>) = 0
2192667 <... futex resumed>) = 1
2192665 epoll_pwait(4, <unfinished ...>
2192667 read(8, <unfinished ...>
2192666 <... nanosleep resumed>NULL) = 0
2192665 <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
2192666 nanosleep({tv_sec=0, tv_nsec=20000}, <unfinished ...>
2192667 <... read resumed>0xc000428000, 3156) = -1 EAGAIN (Resource temporarily unavailable)
2192665 write(7, "7218774a7e2\", \"error\": null}\n\ndata: {\"message\": {\"id\": \"aad863f4-bef6-4a7a-861d-64438d729511\", \"author\": {\"role\": \"assistant\", \"name\": null, \"metadata\": {}}, \"create_time\": 1683434967.675813, \"update_time\": null, \"content\": {\"content_type\": \"text\", \"parts\""..., 4096 <unfinished ...>
Hmm, OK. The difference is minimal: the latency seems to be only a few ms, whereas the fetch time from the API varies significantly, which may be why my measurements were off. I'll merge again.
thanks
The previous implementation uses gin.Stream to copy messages to the proxy client. This can leave messages buffered on the server side, which adds latency. As a result, the proxy client receives a batch of messages at once, which gives proxy clients a worse experience than the official OpenAI chatbot. Output to proxy clients is smoother after this commit.
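For anyone who wants to see the flush-per-read idea in isolation, here is a minimal sketch using only the Go standard library. It is not the code from this commit: `chunkReader`, `copyAndFlush`, and `demo` are hypothetical names introduced for illustration, and `httptest.NewRecorder` stands in for a real client connection. In a gin handler, `c.Writer` also implements `http.Flusher`, so the same loop should apply there.

```go
package main

import (
	"errors"
	"io"
	"net/http"
	"net/http/httptest"
)

// chunkReader is a hypothetical stand-in for the upstream response body:
// each Read call returns (at most) one queued chunk, much like one socket read.
type chunkReader struct{ chunks []string }

func (r *chunkReader) Read(p []byte) (int, error) {
	if len(r.chunks) == 0 {
		return 0, io.EOF
	}
	n := copy(p, r.chunks[0])
	if n < len(r.chunks[0]) {
		// Chunk did not fit in p: keep the remainder for the next Read.
		r.chunks[0] = r.chunks[0][n:]
	} else {
		r.chunks = r.chunks[1:]
	}
	return n, nil
}

// copyAndFlush copies src to w and flushes after every read that returned
// data, so each upstream chunk is pushed out instead of waiting for the
// response buffer to fill. It returns the number of flushes performed.
func copyAndFlush(w http.ResponseWriter, src io.Reader) (int, error) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		return 0, errors.New("response writer does not support flushing")
	}
	buf := make([]byte, 4096)
	flushes := 0
	for {
		n, readErr := src.Read(buf)
		if n > 0 {
			if _, writeErr := w.Write(buf[:n]); writeErr != nil {
				return flushes, writeErr
			}
			flusher.Flush()
			flushes++
		}
		if readErr == io.EOF {
			return flushes, nil
		}
		if readErr != nil {
			return flushes, readErr
		}
	}
}

// demo replays two made-up SSE events through copyAndFlush into an in-memory
// recorder and reports the flush count and the bytes the client would see.
func demo() (flushes int, body string, err error) {
	rec := httptest.NewRecorder()
	src := &chunkReader{chunks: []string{"data: hello\n\n", "data: world\n\n"}}
	flushes, err = copyAndFlush(rec, src)
	return flushes, rec.Body.String(), err
}
```

Calling `demo()` from a `main` function gives two flushes for the two events, one per read. A plain `io.Copy` into the same writer would produce the same bytes but leave the flush timing to the server's buffering, which is the batching behavior described above.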