OoriData / OgbujiPT

Client-side toolkit for using large language models, including where self-hosted
Apache License 2.0

Look into asyncio client approach; add scaffold if this is not possible from upstream libraries #4

Closed: uogbuji closed this 1 year ago

uogbuji commented 1 year ago

@ChocolateEinstein and I were looking into Python async invocation via Langchain to an Ooba back end today. The immediate problem: an attempt to use LC's llm.agenerate coroutine seemed to just hang when I tried to use it fully async.

I know there are tons of limitations and points of brittleness around LC's async capabilities, but honestly, I can't contemplate any other way of writing LLM-based code. LC supports async for OpenAI via Python's openai lib, but that relies on OpenAI's own capabilities (batch prompt processing, streaming, etc.). I had a hunch the emulation in Ooba's OpenAI extension wouldn't quite be up to all that, and a conversation with @matatonic seems to confirm this, though mata seems willing to put some work into addressing it (with reasonable limitations).

A few key tidbits from that exchange:

Separate note:

Workarounds until Ooba is async-ready

Meanwhile, I think what we'll look into is a multiprocess implementation that serializes access to the LLM hosted in Ooba. The front-end API will use some sort of message queuing or ticketing system to interface with the back end without blocking (a sleep/wake/poll-for-results pattern). We can implement this even if the Ooba API is strictly synchronous (i.e. blocking), and it would let us build things out in a way that makes it easy to drop in async Ooba API access once that's ready.
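To make the idea concrete, here is a minimal sketch of that serialization pattern, assuming an asyncio.Queue plus a single worker task; blocking_llm_call, llm_worker and submit are illustrative placeholders, not OgbujiPT or Ooba API.

import asyncio
from concurrent.futures import ProcessPoolExecutor

def blocking_llm_call(prompt):
    '''Stand-in for the synchronous LLM request to the Ooba back end'''
    import time
    time.sleep(2)  # Simulate a long-running request
    return f'ECHO: {prompt}'

async def llm_worker(queue, executor):
    '''Single consumer task: serializes all access to the blocking back end'''
    loop = asyncio.get_running_loop()
    while True:
        prompt, result_fut = await queue.get()
        # Run the blocking call in a separate process so the event loop stays responsive
        result = await loop.run_in_executor(executor, blocking_llm_call, prompt)
        result_fut.set_result(result)
        queue.task_done()

async def submit(queue, prompt):
    '''Front-end call: drop a "ticket" on the queue, then await its result'''
    result_fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, result_fut))
    return await result_fut

async def main():
    queue = asyncio.Queue()
    with ProcessPoolExecutor(max_workers=1) as executor:
        worker = asyncio.create_task(llm_worker(queue, executor))
        # Both requests are accepted immediately, but served one at a time
        answers = await asyncio.gather(
            submit(queue, 'First prompt'),
            submit(queue, 'Second prompt'))
        print(answers)
        worker.cancel()

if __name__ == '__main__':
    asyncio.run(main())

The single-worker queue is what enforces the serialization; swapping in truly async Ooba access later would just mean replacing the run_in_executor call inside llm_worker.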

matatonic commented 1 year ago
uogbuji commented 1 year ago

Thanks, @matatonic, for the clarifications. My main aim was to capture all the facets of the conversation, but anything that narrows down the work to be done is great.

This morning I put together the workaround I mentioned above. Here is an example, all as one module. My next step will be to abstract the multiprocess/executor bits into the OgbujiPT library, for easy invocation by users. Async will become the preferred way to invoke OgbujiPT, since, frankly, it's the right way to do it. After that we can discuss the relevant bits to chip away at on the Ooba server side.

@ChocolateEinstein, let me know how this code works for you.

import os
import asyncio
import concurrent.futures

from langchain import OpenAI

from ogbujipt.config import DEFAULT_API_PORT
from ogbujipt.model_style.alpaca import prep_instru_inputs, ALPACA_PROMPT_TMPL

API_HOST = 'http://192.168.1.10'  # Points to local machine running Ooba + OpenAI API
os.environ['OPENAI_API_BASE'] = f'{API_HOST}:{DEFAULT_API_PORT}/v1'

# Set up the API connector
llm = OpenAI(temperature=0.1)

# Set up the prompt
# Note: Python's input keyword probably won't play well here 😉
msg = 'Good morning. How are you?'

instru_inputs = prep_instru_inputs(
    'Translate the following message to French',
    inputs=msg
    )

prompt = ALPACA_PROMPT_TMPL.format(instru_inputs=instru_inputs)

# Set up the async LLM invocation
def call_llm(prompt):
    '''
    Actual LLM request, to be executed in a separate process
    '''
    return llm(prompt)

# Could probably use something like tqdm.asyncio, if we wanted to be fancy
async def indicate_progress(pause):
    '''
    Progress indicator for the console. Just prints dots.
    '''
    while True:
        print('.', end='', flush=True)
        await asyncio.sleep(pause)

async def main():
    '''
    Schedule one task to do the long-running/blocking LLM request, and another
    to run a progress indicator in the background
    '''
    loop = asyncio.get_running_loop()
    executor = concurrent.futures.ProcessPoolExecutor()
    main_task = loop.run_in_executor(executor, call_llm, prompt)
    tasks = [
        main_task,
        asyncio.create_task(indicate_progress(0.5))
        ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:  # Cancel the progress indicator once the LLM call completes
        task.cancel()
    print('\nResult: ', next(iter(done)).result())

# Important: if using multiprocessing via ProcessPoolExecutor
# in the main module, its entry point must be guarded, since the child Python processes will re-import it
# https://docs.python.org/3/library/multiprocessing.html#multiprocessing-safe-main-import
if __name__ == '__main__':
    # Launch the asyncio main loop, but only from the main module initial invocation
    asyncio.run(main())
uogbuji commented 1 year ago

A simpler version of the above is now possible with the newly implemented ogbujipt.async_helper.schedule_llm_call(). See the added demos/alpaca_multitask_fix_xml.py
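For orientation, here is a rough sketch of how the earlier example might collapse with that helper; the schedule_llm_call(llm, prompt) signature shown is an assumption, so check ogbujipt.async_helper and the demo for the actual interface.

import asyncio

from langchain import OpenAI

# Assumed interface: schedule_llm_call(llm, prompt) runs the blocking LLM call in an
# executor and returns its result; see ogbujipt.async_helper for the real signature
from ogbujipt.async_helper import schedule_llm_call

llm = OpenAI(temperature=0.1)

async def indicate_progress(pause):
    '''Print dots to the console while the LLM call is in flight'''
    while True:
        print('.', end='', flush=True)
        await asyncio.sleep(pause)

async def main(prompt):
    llm_task = asyncio.create_task(schedule_llm_call(llm, prompt))
    progress_task = asyncio.create_task(indicate_progress(0.5))
    done, _ = await asyncio.wait(
        [llm_task, progress_task], return_when=asyncio.FIRST_COMPLETED)
    progress_task.cancel()
    print('\nResult:', next(iter(done)).result())

if __name__ == '__main__':
    asyncio.run(main('Good morning. How are you?'))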