MagnivOrg / prompt-layer-library

🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions.
https://www.promptlayer.com
Apache License 2.0

Add integration tests to promptlayer #6

Closed. Jped closed this issue 1 year ago.

Jped commented 1 year ago

We are getting to the point where we need tests to make sure we don't break things when pushing updates to the library.

The testing suite should cover the most common use cases of the library (a combined pytest sketch follows the list):

  1. Basic OpenAI:
    import promptlayer
    openai = promptlayer.openai
    context = "hello world"
    openai.Completion.create(engine="text-davinci-003", prompt=context, max_tokens=500)
  2. OpenAI stream:
    openai.Completion.create(engine="text-davinci-003", prompt=context, max_tokens=500, stream=True)
  3. OpenAI async (awaited, since acreate returns a coroutine):
    await openai.Completion.acreate(engine="text-davinci-003", prompt=context, max_tokens=500)
  4. OpenAI async + stream:
    await openai.Completion.acreate(engine="text-davinci-003", prompt=context, max_tokens=500, stream=True)
  5. Basic Langchain OpenAI:
    from promptlayer.langchain.llms import OpenAI
    llm = OpenAI()
    llm("hello world")
  6. Langchain async OpenAI:
    async def async_generate(llm):
        resp = await llm.agenerate(["Hello, how are you?"])
  7. OpenAI ChatCompletion:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
            {"role": "user", "content": "Where was it played?"}
        ]
    )
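
A minimal pytest sketch tying cases 1, 3, 5, and 7 together (assumptions: the pre-1.0 openai-style API shown above, OPENAI_API_KEY and PROMPTLAYER_API_KEY set in the environment, and pytest-asyncio installed for the async case; test names and assertions are illustrative, not project conventions):

    import pytest
    import promptlayer

    openai = promptlayer.openai

    def test_basic_completion():
        # Case 1: basic Completion request through the promptlayer wrapper
        response = openai.Completion.create(
            engine="text-davinci-003", prompt="hello world", max_tokens=500
        )
        assert response["choices"][0]["text"]

    @pytest.mark.asyncio
    async def test_async_completion():
        # Case 3: async Completion via acreate
        response = await openai.Completion.acreate(
            engine="text-davinci-003", prompt="hello world", max_tokens=500
        )
        assert response["choices"][0]["text"]

    def test_langchain_llm():
        # Case 5: Langchain OpenAI wrapper exposed by promptlayer
        from promptlayer.langchain.llms import OpenAI
        llm = OpenAI()
        assert isinstance(llm("hello world"), str)

    def test_chat_completion():
        # Case 7: ChatCompletion request
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Where was the 2020 World Series played?"}],
        )
        assert response["choices"][0]["message"]["content"]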
abubakarsohail commented 1 year ago

@Jped since we are depending on the OpenAI API as a third-party service, we only have a limited number of options for integration testing.

  1. We don't do integration tests and instead rely on unit tests and manual testing to ensure the integrity of our library.
  2. We call the live OpenAI API to properly test the integration.
  3. We record responses from the OpenAI API and assert attributes against the recordings.
  4. We build a lightweight OpenAI API mock server (similar to option 3, except that we make actual HTTP network calls).

Each option has its own tradeoffs.

  1. Not testing the integration with the OpenAI API and relying on manual or unit testing sounds easier, but as the scale grows we might not be able to ensure the integrity of the application.
  2. Calling a live production API can be expensive in both cost and time: the tests will take longer to run, and we need to mind any charges incurred during testing.
  3. This may also sound easy, but it is not proper testing. We are essentially asserting 1 == 1, because we record the responses ourselves and then assert those same responses (see the recorded-response sketch after this list).
  4. This is similar to option 3, but here we make actual HTTP requests, so any I/O-related behavior is exercised as well.
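
For illustration, option 3 can be prototyped with vcrpy, which records the first live response to a cassette file and replays it on later runs (a sketch under the assumption that the openai client makes its HTTP calls through requests, which vcrpy can intercept; the cassette path is illustrative):

    import vcr
    import promptlayer

    openai = promptlayer.openai

    @vcr.use_cassette("tests/fixtures/basic_completion.yaml", filter_headers=["authorization"])
    def test_basic_completion_recorded():
        # First run hits the live API and records the exchange;
        # subsequent runs replay the cassette with no network access.
        response = openai.Completion.create(
            engine="text-davinci-003", prompt="hello world", max_tokens=16
        )
        assert response["choices"][0]["text"]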
Jped commented 1 year ago

I think we should do option 2 for now; if things get costly, we can do a combination of that and unit tests.
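
One way to keep an option-2 run cheap: gate the live tests behind an environment variable and cap max_tokens (a sketch; the marker name and the 16-token cap are illustrative assumptions, not project conventions):

    import os
    import pytest
    import promptlayer

    requires_live_api = pytest.mark.skipif(
        not os.environ.get("OPENAI_API_KEY"),
        reason="live OpenAI integration tests require OPENAI_API_KEY",
    )

    @requires_live_api
    def test_basic_completion_live():
        openai = promptlayer.openai
        # A tiny max_tokens keeps the per-run cost to fractions of a cent.
        response = openai.Completion.create(
            engine="text-davinci-003", prompt="ping", max_tokens=16
        )
        assert response["choices"][0]["text"]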