citadel-ai / langcheck

Simple, Pythonic building blocks to evaluate LLM applications.
https://langcheck.readthedocs.io/en/latest/index.html
MIT License

Async call of OpenAISimilarityScorer is slower than sync version #160

Open taniokay opened 1 month ago

taniokay commented 1 month ago

From PR #159:

import time

from dotenv import load_dotenv

import langcheck
from langcheck.metrics.eval_clients import (
    AzureOpenAIEvalClient,
)

load_dotenv()

azure_openai_client = AzureOpenAIEvalClient(
    "gpt-4-turbo", "text-embedding-ada-002", use_async=True
)

ideals = ["Apple"] * 100
texts = [f"Hello no {i}" for i in range(100)]

start = time.time()
async_results = langcheck.metrics.semantic_similarity(
    texts, ideals, eval_model=azure_openai_client
)
end = time.time()
print(f"Async Time: {end - start}")

azure_openai_client = AzureOpenAIEvalClient(
    "gpt-4-turbo", "text-embedding-ada-002", use_async=False
)

start = time.time()
sync_results = langcheck.metrics.semantic_similarity(
    texts, ideals, eval_model=azure_openai_client
)
end = time.time()
print(f"Sync Time: {end - start}")

With the default behavior (batch_size = 8), the elapsed times were:

Async Time: 52.09948110580444
Sync Time: 7.80295991897583

I also tested with batch_size set to 100, and the results were:

Async Time: 8.260644912719727
Sync Time: 8.29215955734253
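The batch_size dependence fits a sequential picture: 100 inputs at batch_size = 8 means 13 round trips processed one after another (each also paying any inter-batch delay and retry backoff), while batch_size = 100 is a single request. Counting the batches:

```python
import math

inputs = 100
for batch_size in (8, 100):
    batches = math.ceil(inputs / batch_size)
    print(f"batch_size={batch_size}: {batches} sequential round trip(s)")
```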

Vela-zz commented 4 days ago

It seems that in BasicScorer, the _embed function still processes requests sequentially rather than in parallel even when using the async client, and httpx also spends extra time sending retry requests when the async client is triggered twice in a short window. The log looks like:

2024-11-11 16:52:57,114 - langcheck.metrics.eval_clients._openai - INFO - Azure Async Called.
2024-11-11 16:52:57,214 - httpx - INFO - HTTP Request: POST https://xxxxxx.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15 "HTTP/1.1 200 OK"
2024-11-11 16:52:57,226 - langcheck.metrics.eval_clients._openai - INFO - AsyncAzure embedding Response takes 0.11207866668701172 s
2024-11-11 16:52:57,731 - langcheck.metrics.eval_clients._openai - INFO - Azure Async Called.
2024-11-11 16:52:57,733 - openai._base_client - INFO - Retrying request to /embeddings in 0.765730 seconds
2024-11-11 16:52:58,562 - httpx - INFO - HTTP Request: POST https://xxxxxx.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15 "HTTP/1.1 200 OK"
2024-11-11 16:52:58,566 - langcheck.metrics.eval_clients._openai - INFO - AsyncAzure embedding Response takes 0.8344950675964355 s
2024-11-11 16:52:58,569 - langcheck.metrics.eval_clients._openai - INFO - Azure Async Called.
2024-11-11 16:52:58,573 - openai._base_client - INFO - Retrying request to /embeddings in 0.934265 seconds
2024-11-11 16:52:59,582 - httpx - INFO - HTTP Request: POST https://xxxxxx.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15 "HTTP/1.1 200 OK"
2024-11-11 16:52:59,590 - langcheck.metrics.eval_clients._openai - INFO - AsyncAzure embedding Response takes 1.0200066566467285 s
2024-11-11 16:53:00,094 - langcheck.metrics.eval_clients._openai - INFO - Azure Async Called.
2024-11-11 16:53:00,096 - openai._base_client - INFO - Retrying request to /embeddings in 0.770774 seconds
2024-11-11 16:53:00,928 - httpx - INFO - HTTP Request: POST https://xxxxxx.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15 "HTTP/1.1 200 OK"
2024-11-11 16:53:00,936 - langcheck.metrics.eval_clients._openai - INFO - AsyncAzure embedding Response takes 0.8413498401641846 s
2024-11-11 16:53:00,937 - langcheck.metrics.eval_clients._openai - INFO - Azure Async Called.
2024-11-11 16:53:00,939 - openai._base_client - INFO - Retrying request to /embeddings in 0.790448 seconds
2024-11-11 16:53:01,798 - httpx - INFO - HTTP Request: POST https://xxxxxx.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15 "HTTP/1.1 200 OK"
2024-11-11 16:53:01,801 - langcheck.metrics.eval_clients._openai - INFO - AsyncAzure embedding Response takes 0.8630878925323486 s
2024-11-11 16:53:02,304 - langcheck.metrics.eval_clients._openai - INFO - Azure Async Called.

https://github.com/citadel-ai/langcheck/blob/d9ff16f1da73f461669dc753bb7aa52c991355df/src/langcheck/metrics/scorer/_base.py#L112

I can push a small fix on this.
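A minimal sketch of the kind of fix this implies: run all the embedding coroutines in a single event loop and let asyncio.gather put them in flight concurrently, instead of awaiting them one at a time. This is a standalone illustration with a fake embed call, not LangCheck's actual scorer code:

```python
import asyncio
import time

async def fake_embed(text: str) -> list[float]:
    # Stand-in for one async embeddings request (~0.1 s simulated latency).
    await asyncio.sleep(0.1)
    return [float(len(text))]

async def embed_all(texts: list[str]) -> list[list[float]]:
    # All requests are scheduled at once, so the total wall time is
    # roughly one request's latency, not the sum of all of them.
    return await asyncio.gather(*(fake_embed(t) for t in texts))

texts = [f"Hello no {i}" for i in range(20)]
start = time.time()
results = asyncio.run(embed_all(texts))
print(f"20 concurrent requests: {time.time() - start:.2f}s")  # ~0.1s, not ~2s
```

In practice one would also bound the concurrency (e.g. with an asyncio.Semaphore) to stay under the API's rate limits, which is likely what triggers the retry backoff visible in the log above.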

liwii commented 4 days ago

Thanks for the analysis Vela! It is surprising to me that the calls are actually sequential...

It would be great if you could push a fix, thank you so much!!

Vela-zz commented 2 days ago

> Thanks for the analysis Vela! It is surprising to me that the calls are actually sequential...
>
> It would be great if you could push a fix, thank you so much!!

Because asyncio.run spends a lot of time setting up and tearing down a new event loop every time it is called, each of these _embed calls is executed in its own event loop, so they end up looking like sequential function calls.
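A pure-Python reproduction of that pattern (nothing LangCheck-specific): calling asyncio.run once per request gives every request its own short-lived event loop, so the awaits cannot overlap, while a single asyncio.run around a gather lets them run concurrently.

```python
import asyncio
import time

async def request(i: int) -> int:
    await asyncio.sleep(0.1)  # simulated network latency
    return i

# Anti-pattern: a fresh event loop per call; the sleeps run back to back.
start = time.time()
looped = [asyncio.run(request(i)) for i in range(10)]
per_call_loop = time.time() - start  # ~1.0s: 10 x 0.1s, fully serialized

async def main() -> list[int]:
    # One loop for all coroutines; the sleeps overlap.
    return await asyncio.gather(*(request(i) for i in range(10)))

start = time.time()
gathered = asyncio.run(main())
one_loop = time.time() - start  # ~0.1s

print(f"asyncio.run per call: {per_call_loop:.2f}s, "
      f"single loop + gather: {one_loop:.2f}s")
```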