Generate takes a very long time to finish

michalrzak commented 1 year ago

I have created the following application:

import os

from cot import Collection

# Set the OpenAI API key (placeholder) before any request is made
os.environ["OPENAI_API_KEY"] = "<my_api_key>"

# Load the MedQA dataset
dataset = Collection(["med_qa"])
config = {
    "instruction_keys": ["qa-01"],
    "cot_trigger_keys": ["kojima-01"],
    "answer_extraction_keys": ["kojima-A-D"],
    "api_service": "openai",
    "engine": "text-davinci-003",
    "temperature": 0.35,
    "max_tokens": 512,
    "verbose": False,
    "warn": True
}

# Draw 20 random training samples (seeded for reproducibility)
dataset_subset = dataset.select(split="train", number_samples=20, random_samples=True, seed=0)

# Generate chain-of-thought answers for the selected samples
dataset_subset.generate(config=config)

Running the application, however, takes a very long time to finish (well over 30 minutes). It does progress through the dataset, but each step is slow, and at some point the application seems to stall and stop generating.

After a while, the following warning appears:

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIError: Bad gateway. {"error":{"code":502,"message":"Bad gateway.","param":null,"type":"cf_bad_gateway"}} 502 {'error': {'code': 502, 'message': 'Bad gateway.', 'param': None, 'type': 'cf_bad_gateway'}} {'Date': 'Fri, 26 May 2023 13:26:41 GMT', 'Content-Type': 'application/json', 'Content-Length': '84', 'Connection': 'keep-alive', 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'same-origin', 'Cache-Control': 'private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Expires': 'Thu, 01 Jan 1970 00:00:01 GMT', 'Server': 'cloudflare', 'CF-RAY': '7cd652355b51c2c3-VIE', 'alt-svc': 'h3=":443"; ma=86400'}.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIError: Bad gateway. {"error":{"code":502,"message":"Bad gateway.","param":null,"type":"cf_bad_gateway"}} 502 {'error': {'code': 502, 'message': 'Bad gateway.', 'param': None, 'type': 'cf_bad_gateway'}} {'Date': 'Fri, 26 May 2023 13:31:56 GMT', 'Content-Type': 'application/json', 'Content-Length': '84', 'Connection': 'keep-alive', 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'same-origin', 'Cache-Control': 'private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Expires': 'Thu, 01 Jan 1970 00:00:01 GMT', 'Server': 'cloudflare', 'CF-RAY': '7cd65a493845c2c3-VIE', 'alt-svc': 'h3=":443"; ma=86400'}.
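These warnings come from langchain's completion_with_retry, which re-sends a request that failed with a transient error and waits a few seconds between attempts. HTTP 502 ("Bad gateway") means a gateway, here Cloudflare according to the "Server" header, got no valid response from the upstream server, so the failure sits on the service side rather than in the client code. The sketch below illustrates the general retry-with-capped-exponential-backoff pattern only; it is not langchain's actual implementation, and all names in it (call_with_retry, TransientAPIError, max_attempts, base_delay, max_delay) are hypothetical.

```python
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable error such as HTTP 502 Bad Gateway."""


def call_with_retry(fn, max_attempts=5, base_delay=4.0, max_delay=10.0, sleep=time.sleep):
    """Call fn() and retry on TransientAPIError with capped exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts,
    never more than max_delay, and re-raises after max_attempts failures.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientAPIError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            print(f"Retrying in {delay:.1f} seconds as it raised {err!r}")
            sleep(delay)


# Demo: a fake completion call that fails twice with a 502, then succeeds.
attempts = {"n": 0}


def flaky_completion():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientAPIError("502 Bad gateway")
    return "completion text"


# Pass a no-op sleep so the demo does not actually wait.
print(call_with_retry(flaky_completion, sleep=lambda seconds: None))
```

Injecting sleep as a parameter lets the demo run instantly; in real use the default time.sleep performs the actual wait.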

michalrzak commented 1 year ago

NOTE: Reducing the number of samples SOMETIMES lets the method finish (tried with 1 and 5).

KonstantinHebenstreit commented 1 year ago

Hey! Could you try testing our code with two lines of the config changed: "api_service": "mock_api" and "engine": ""?
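For reference, this is the configuration from the original report with only those two keys swapped; every other value is copied unchanged, and the generate() call stays the same.

```python
# Configuration from the original report with the two suggested changes:
# "mock_api" inserts a mock message instead of querying OpenAI,
# and "engine" is set to an empty string.
config = {
    "instruction_keys": ["qa-01"],
    "cot_trigger_keys": ["kojima-01"],
    "answer_extraction_keys": ["kojima-A-D"],
    "api_service": "mock_api",  # was "openai"
    "engine": "",               # was "text-davinci-003"
    "temperature": 0.35,
    "max_tokens": 512,
    "verbose": False,
    "warn": True,
}

# Passed to generate() exactly as before:
# dataset_subset.generate(config=config)
```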

With those settings it just runs our code and inserts a mock message. That should be very fast, within seconds; if it is not, I will take a more detailed look.

For now, I believe the problem is on the OpenAI API side. Maybe just try again in a few hours or tomorrow.
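The reasoning behind the mock test: if the same pipeline finishes in seconds once the model call is replaced by a stub, the time must be going into the real API calls rather than into the library's own processing. A minimal sketch of that comparison, with a sleep standing in for API latency; run_pipeline, mock_complete, slow_complete and timed are hypothetical illustration names, not part of ThoughtSource.

```python
import time


def run_pipeline(samples, complete):
    """Build one prompt per sample and pass it to `complete` (real model call or stub)."""
    outputs = []
    for question in samples:
        prompt = f"Q: {question}\nA: Let's think step by step."
        outputs.append(complete(prompt))
    return outputs


def mock_complete(prompt):
    # Stub backend: returns a fixed message immediately, like a mock API.
    return "mock response"


def slow_complete(prompt, latency=0.1):
    # Stand-in for a remote API call; sleeps to simulate network/server latency.
    time.sleep(latency)
    return "model response"


def timed(fn, *args):
    """Return (result, elapsed wall-clock seconds) for fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


samples = [f"question {i}" for i in range(5)]
_, mock_seconds = timed(run_pipeline, samples, mock_complete)
_, api_seconds = timed(run_pipeline, samples, slow_complete)
print(f"mock backend: {mock_seconds:.3f} s, simulated API: {api_seconds:.3f} s")
```

Only the backend differs between the two runs, so the difference in wall-clock time is attributable to the model calls.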

michalrzak commented 1 year ago

Sorry for getting back to you so late.

I reran the code with the mock API, and it finishes almost instantly.

It really seems that last week's issue was on OpenAI's side, as generation is fairly fast today (5 samples finish in about 1 minute).
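A quick back-of-the-envelope check of what this rate implies for the original 20-sample run, assuming the time per sample stays roughly constant:

```python
# Estimate based on the timing reported in this thread.
samples_timed = 5
seconds_for_timed = 60  # "5 samples in ~1 minute"
seconds_per_sample = seconds_for_timed / samples_timed

original_run_samples = 20  # number_samples=20 in the original script
expected_minutes = original_run_samples * seconds_per_sample / 60
print(f"{seconds_per_sample:.0f} s/sample -> ~{expected_minutes:.0f} min for {original_run_samples} samples")
```

That is far below the >30 minutes observed during the earlier runs.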

OpenBioLink / ThoughtSource, issue #129