MaartenGr / KeyBERT

Minimal keyword extraction with BERT
https://MaartenGr.github.io/KeyBERT/
MIT License
3.47k stars 344 forks source link

Support for OpenAI >= 1 #189

Closed MaartenGr closed 9 months ago

lfoppiano commented 9 months ago

I've tried to test, but I've got something else:

pip uninstall keybert
pip install -U git+https://github.com/MaartenGr/KeyBERT@openai_fix

Then I tried the following:

client = openai.OpenAI()
lc_chatgpt = OpenAI(client)

I've specified the model because the default is gpt-3.5-turbo-instruct, which sounds old

here the output:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
  File "/Users/lfoppiano/development/projects/concepts-visualisation/concepts_visualisation/openalex/keyword/extract_keywords_keyllm.py", line 117, in <module>
    process_single(input_json, output_json)
  File "/Users/lfoppiano/development/projects/concepts-visualisation/concepts_visualisation/openalex/keyword/extract_keywords_keyllm.py", line 41, in process_single
    keywords_abstracts = kw_model.extract_keywords(abstracts, embeddings=embeddings_abstracts, threshold=0.9)
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/keybert/_llm.py", line 94, in extract_keywords
    out_cluster_keywords = self.llm.extract_keywords(
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/keybert/llm/_openai.py", line 189, in extract_keywords
    keywords = response["choices"][0]["text"].strip()
TypeError: 'Completion' object is not subscriptable

If I specify to use model="gpt-3.5-turbo"

lc_chatgpt = OpenAI(client, model="gpt-3.5-turbo")

I get the following error:

Traceback (most recent call last):
  File "/Users/lfoppiano/development/projects/concepts-visualisation/concepts_visualisation/openalex/keyword/extract_keywords_keyllm.py", line 116, in <module>
    process_single(input_json, output_json)
  File "/Users/lfoppiano/development/projects/concepts-visualisation/concepts_visualisation/openalex/keyword/extract_keywords_keyllm.py", line 40, in process_single
    keywords_abstracts = kw_model.extract_keywords(abstracts, embeddings=embeddings_abstracts, threshold=0.9)
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/keybert/_llm.py", line 94, in extract_keywords
    out_cluster_keywords = self.llm.extract_keywords(
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/keybert/llm/_openai.py", line 188, in extract_keywords
    response = self.client.completions.create(model=self.model, prompt=prompt, **self.generator_kwargs)
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/openai/_utils/_utils.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/openai/resources/completions.py", line 559, in create
    return self._post(
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/openai/_base_client.py", line 1055, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/openai/_base_client.py", line 834, in request
    return self._request(
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/openai/_base_client.py", line 877, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': 'This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?', 'type': 'invalid_request_error', 'param': 'model', 'code': None}}
MaartenGr commented 9 months ago

@lfoppiano Have you set chat=True? That is necessary to use a chat model compared to a completion model. You can read more about this in the docstrings.

lfoppiano commented 9 months ago

Not initially. Sorry. The completion chat is deprecated, anyway.

I just tried:

client = openai.OpenAI()
lc_chatgpt = OpenAI(client, model="gpt-3.5-turbo", chat=True)

I get a similar error:

Traceback (most recent call last):
  File "/Users/lfoppiano/development/projects/concepts-visualisation/concepts_visualisation/openalex/keyword/extract_keywords_keyllm.py", line 116, in <module>
    process_single(input_json, output_json)
  File "/Users/lfoppiano/development/projects/concepts-visualisation/concepts_visualisation/openalex/keyword/extract_keywords_keyllm.py", line 40, in process_single
    keywords_abstracts = kw_model.extract_keywords(abstracts, embeddings=embeddings_abstracts, threshold=0.9)
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/keybert/_llm.py", line 94, in extract_keywords
    out_cluster_keywords = self.llm.extract_keywords(
  File "/Users/lfoppiano/anaconda3/envs/nii/lib/python3.10/site-packages/keybert/llm/_openai.py", line 181, in extract_keywords
    keywords = response["choices"][0]["message"]["content"].strip()
TypeError: 'ChatCompletion' object is not subscriptable

Perhaps, using ChatCompletion, the responses should be extracted using response.choices instead of response['choices']

MaartenGr commented 9 months ago

@lfoppiano Thanks for trying it out! It seems that there is still some issue which I believe I just fixed. Could you test it?

lfoppiano commented 9 months ago

It works! 👍

Thanks!