Open jeaninejuliettes opened 1 month ago
Thank you for sharing this. I see that you opened a similar issue (https://github.com/MaartenGr/BERTopic/discussions/2177). Are you alright with closing that one? To me, they seem like duplicates.
With respect to your issue, the idea of content violation was mentioned in earlier issues and addressed with the following:
Which makes it rather surprising that you get this issue. It may be that the API of OpenAI was updated and now always returns "content" but I'm not sure. Either way, simply doing an additional check here makes sense to me.
No, I'm sorry this was unclear. For this specific issue I don't get any errors regarding content violation. It simply seems that response.choices[0].message returns None, which then produces an error, since you can't call strip on a NoneType object. I don't know when or why this happens, but it doesn't seem to be the result of an error produced by the API, since the response object exists.
That is also the reason why I created a separate "issue" (discussion/question) for the content violation: I gathered from the code that it was supposed to have been fixed, but unfortunately I'm still running into it. But that is a discussion for #2177 as far as I'm concerned. They don't seem to be related (as far as I can tell).
I think that this:
I ran into issues when using the OpenAI representation as it sometimes produces a content of None, which then produced an error when trying to run: label = response.choices[0].message.content.strip().replace("topic: ", "")
and this:
response.choices[0].message returns None
contradict one another. The reason I think so is that you shouldn't be able to reach label = ...
at all because there is this check (which is used for content violation):
Thus, response.choices[0].message returns None
cannot be the case because there is a check to see whether it contains the attribute "content", right? Or did you mean that "content" returns None? If so, then the OpenAI API might have changed, since it didn't show that behavior before.
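To make the distinction concrete, here is a small sketch using stand-in objects rather than a real OpenAI response. `SimpleNamespace` only imitates the shape `response.choices[0].message.content`, so this is an assumption about the failure mode, not a reproduction of it. It shows that an attribute check passes even when the attribute's value is None, so the `.strip()` call can still be reached and fail:

```python
from types import SimpleNamespace

# Stand-in for a response whose message HAS a "content" attribute, set to None.
# This is an imitation of the response shape, not the OpenAI client itself.
message = SimpleNamespace(content=None)
response = SimpleNamespace(choices=[SimpleNamespace(message=message)])

# The attribute exists, so an existence check alone does not catch this case.
has_content_attr = hasattr(response.choices[0].message, "content")
content_is_none = response.choices[0].message.content is None

# Calling a string method on None raises AttributeError.
try:
    label = response.choices[0].message.content.strip()
    error = None
except AttributeError as exc:
    error = str(exc)
```

Here `has_content_attr` and `content_is_none` are both true at the same time, which would reconcile the two statements quoted above.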
Looking through the issues, it seems that this was mentioned before and that there is a PR that hasn't been updated in a couple of months. API changes might be relevant here, but so is the reason why you get a None, which is typically a content violation issue. Based on what I see, I'm convinced they relate to one another, since the None you get is typically some sort of content violation issue.
Yes, I mean that the content returns None. The response exists and the content element does exist in the response object, but the content it's returning is empty. Ah, I didn't see that issue (apologies), but it is the exact error message I'm seeing. And reading through the issue, it looks quite similar. But the PR is inactive?
Funny thing is, I'm still also getting content violation errors, but let's keep that out of this discussion for now ;)
It does seem to be inactive and unfortunately, I currently do not have the time to look it over. I would also be alright with a small PR just making sure it gives no error. Any additional work can be done later.
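A minimal sketch of what such a guard might look like, assuming the response exposes `choices[0].message.content` as in the line quoted in this issue. The helper name `extract_label` and the fallback string are hypothetical and are not part of BERTopic:

```python
from types import SimpleNamespace


def extract_label(response, fallback="No label returned"):
    """Return the cleaned label, or `fallback` if the content is missing or None.

    `extract_label` and `fallback` are illustrative names, not BERTopic API.
    """
    message = response.choices[0].message
    # getattr with a default covers a missing attribute and a None message.
    content = getattr(message, "content", None)
    if content is None:
        return fallback
    return content.strip().replace("topic: ", "")


# Stand-in responses that only imitate the shape of the API response.
ok = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content=" topic: cats \n"))]
)
empty = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content=None))]
)

label_ok = extract_label(ok)
label_empty = extract_label(empty)
```

Returning a fallback label keeps the run going, at the cost of a placeholder for that topic. Whether that is preferable to retrying or raising a clearer error is a design choice left open here.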
Ok, I can look into that!
Have you searched existing issues?
Describe the bug
I ran into issues when using the OpenAI representation as it sometimes produces a content of None, which then produced an error when trying to run: label = response.choices[0].message.content.strip().replace("topic: ", "")
Which makes sense, since the content is not a string. I'm unable to generate a minimal example since this is due to the output of OpenAI GPT.
I see two ways to work around this, but both have their own downsides and impact on the results. Maybe someone else sees a better option:
For now I fixed it by creating an inherited customOpenAI representation class within my script where I used the second option as a solution.
Reproduction
BERTopic Version
0.16.4