MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/

Missing function #1652

Open Matagi1996 opened 11 months ago

Matagi1996 commented 11 months ago

The tutorials for LLM topic generation use textgeneration.py or openai; those classes have this function to insert topics and documents into a custom prompt:

def _create_prompt(self, docs, topic, topics):
    keywords = ", ".join(list(zip(*topics[topic]))[0])

    # Use the default prompt and replace keywords
    if self.prompt == DEFAULT_PROMPT:
        prompt = self.prompt.replace("[KEYWORDS]", keywords)

    # Use a prompt that leverages either keywords or documents in
    # a custom location
    else:
        prompt = self.prompt
        if "[KEYWORDS]" in prompt:
            prompt = prompt.replace("[KEYWORDS]", keywords)
        if "[DOCUMENTS]" in prompt:
            to_replace = ""
            for doc in docs:
                to_replace += f"- {doc}\n"
            prompt = prompt.replace("[DOCUMENTS]", to_replace)

    return prompt

It seems this function is missing from the LangChain wrapper, and therefore using a LangChain pipeline will not replace the prompt placeholders with documents/topics.
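For reference, this is roughly how I use those placeholders with the other wrappers, e.g. TextGeneration (a sketch based on my understanding of the documentation; the exact constructor arguments may differ between BERTopic versions):

from transformers import pipeline
from bertopic import BERTopic
from bertopic.representation import TextGeneration

# Custom prompt containing both placeholders that _create_prompt fills in
prompt = (
    "I have a topic described by the keywords: [KEYWORDS].\n"
    "The topic contains the following documents:\n[DOCUMENTS]\n"
    "Give a short label for this topic."
)

# Any text-generation pipeline should work here; flan-t5 is just an example
generator = pipeline("text2text-generation", model="google/flan-t5-base")
representation_model = TextGeneration(generator, prompt=prompt)

topic_model = BERTopic(representation_model=representation_model)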

I will write my own wrapper for now; I just wanted confirmation that this is the reason topics were not inserted into my prompt, or whether I am missing something crucial here compared to the other wrappers.

MaartenGr commented 11 months ago

LangChain works a bit differently from these other methods. As you can see in the source code here, the prompts do not use the [DOCUMENTS] tag; instead, the representative documents are given to LangChain directly:

https://github.com/MaartenGr/BERTopic/blob/7d07e1e94e69be278f79a48d73602cdc4df0885f/bertopic/representation/_langchain.py#L171-L191

That does indeed mean that the documentation should be updated to properly describe this behavior.
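In other words, with the LangChain representation the prompt is only the instruction; the representative documents are handed to the chain separately. Roughly like this (a sketch following the documentation at the time of writing; exact imports and parameters may differ between LangChain and BERTopic versions):

from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from bertopic.representation import LangChain

chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")

# Note: no [DOCUMENTS] tag is needed; BERTopic passes the
# representative documents to the chain directly.
representation_model = LangChain(chain, prompt="What are these documents about? Give a short label.")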

matteomarjanovic commented 9 months ago

Does this mean that, currently, the LangChain representation model doesn't give the option to put keywords in the prompt?

MaartenGr commented 9 months ago

That is correct. It should be straightforward to implement yourself, considering other representation models do have that option.
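Until then, a manual workaround is to build the prompts yourself after fitting and assign the generated labels with set_topic_labels. A rough sketch using only public BERTopic methods (the LLM call itself is left as a placeholder, since the LangChain API changes frequently):

from bertopic import BERTopic

# docs is assumed to be your list of input documents
topic_model = BERTopic()
topics, _ = topic_model.fit_transform(docs)

prompt_template = (
    "I have a topic described by the keywords: [KEYWORDS].\n"
    "It contains these documents:\n[DOCUMENTS]\n"
    "Give a short label for this topic."
)

labels = {}
for topic in set(topics):
    if topic == -1:  # skip the outlier topic
        continue
    keywords = ", ".join(word for word, _ in topic_model.get_topic(topic))
    documents = "\n".join(f"- {doc}" for doc in topic_model.get_representative_docs(topic))
    prompt = prompt_template.replace("[KEYWORDS]", keywords).replace("[DOCUMENTS]", documents)
    # labels[topic] = your_langchain_chain.run(prompt)  # hypothetical call; depends on your LangChain setup
    labels[topic] = prompt[:50]  # placeholder so the sketch runs without an LLM

topic_model.set_topic_labels(labels)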