OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Example for XLM-RoBERTa #1432

Closed · sharhabeel closed this 1 year ago

sharhabeel commented 1 year ago

Hello Team,

Is there any example for XLM-RoBERTa? I am trying to use "joeddav/xlm-roberta-large-xnli" from HuggingFace.

Thank you!

guillaumekln commented 1 year ago

Hi,

The usage should be similar to the BERT example. Have you looked into this?
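
For reference, here is a minimal sketch along the lines of the BERT example (untested here; the converted model directory name is a placeholder, and it assumes the checkpoint was converted first with ct2-transformers-converter):

import ctranslate2
import numpy as np
import transformers

# Conversion step (placeholder output directory):
#   ct2-transformers-converter --model joeddav/xlm-roberta-large-xnli --output_dir xlmr-xnli-ct2
tokenizer = transformers.AutoTokenizer.from_pretrained("joeddav/xlm-roberta-large-xnli")
encoder = ctranslate2.Encoder("xlmr-xnli-ct2", device="cpu")

# forward_batch accepts batches of tokens or token IDs
ids = tokenizer(["One day I will see the world."]).input_ids
output = encoder.forward_batch(ids)

# On CPU, the returned StorageView converts directly to a NumPy array
last_hidden_state = np.array(output.last_hidden_state)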

BBC-Esq commented 1 year ago

Hello guillaumekln,

I'm new to programming and am struggling to understand ctranslate2. I was wondering if you could help me with some resources? I want to create a program that includes a chatbot feature (with memory) along with a vector database for retrieval-augmented generation, but most importantly, one that relies on ctranslate2. I couldn't figure out how to do this solely with ctranslate2, so I'm learning "hf_hub_ctranslate2"... but even then I'm having difficulty. I figured out how to convert LLMs to the ctranslate2 format and run them in chat mode, but I can't seem to figure out how to convert an "embedding" model to the ctranslate2 format and use it to create embeddings for my chromadb vector database.

Here's my current code if you have the time. I'm also wondering if you have any more examples, either using hf_hub_ctranslate2 (wrapper) or ctranslate2 directly. For the sake of brevity, I'll paste only the relevant portion:


import logging

# Imports assumed from the setup described above (LangChain + hf_hub_ctranslate2).
# SOURCE_DIRECTORY, PERSIST_DIRECTORY, CHROMA_SETTINGS, local_model_path and
# load_documents are defined elsewhere in my script.
from hf_hub_ctranslate2 import CT2SentenceTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma


# Wrapper class exposing the embed_documents method LangChain expects
class EmbeddingWrapper:
    def __init__(self, model):
        self.model = model

    def embed_documents(self, texts):
        # Filter out anything from the chunks that is not a string
        texts = [text for text in texts if isinstance(text, str)]

        # Generate embeddings for the chunks created
        embeddings = self.model.encode(
            texts,
            batch_size=20,
            convert_to_numpy=True,
            normalize_embeddings=True,
        )
        # Convert the NumPy array to plain lists, which is what
        # LangChain's Embeddings interface expects
        return embeddings.tolist()

def main():
    logging.info(f"Loading documents from {SOURCE_DIRECTORY}")
    documents = load_documents(SOURCE_DIRECTORY)

    # Break the text extracted from the documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1800, chunk_overlap=900)
    texts = text_splitter.split_documents(documents)

    # Log the number of documents loaded and the number of chunks created
    logging.info(f"Loaded {len(documents)} documents from {SOURCE_DIRECTORY}")
    logging.info(f"Split into {len(texts)} chunks of text")

    # Initialize the CT2SentenceTransformer model used to embed the chunks
    model = CT2SentenceTransformer(
        local_model_path, compute_type="float16", device="cuda"
    )

    # Wrap the model so it exposes embed_documents
    # (CT2SentenceTransformer itself only produces a NumPy array)
    embeddings = EmbeddingWrapper(model)

    # Create a Chroma database from the text chunks and their embeddings
    db = Chroma.from_documents(
        texts,
        embeddings,
        persist_directory=PERSIST_DIRECTORY,
        client_settings=CHROMA_SETTINGS,
    )
    # Persist the database to disk
    db.persist()

if __name__ == "__main__":
    main()

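If the database is queried later, LangChain's Chroma will also call embed_query on the embedding object; a small, untested addition inside EmbeddingWrapper along the same lines:

    def embed_query(self, text):
        # Chroma calls embed_query at search time; embed a single query
        # string and return a plain list of floats
        return self.model.encode(
            [text],
            convert_to_numpy=True,
            normalize_embeddings=True,
        )[0].tolist()
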
sharhabeel commented 1 year ago

@guillaumekln Thank you for the reference!

I tried that; the only issue is that I don't know, in this example, how to pass the candidate_labels for zero-shot classification to the model through the Encoder.

In the Transformers pipeline it is straightforward, like this:

classifier = pipeline("zero-shot-classification", model="ModelPath")
output = classifier(sequence_to_classify, candidate_labels)

But I am struggling to implement this with the BERT example.

I appreciate the help, thanks!

BBC-Esq commented 1 year ago

I just need to know which class I should use for one of the embedding models like instruct-xl or bge-large-en. Also, do I need to convert the model to the ctranslate2 format? Currently, I have a script that doesn't do that but uses hf_hub_ctranslate2 to somehow make it work...

guillaumekln commented 1 year ago

@sharhabeel I'm not very familiar with this task, but I think you can compute the logits as shown in the BERT example, and then reimplement the postprocessing that is applied by the "zero-shot-classification" pipeline:

https://github.com/huggingface/transformers/blob/b487096b02307cd6e0f132b676cdcc7255fe8e74/src/transformers/pipelines/zero_shot_classification.py#L211
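
The single-label branch of that postprocessing essentially takes, for each candidate label, the entailment logit of the (sequence, hypothesis) pair and applies a softmax across the candidate labels. A rough sketch (the entailment index must come from label2id in the model's config.json; don't assume a fixed position):

import numpy as np

def zero_shot_scores(nli_logits, entailment_id):
    # nli_logits: shape (num_candidate_labels, num_nli_classes), one NLI
    # logit vector per (sequence, hypothesis) pair
    # entailment_id: index of the "entailment" class from label2id
    entail = nli_logits[:, entailment_id]
    exp = np.exp(entail - entail.max())  # stable softmax across labels
    return exp / exp.sum()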

It should also be possible to create a custom pipeline class that inherits from ZeroShotClassificationPipeline and calls the CTranslate2 model instead of the Transformers model. You can take inspiration from this example, which transparently uses CTranslate2 for a SentenceTransformer model:

https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f
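
A very rough skeleton of that idea (untested; the internal keys mirror the current ZeroShotClassificationPipeline._forward and may change between Transformers versions, and ct2_logits_fn is a hypothetical callable returning NLI logits from the converted model):

import torch
from transformers import ZeroShotClassificationPipeline

class CT2ZeroShotClassificationPipeline(ZeroShotClassificationPipeline):
    def __init__(self, ct2_logits_fn, **kwargs):
        # ct2_logits_fn: hypothetical callable mapping a batch of token
        # IDs to NLI logits computed with the CTranslate2 model
        super().__init__(**kwargs)
        self.ct2_logits_fn = ct2_logits_fn

    def _forward(self, inputs):
        # Replace the Transformers forward pass; preprocess/postprocess
        # from the parent class are reused unchanged
        logits = self.ct2_logits_fn(inputs["input_ids"].tolist())
        return {
            "candidate_label": inputs["candidate_label"],
            "sequence": inputs["sequence"],
            "is_last": inputs["is_last"],
            "logits": torch.as_tensor(logits),
        }

Note that the parent class still needs the tokenizer and a model object carrying the original config, since postprocess reads the entailment index from model.config.label2id.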

@BBC-Esq Please create separate issues for different questions or bugs. This issue is about using XLM-RoBERTa for a zero-shot classification task.

sharhabeel commented 1 year ago

@guillaumekln Thank you!