ContextualAI / gritlm

Generative Representational Instruction Tuning
https://arxiv.org/abs/2402.09906
MIT License

Unable to load trained model #27

Open deepakkr-singh opened 7 months ago

deepakkr-singh commented 7 months ago

I trained an embedding model on the toy dataset, as suggested in the repo:

torchrun --nproc_per_node 1 -m training.run --output_dir test_path --model_name_or_path openaccess-ai-collective/tiny-mistral --train_data training/toy_data_instruct/toy_data_embedding.jsonl --learning_rate 1e-5 --num_train_epochs 5 --per_device_train_batch_size 2 --dataloader_drop_last True --normalized True --temperature 0.02 --query_max_len 32 --passage_max_len 128 --train_group_size 2 --mode embedding --attn cccc --save_strategy epoch

When I try to load that model and encode with it, I get an error:


```python
!pip install gritlm
from gritlm import GritLM
model = GritLM("gritlm/test_path", torch_dtype="auto", mode='embedding')
instruction = "Given a scientific paper title, retrieve the paper's abstract"
queries = ['Bitcoin: A Peer-to-Peer Electronic Cash System', 'Generative Representational Instruction Tuning']
documents = [
    "A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone.",
    "All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8X7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm."
]

def gritlm_instruction(instruction):
    return "<|user|>\n" + "\n<|embed|>\n"

# No need to add instruction for retrieval documents
d_rep = model.encode(documents, instruction=gritlm_instruction(""))
q_rep = model.encode(queries, instruction=gritlm_instruction(""))
```

> TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_2764/538718586.py in <module>
     10 
     11 # No need to add instruction for retrieval documents
---> 12 d_rep = model.encode(documents, instruction=gritlm_instruction(""))
     13 q_rep = model.encode(queries, instruction=gritlm_instruction(""))

~/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

~/Medical_Form_Analysis/GritLM/gritlm/gritlm/gritlm.py in encode(self, sentences, batch_size, max_length, instruction, embed_instruction, get_cache, convert_to_tensor, recast, add_special_tokens, **kwargs)
    131             if get_cache:
    132                 inputs['use_cache'] = True
--> 133             outputs = (
    134                 getattr(self.model, self.embedding_attr) if self.embedding_attr else self.model
    135             )(**inputs)

~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)
   1509             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510         else:
-> 1511             return self._call_impl(*args, **kwargs)
   1512 
   1513     def _call_impl(self, *args, **kwargs):

~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1518                 or _global_backward_pre_hooks or _global_backward_hooks
   1519                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520             return forward_call(*args, **kwargs)
   1521 
   1522         try:

TypeError: MistralModel.forward() got an unexpected keyword argument 'is_causal'

Kindly help.

Muennighoff commented 7 months ago

Explained here: https://github.com/ContextualAI/gritlm/issues/24
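
The traceback above ends with `MistralModel.forward()` rejecting the keyword `is_causal`, which means the wrapper passed an argument that the loaded model class does not declare. As a diagnostic only, and not the fix described in the linked issue, the following sketch checks whether a callable accepts a given keyword. The helper `accepts_kwarg` and the two stand-in classes are hypothetical names made up for illustration; they are not part of `gritlm` or `transformers`.

```python
import inspect


def accepts_kwarg(fn, name):
    """Return True if `fn` can be called with the keyword argument `name`."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    for param in inspect.signature(fn).parameters.values():
        # A **kwargs catch-all accepts any keyword.
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        # Otherwise the name must be declared and passable by keyword.
        if param.name == name and param.kind in keyword_kinds:
            return True
    return False


# Hypothetical stand-ins for two model classes; not the real implementations.
class StockModel:
    def forward(self, input_ids, attention_mask=None):
        return input_ids


class PatchedModel:
    def forward(self, input_ids, attention_mask=None, is_causal=True):
        return input_ids


print(accepts_kwarg(StockModel().forward, "is_causal"))    # False
print(accepts_kwarg(PatchedModel().forward, "is_causal"))  # True
```

On a real model, the callable to inspect would presumably be the one invoked at line 133 of `gritlm.py` in the traceback. A `False` result there would confirm that the loaded class is one whose `forward` does not take `is_causal`.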