FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs

FlagEmbedding and LlamaIndex #524

ammarmol commented 4 months ago

Hello, I am trying to use LlamaIndex and FlagEmbedding together, but it is really difficult. Could you provide a simple example? Also, is there a way to fine-tune a FlagEmbedding model from Python with a single function call, something like this?

```python
model = FlagEmbedding.baai_general_embedding.finetune(
    output_dir="./",
    model_name_or_path="BAAI/bge-large-zh-v1.5",
    train_data="./result.jsonl",
    learning_rate=1e-5,
    num_train_epochs=5,
    per_device_train_batch_size=8,
    dataloader_drop_last=True,
    normlized=True,
    temperature=0.02,
    query_max_len=64,
    passage_max_len=256,
    train_group_size=2,
    logging_steps=10,
    query_instruction_for_retrieval="",
)
model.run()
```

staoxiao commented 4 months ago

Hi, thanks for your interest in our work. You can use `HuggingFaceEmbedding` to load a bge model in LlamaIndex, and LlamaIndex also has its own fine-tuning script that you can use. If you want to use FlagEmbedding to train a model from Python, a simple method is to launch the training script through a system command: `os.system("torchrun --nproc_per_node {number of gpus} -m FlagEmbedding.baai_general_embedding.finetune.run ... ")`. Both approaches are sketched below.
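
A minimal sketch of the first suggestion, assuming a recent llama-index (>= 0.10) where `HuggingFaceEmbedding` is provided by the `llama-index-embeddings-huggingface` package; the import path differs on older versions, and the `./data` directory and query string are placeholders:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load the bge model from the Hugging Face hub. The Chinese bge models
# recommend prepending this instruction to queries for retrieval tasks.
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-large-zh-v1.5",
    query_instruction="为这个句子生成表示以用于检索相关文章：",
)

# Register it as the default embedding model for LlamaIndex.
Settings.embed_model = embed_model

# Build an index and retrieve as usual; "./data" is a placeholder path.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=5)
results = retriever.retrieve("how do I combine FlagEmbedding with LlamaIndex?")
```

And a sketch of the second suggestion, expanding the `os.system` one-liner with the parameters from the question above. The CLI flag names are assumed to mirror those keyword arguments one-to-one (note the repo's spelling of `normlized`), and `num_gpus` is a placeholder to fill in:

```python
import os

num_gpus = 1  # placeholder: set to the number of GPUs available

# Assemble the torchrun command for the FlagEmbedding fine-tuning entry
# point, mapping each keyword argument from the question onto a CLI flag.
cmd = (
    f"torchrun --nproc_per_node {num_gpus} "
    "-m FlagEmbedding.baai_general_embedding.finetune.run "
    "--output_dir ./ "
    "--model_name_or_path BAAI/bge-large-zh-v1.5 "
    "--train_data ./result.jsonl "
    "--learning_rate 1e-5 "
    "--num_train_epochs 5 "
    "--per_device_train_batch_size 8 "
    "--dataloader_drop_last True "
    "--normlized True "
    "--temperature 0.02 "
    "--query_max_len 64 "
    "--passage_max_len 256 "
    "--train_group_size 2 "
    "--logging_steps 10 "
    '--query_instruction_for_retrieval ""'
)
os.system(cmd)
```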