IAAR-Shanghai / CRUD_RAG

CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
https://arxiv.org/abs/2401.17043

Low Evaluation Speedup for Local Model #13

Closed mi-iro closed 4 months ago

mi-iro commented 4 months ago

Dear authors, when I ran the CRUD-RAG evaluation against GPT-3.5-Turbo with 20 threads, the test completed within half an hour.

2024-07-10 14:57:44.167 | INFO | main::56 - Namespace(model_name='gpt-3.5-turbo', temperature=0.1, max_new_tokens=1280, data_path='data/crud_split/split_merged.json', shuffle=True, embedding_name='sentence-transformers/bge-base-zh-v1.5', embedding_dim=768, docs_path='data/80000_docs', docs_type='txt', chunk_size=128, chunk_overlap=0, construct_index=False, add_index=False, collection_name='docs_80k_chuncksize_128_0', retrieve_top_k=8, retriever_name='base', quest_eval=False, bert_score_eval=False, task='continuing_writing', num_threads=20, show_progress_bar=True, contain_original_data=False) LLM is explicitly disabled. Using MockLLM. 3%|████▊ | 63/2000 [01:09<20:02, 1.61it/s

However, when using the local model Qwen7b, concurrent threading doesn't seem to make much of a difference: GPU utilization is very low, and it's unclear how long the test will take to finish :(
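This is the behavior one would expect if the local model serves requests one at a time while remote API calls merely wait on the network. A minimal illustration of that difference (the timings and functions below are hypothetical stand-ins, not the repository's actual local_model code):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def api_call(_):
    # An API request is I/O-bound: sleeping (like waiting on the network)
    # releases the GIL, so 10 threads overlap their waits.
    time.sleep(0.2)

_gpu_lock = threading.Lock()  # one in-process model instance => one request at a time

def local_model_call(_):
    # A single local model serializes generation, so extra threads just queue.
    with _gpu_lock:
        time.sleep(0.2)

def timed(fn, n=10, workers=10):
    """Run n calls of fn on a thread pool and return the elapsed wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fn, range(n)))
    return time.perf_counter() - start

print(f"API-style:   {timed(api_call):.2f}s")          # ~0.2s, waits overlap
print(f"local-style: {timed(local_model_call):.2f}s")  # ~2.0s, fully serialized
```

With 20 threads and one local model instance, throughput stays at roughly one request at a time, which matches the ~52 s/it observed above.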

2024-07-10 14:53:47.739 | INFO | main::56 - Namespace(model_name='qwen7b', temperature=0.1, max_new_tokens=1280, data_path='data/crud_split/split_merged.json', shuffle=True, embedding_name='sentence-transformers/bge-base-zh-v1.5', embedding_dim=768, docs_path='data/80000_docs', docs_type='txt', chunk_size=128, chunk_overlap=0, construct_index=False, add_index=False, collection_name='docs_80k_chuncksize_128_0', retrieve_top_k=8, retriever_name='base', quest_eval=False, bert_score_eval=False, task='continuing_writing', num_threads=20, show_progress_bar=True, contain_original_data=False) Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 4/4 [00:10<00:00, 2.59s/it] LLM is explicitly disabled. Using MockLLM. 0%| | 0/2000 [00:00<?, ?it/s]

UPDATE: 1%|█ | 24/2000 [36:50<28:58:46, 52.80s/it]

$ nvidia-smi
Wed Jul 10 15:00:41 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H800 PCIe               Off | 00000000:0E:00.0 Off |                    0 |
| N/A   38C    P0              84W / 350W |  11902MiB / 81559MiB |      1%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA H800 PCIe               Off | 00000000:0F:00.0 Off |                    0 |
| N/A   39C    P0              81W / 350W |   8017MiB / 81559MiB |      1%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA H800 PCIe               Off | 00000000:10:00.0 Off |                    0 |
| N/A   37C    P0              75W / 350W |   8023MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA H800 PCIe               Off | 00000000:12:00.0 Off |                    0 |
| N/A   38C    P0              81W / 350W |   8023MiB / 81559MiB |      1%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA H800 PCIe               Off | 00000000:87:00.0 Off |                    0 |
| N/A   38C    P0              80W / 350W |   8023MiB / 81559MiB |      1%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA H800 PCIe               Off | 00000000:88:00.0 Off |                    0 |
| N/A   38C    P0              76W / 350W |   7991MiB / 81559MiB |      1%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA H800 PCIe               Off | 00000000:89:00.0 Off |                    0 |
| N/A   39C    P0              81W / 350W |  10159MiB / 81559MiB |      1%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA H800 PCIe               Off | 00000000:8A:00.0 Off |                    0 |
| N/A   39C    P0              79W / 350W |    461MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    907595      C   python                                     8984MiB |
|    0   N/A  N/A    914610      C   python                                     2904MiB |
|    1   N/A  N/A    907595      C   python                                     8008MiB |
|    2   N/A  N/A    907595      C   python                                     8014MiB |
|    3   N/A  N/A    907595      C   python                                     8014MiB |
|    4   N/A  N/A    907595      C   python                                     8014MiB |
|    5   N/A  N/A    907595      C   python                                     7982MiB |
|    6   N/A  N/A    907595      C   python                                    10150MiB |
|    7   N/A  N/A    907595      C   python                                      452MiB |
+---------------------------------------------------------------------------------------+

haruhi-sudo commented 4 months ago

Hello. The local_model code in this repository exists only so that users can try the evaluation framework quickly; it does not support multi-GPU acceleration or other optimizations. We recommend serving Qwen and other local models behind an API and calling them through remote_model.
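One way to follow this recommendation (a sketch, not part of the repository) is to serve the model behind an OpenAI-compatible HTTP endpoint with vLLM, which batches concurrent requests on the GPU; the model name, tensor-parallel size, and port here are assumptions:

```shell
# Assumes vLLM is installed (pip install vllm).
# --tensor-parallel-size 8 shards the model across the 8 H800s shown above;
# for a 7B model a single GPU would also work.
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen-7B-Chat \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --port 8000
```

The evaluation's 20 threads can then point remote_model at http://localhost:8000/v1, and the server batches the concurrent requests instead of serializing them in-process.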