Closed by tenzin3 (2 weeks ago)
Initially, two questions were generated per chunk, but the questions generated from the same chunk turned out to be similar, so only the first one was kept.
Experimental setup:
- book: *The Art of Happiness at Work* by the Dalai Lama
- embedding model: Alibaba-NLP/gte-large-en-v1.5
- question generation model: voidful/context-only-question-generator
- chunk_size = 500, chunk_overlap = 100
- number of chunks = 170
- number of questions generated = 170
- number of contexts given = 2–5 (top 2 through top 5)
- faithfulness and relevancy checker: cosine similarity over embeddings
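For reference, the sliding-window chunking described above (chunk_size = 500, chunk_overlap = 100) can be sketched in plain Python. This is an illustrative stand-in, not the actual pipeline code, and it assumes the sizes are measured in characters (the report does not say whether characters or tokens are counted):

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size chunks (character-based sketch)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # advance by 400 characters per chunk
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# Under this character-based assumption, a text of roughly 68,000 characters
# yields about 170 chunks, consistent with the numbers reported above.
```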
prompt = f"""
You are a spiritual leader. Your students will seek your guidance. Answer their questions based solely on the provided context. If the context does not contain the information needed to answer a question, respond with "I don't know."
Context: {context}
Question: {question}
"""

Generation settings: max_new_tokens = 500, temperature = 0.
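The prompt assembly and deterministic decoding settings above can be sketched as follows. The `build_prompt` function name is illustrative; note that temperature = 0 is typically expressed as greedy decoding (`do_sample=False`) in Hugging Face `generate` keyword arguments:

```python
def build_prompt(context: str, question: str) -> str:
    """Fill the guidance prompt used in the experiment."""
    return f"""
You are a spiritual leader. Your students will seek your guidance. Answer their questions based solely on the provided context. If the context does not contain the information needed to answer a question, respond with "I don't know."
Context: {context}
Question: {question}
"""

# Deterministic decoding: temperature = 0 corresponds to greedy decoding,
# which Hugging Face models express as do_sample=False.
GENERATION_KWARGS = {"max_new_tokens": 500, "do_sample": False}
```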
response_time is measured in seconds.
The microsoft/Phi-3-mini-4k-instruct-gguf model has not yet been integrated into the Hugging Face Transformers library. When I downloaded the model and ran it on a Vast.ai server, the average inference time was around 90 seconds per request (averaged over the first five questions), which is unsatisfactory.
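A minimal sketch of how such a per-request average can be measured; the `generate_answer` argument is a placeholder for the actual model call, and the function name is illustrative:

```python
import time

def average_response_time(generate_answer, questions):
    """Average wall-clock seconds per request over the given questions."""
    timings = []
    for q in questions:
        start = time.perf_counter()
        generate_answer(q)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Example with a trivial stub standing in for the real model call:
if __name__ == "__main__":
    avg = average_response_time(lambda q: q.upper(),
                                ["q1", "q2", "q3", "q4", "q5"])
    print(f"{avg:.4f} s per request")
```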
When testing the google/gemma-7b model with the simple context "Apple is green in colour." and the question "What is the colour of Apple?", the output was inaccurate and demonstrated hallucination. This suggests that the gemma-7b model is not suitable for instructional chatbot applications, as it struggles with generating precise and reliable responses.
For the question "How long does it take to rewire one's mind?":

llama3 answer with top 4 contexts:
> According to the context, it takes repeated effort and time to rewire one's mind. The passage states, "it takes repeated effort. And it takes time. For this kind of analytical meditation to work, one needs deep and sustained reflection on these alternative ways of viewing one's situation." It also mentions that it can be a long and difficult process to reshape one's attitudes and outlook, and that many people resist giving up their misery, which can make it a challenging and time-consuming process.

llama3 answer with top 5 contexts:
> I don't know. The text does not provide a specific answer to this question. It mentions that it takes repeated effort and time to reshape one's attitudes and outlook, but it does not provide a specific timeframe for rewiring one's mind.
With 5 contexts, the model responds in a negative sense, declining to answer even though the top-4 response shows the relevant passage is retrievable.
For the question "Did the Dalai Lama have a shaven head?":

llama3 answer with top 2 contexts:
> I don't know. The context does not mention the Dalai Lama's hair or head, so I cannot provide an answer to this question.
The context is supplied by the LlamaIndex retrieval method, so the user does not need to know about the context itself. Alternatively, we could respond that the answer is not present in the Dalai Lama's books.
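Conceptually, what the LlamaIndex retriever does here is a top-k nearest-neighbour search over chunk embeddings by cosine similarity. A toy, dependency-free sketch (the two-dimensional vectors are illustrative stand-ins for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [c for c, _ in scored[:k]]

# Toy example: the query vector is closest to chunk B, then A.
chunks = ["A", "B", "C"]
vecs = [(1.0, 0.2), (0.9, 0.1), (0.0, 1.0)]
print(retrieve_top_k((1.0, 0.0), vecs, chunks, k=2))  # → ['B', 'A']
```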
For the question "Is there a program in India where prisoners are being taught ingestion meditation?":

phi3 answer with top 2 and top 3 contexts:
> Yes, there is a program in India where prisoners are being taught meditation.

phi3 answer with top 4 contexts:
> I don't know.

phi3 answer with top 5 contexts:
> No, the context provided mentions a program in India where prisoners are being taught meditation, not ingestion meditation.
From the above table and observations of their responses, microsoft/Phi-3-mini-128k-instruct seems like the better choice, for the following reasons:
Currently, the experiment is run and timed for one question at a time, but a chatbot application must eventually carry context and information across previous turns of the conversation. A larger context window and better speed would therefore be a huge plus.
Both language models were equipped with the same LlamaIndex retrieval tool, and the Phi-3 model has clearly been better at finding answers.
Question: "How many human beings are there?"
LLaMA-3 Answers:
Phi-3 Answers:
The Phi-3 model demonstrates better contextual understanding and retrieval of relevant information.
LLaMA-3 tends to provide longer responses even when the answer is unknown, which may contribute to higher relevance and faithfulness scores due to the repetition of the question text.
Example Question: "Who met with President George as a statesman?"
Answer: "I don't know. There is no mention of President George or any meeting with him in the provided context."
LLaMA-3's tendency to repeat the question text leads to higher scores when it does not have a definitive answer from the context.
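This score inflation can be illustrated with a toy bag-of-words cosine similarity (a crude stand-in for the actual embedding-based checker): a verbose "I don't know" answer that repeats the question's words scores higher against the question than a terse refusal does:

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors (toy stand-in for embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    words = set(ca) | set(cb)
    dot = sum(ca[w] * cb[w] for w in words)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

question = "who met with president george as a statesman"
terse = "i don't know."
verbose = ("i don't know. there is no mention of president george "
           "or any meeting with him in the provided context.")

# The verbose refusal shares several words with the question
# ("president", "george", "with"), so its relevancy score is
# inflated relative to the terse refusal.
assert bow_cosine(question, verbose) > bow_cosine(question, terse)
```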
Based on these factors, the microsoft/Phi-3-mini-128k-instruct model is recommended for its better speed, larger context handling, and superior information retrieval capabilities.
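Since multi-turn conversation is cited above as a reason to prefer the 128k context window, here is a sketch of how prior turns might be carried into the prompt under a size budget. The function name and the character-based budget are illustrative assumptions, not part of the current implementation:

```python
def build_history_block(turns, budget_chars=8000):
    """Keep the most recent (user, assistant) turns that fit within the budget."""
    kept, used = [], 0
    for user, assistant in reversed(turns):  # newest turns first
        line = f"User: {user}\nAssistant: {assistant}\n"
        if used + len(line) > budget_chars:
            break
        kept.append(line)
        used += len(line)
    return "".join(reversed(kept))  # restore chronological order

# With a tight budget, only the most recent turn survives:
history = [("How many human beings are there?", "I don't know."),
           ("Did the Dalai Lama have a shaven head?", "I don't know.")]
print(build_history_block(history, budget_chars=120))
```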
template = f"""
You are a chatbot designed to answer questions using content from the Dalai Lama's books.
Follow these guidelines:
- Answer the question based on the given contexts (some of which might be irrelevant).
- Be elaborate and precise.
- Answer directly, without adding any extra words.
- Be careful with the language, ensuring it is respectful and appropriate.
- If you do not have a proper answer from the context, respond with "I don't have enough data to provide an answer."
- Do not give a response longer than 3000 tokens.
Contexts: {context}
Question: {question}
"""
Description:
Select one of the following models for final response generation.
Criteria
Expected Output:
The final LLM best suited for our RAG application.
Implementation Steps