HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License
9.26k stars 1.14k forks

Query Answer Citations #239

Open kevinsosborne opened 1 week ago

kevinsosborne commented 1 week ago

LightRAG is a very positive advancement for more precise RAG answers. It would be very helpful if query responses could include some or all citations, such as the names of the documents referenced or the page numbers within them. This would make it much easier for the user to verify the accuracy of the answer(s).

aiproductguy commented 1 week ago

Do you mind drawing out your idea further? Send me a link here or just use this excali board.

Working demo: https://lightrag-gui.streamlit.app/

Jaykumaran commented 1 week ago

@aiproductguy Your OpenAI API key is easily visible, and this is an extremely costly process. Why share a publicly accessible demo that uses your own credits? Please put a workaround in place.

kevinsosborne commented 1 week ago

@aiproductguy Thank you for your interest in my suggestion.

I am not a skilled software developer, but I think having citations as part of the response would really take LightRAG to the next level. Users need to be able to easily verify the answers and cross-check the results to make sure they are trustworthy. Think about ChatGPT Web Search, where it goes through news articles and webpages and adds footnotes to sentences to pinpoint where the information was pulled from. Since these LLMs work best with Markdown rather than raw PDFs, having at least the document file name as a footnote in the citation would be very beneficial. This is extremely important when handling a large corpus of PDFs, say in the hundreds. As for drawing out the workflow, I am not sure how to execute the idea. I imagine it might have to be done at the embedding stage, such as during chunking, so that the document name is captured with each chunk.

I do know that RagFlow.io does do citations kind of well, and since RagFlow was able to do it, I initially thought that LightRAG should be able to do it too.

Jaykumaran commented 1 week ago

Hello,

I think the GraphRAG pipeline already attaches citations to a particular entity or relation. You could refer to that and adapt it to LightRAG. But yes, it would be nice if this were available by default in LightRAG.

amirsa66 commented 1 week ago

@aiproductguy Would you provide the .py code for this demo?

LarFii commented 1 week ago

Currently, a parameter only_need_context is provided to return only the retrieved content without the answer. For the same query, since the context is cached, there is no additional overhead. In the future, we will work on improving the citation process to make it more detailed.

gdurifw commented 1 week ago

Hi @LarFii, thanks for your response, but I don't understand the right way to use this parameter. Do you mean running the same query twice, once with only_need_context set to true and once with it set to false? Can you provide an example code snippet for this use case? Thanks a lot.

amirsa66 commented 1 week ago

this way:

from lightrag import QueryParam

# Assumes `rag` is an already-initialized LightRAG instance
# and `query_text` holds the question as a string.

# Create a QueryParam object with only_need_context=True to get the retrieved context
context_param = QueryParam(mode="naive", only_need_context=True, top_k=5, max_token_for_text_unit=1500, max_token_for_global_context=2000)
retrieved_context = rag.query(query_text, param=context_param)

# Now, perform the same query with only_need_context=False to get the answer
answer_param = QueryParam(mode="naive", only_need_context=False, top_k=5, max_token_for_text_unit=1500, max_token_for_global_context=2000)
answer = rag.query(query_text, param=answer_param)

# Print the retrieved context and the answer
print("Retrieved Context:")
print(retrieved_context)
print("\nAnswer:")
print(answer)
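
The retrieved context above does not by itself say which file each passage came from. As a rough illustration of the idea raised earlier in the thread (capturing the document name at chunking time), here is a minimal standalone sketch. It does not use LightRAG's internals. `SourcedChunk`, `chunk_with_source`, the word-based splitting, and the file names are all hypothetical, chosen only to show the shape of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedChunk:
    """A piece of text plus the file it came from (hypothetical type)."""
    text: str
    source: str       # e.g. the PDF or Markdown file name
    chunk_index: int  # position of the chunk within that file


def chunk_with_source(text: str, source: str, max_words: int = 50) -> list[SourcedChunk]:
    """Split text into word-based chunks, tagging each with its source name."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        piece = " ".join(words[start:start + max_words])
        chunks.append(SourcedChunk(piece, source, len(chunks)))
    return chunks


# Hypothetical file names and contents, for illustration only.
documents = {
    "report_a.pdf": "alpha " * 120,
    "notes_b.md": "beta " * 30,
}
all_chunks = [
    chunk
    for name, body in documents.items()
    for chunk in chunk_with_source(body, name)
]
for chunk in all_chunks:
    print(chunk.source, chunk.chunk_index, len(chunk.text.split()))
```

Because every chunk carries its `source`, whatever retrieval step later selects chunks can report the file names alongside the text. A real pipeline would split by tokens rather than words and would need the store to persist this metadata.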

kevinsosborne commented 1 week ago

Thank you all for the responses. I want to be clear about what I mean by citations. For example, when I get a response, I would like to know the name of the file (such as the PDF, MD file, Word doc, etc.) that the information is being pulled from. The purpose is to be able to verify which source documents the response is primarily generated from, so that a user like myself can trace back and cross-check against those PDFs and determine whether the statements appear accurate. Basically, I am asking for the equivalent of “FOOTNOTES” like you would find in a Word document.
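
The footnote-style output described here could look something like the following minimal sketch. It assumes the file names of the retrieved passages are already available as a list of strings, which LightRAG is not confirmed to provide. `add_footnotes` and the sample data are hypothetical. It cites sources for the answer as a whole, not per sentence.

```python
def add_footnotes(answer: str, sources: list[str]) -> str:
    """Append a numbered footnote list of unique source names to an answer."""
    unique = list(dict.fromkeys(sources))  # de-duplicate, keep first-seen order
    if not unique:
        return answer
    markers = "".join(f"[{i}]" for i in range(1, len(unique) + 1))
    notes = "\n".join(f"[{i}] {name}" for i, name in enumerate(unique, start=1))
    return f"{answer} {markers}\n\nSources:\n{notes}"


# Hypothetical answer and source names, for illustration only.
result = add_footnotes(
    "The contract term is five years.",
    ["contract.pdf", "summary.md", "contract.pdf"],
)
print(result)
```

Sentence-level footnotes, as in the web-search example mentioned earlier in the thread, would additionally require knowing which passage supports which sentence, and this sketch does not attempt that.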