Open vincenzodentamaro opened 2 months ago
Only the RK3576 supports W4A16. You can refer to the documentation for more details: https://github.com/airockchip/rknn-llm/tree/main/doc
Thank you @waydong for the answer. I am also interested in knowing whether it is currently possible, or will be possible in future updates, to extract embedding vectors from the models running on the RK3588s. This functionality is crucial to build a Retrieval-Augmented Generation (RAG) system, and any guidance on this would be highly beneficial.
Additionally, for a commercial product, I would like to know whether it is possible to embed the context of previous answers so that all previously generated text does not have to be passed again in subsequent queries. We are seeking advice on whether this capability exists or will be supported in future iterations of the rkllm library.
Thank you
@vincenzodentamaro RAG could be tested if we had Python deployment code, because many of the open-source RAG implementations are written in Python.
Q1: We currently support returning the last hidden layer; not sure if this is what you need. Q2: This is included in the plans for support.
Hello, in the current internal development version, we have provided an interface to obtain the output of the last hidden layer as embedding information. Does this align with your expectations? If you have any additional reference materials regarding RAG or embeddings, we would greatly appreciate it if you could share them with us. Thank you!
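For readers following the thread, here is a minimal sketch of how a last-hidden-layer output could be used as an embedding for RAG retrieval. It does not call the rkllm library: it assumes the hidden layer has already been obtained as a list of per-token float vectors, and the helper names (`mean_pool`, `cosine_similarity`, `top_k`) are hypothetical. Mean pooling is one common way to reduce token vectors to a single vector; whether it suits a given model is something to verify.

```python
import math


def mean_pool(hidden_states):
    """Average per-token vectors into a single embedding vector.

    hidden_states: list of equal-length lists of floats, one per token
    (assumed shape of the last hidden layer output).
    """
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one token vector")
    dim = len(hidden_states[0])
    sums = [0.0] * dim
    for vec in hidden_states:
        if len(vec) != dim:
            raise ValueError("all token vectors must have the same length")
        for i, x in enumerate(vec):
            sums[i] += x
    return [s / len(hidden_states) for s in sums]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), idx) for idx, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

In a RAG pipeline, each document chunk would be embedded once and stored, and `top_k` would select the chunks to prepend to the prompt for a given query embedding.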
Dear @lzz773751548, yes, this is very valuable. How can I get access to the current development version? Please keep in mind that it is for a commercial product. Thanks
As per the title, is it supported? @airockchip @waydong
Also please point me to the right documentation. Thanks