airockchip / rknn-llm


Is w4a16 supported by rk3588? #90

Status: Open · vincenzodentamaro opened this issue 2 months ago

vincenzodentamaro commented 2 months ago

As per the title, is it supported? @airockchip @waydong

Also please point me to the right documentation. Thanks

waydong commented 2 months ago

Only the RK3576 supports W4A16. You can refer to the documentation for more details: https://github.com/airockchip/rknn-llm/tree/main/doc

vincenzodentamaro commented 2 months ago

Thank you @waydong for the answer. I am also interested in knowing whether it is currently possible, or will be possible in future updates, to extract embedding vectors from the models running on the RK3588s. This functionality is crucial to build a Retrieval-Augmented Generation (RAG) system, and any guidance on this would be highly beneficial.

Additionally, for a commercial product, I am interested in whether it is possible to embed the context of previous answers without needing to pass all previously generated text in subsequent queries. We are seeking advice on whether this capability exists or will be supported in future iterations of the rkllm library.

Thank you

openedev commented 2 months ago

@vincenzodentamaro RAG would be possible to test if we had Python deployment code, since many of the open-source RAG frameworks are written in Python.

waydong commented 2 months ago

Q1: We currently support returning the last hidden layer; I'm not sure if this is exactly what you need. Q2: This is included in our plans for future support.
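For illustration only, here is a minimal sketch of how a last-hidden-layer output could be turned into a single embedding vector for a RAG index. The `[num_tokens, hidden_size]` shape, the `pool_hidden_states` helper, and the mean-pooling choice are assumptions made for the example, not the actual rkllm interface.

```python
# Minimal sketch (not the official rkllm API): turning a last-hidden-layer
# output of shape [num_tokens, hidden_size] into one embedding vector
# suitable for storing in a RAG index.
import numpy as np

def pool_hidden_states(last_hidden_layer: np.ndarray) -> np.ndarray:
    """Mean-pool token-level hidden states and L2-normalize the result.

    `last_hidden_layer` is assumed to be whatever the runtime returns for
    the final transformer layer, shaped [num_tokens, hidden_size].
    """
    embedding = last_hidden_layer.mean(axis=0)  # [hidden_size]
    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding

# Dummy data standing in for a real model output: 12 tokens, 4096 dims
dummy_hidden = np.random.rand(12, 4096).astype(np.float32)
query_embedding = pool_hidden_states(dummy_hidden)
print(query_embedding.shape)  # (4096,)
```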

lzz773751548 commented 2 months ago

> Thank you @waydong for the answer. I am also interested in knowing whether it is currently possible, or will be possible in future updates, to extract embedding vectors from the models running on the RK3588s. This functionality is crucial to build a Retrieval-Augmented Generation (RAG) system, and any guidance on this would be highly beneficial.
>
> Additionally, for a commercial product, I am interested in whether it is possible to embed the context of previous answers without needing to pass all previously generated text in subsequent queries. We are seeking advice on whether this capability exists or will be supported in future iterations of the rkllm library.
>
> Thank you

Hello, in the current internal development version, we have provided an interface to obtain the output of the last hidden layer as embedding information. Does this align with your expectations? If you have any additional reference materials regarding RAG or embeddings, we would greatly appreciate it if you could share them with us. Thank you!
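
As a rough reference for how such embeddings are commonly used in a RAG pipeline: documents are embedded once, stored in an index, and the closest ones are retrieved by cosine similarity at query time. In the sketch below, `embed()` is a stand-in placeholder so the example runs end to end; it is not an existing rkllm call.

```python
# Minimal in-memory RAG retrieval sketch. `embed()` is a placeholder for
# whatever call eventually returns an embedding from the rkllm runtime;
# it is NOT an existing rkllm function.
import numpy as np

def embed(text: str, dim: int = 4096) -> np.ndarray:
    # Placeholder: pseudo-embedding seeded from the text hash so the same
    # string maps to the same vector within one run of the example.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim).astype(np.float32)
    return v / np.linalg.norm(v)

documents = [
    "The RK3576 supports W4A16 quantization.",
    "The RK3588 runs rkllm models on its NPU.",
    "Embeddings can be built from the last hidden layer.",
]
doc_matrix = np.stack([embed(d) for d in documents])  # [num_docs, dim]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents ranked by cosine similarity to the query."""
    scores = doc_matrix @ embed(query)  # vectors are unit-norm, so dot = cosine
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

print(retrieve("Which chip supports W4A16?"))
```

In a real pipeline the placeholder would be replaced by whatever interface exposes the last-hidden-layer embedding, and the index would typically live in a vector database rather than in memory.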

vincenzodentamaro commented 2 months ago

Dear @lzz773751548, yes, this is very valuable. How could I get access to the current development version? Remember that it is for a commercial product. Thanks