Pevernow opened this issue 1 year ago
Hi @Pevernow , thanks for your message. For the moment, RWKV is not on our roadmap. However, we welcome external contributions and if you are willing to contribute an implementation of RWKV, we could evaluate it and, eventually, merge it into TensorRT-LLM. Would you be interested in contributing?
Maybe this is a little difficult for me. But I'll try to find another developer to do it.
Hi, I'd like to work on it. Should I open an issue for proposal before starting it?
Of course, it depends on your preference. Thank you for your contribution to the community.
Hey, I need help with RWKV support in #384 . I would appreciate it if anyone can help me.
In the model forward, `ind = arange(T-1, -1, self.dtype)` is necessary, where `T` is a variable depending on the input shape. When building the model, `T` is deduced as `-1`, so the build fails. Any idea how to deal with this case? @byshiue @jdemouth-nvidia
@AsakusaRinne For dynamic shapes, you should use `shape(x, -1)` instead of `x.shape[-1]` to get a dim of a tensor. Please try:

```python
T = shape(q, -1)
xxx
ind = arange(T-1, -1, self.dtype)
```
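The distinction above can be sketched in plain Python. This is illustrative only: `static_dim` and `runtime_dim` are hypothetical stand-ins for the two approaches, not TensorRT-LLM APIs.

```python
import numpy as np

def static_dim(shape_at_build_time):
    # What x.shape[-1] gives you during network construction:
    # a Python constant, which is -1 when the dim is dynamic.
    return shape_at_build_time[-1]

def runtime_dim(x):
    # What shape(x, -1) conceptually gives you: the actual size,
    # resolved only when a concrete input flows through the network.
    return x.shape[-1]

build_shape = (1, -1)      # last dim is dynamic at build time
x = np.zeros((1, 7))       # concrete input at run time

print(static_dim(build_shape))  # -1: useless as T for arange(T-1, ...)
print(runtime_dim(x))           # 7: the correct T for this input
```

Using the build-time constant is what produced the `T` deduced as `-1` described above; the runtime-shape tensor defers the value until an input exists.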
I'll have a try. Thank you very much!
@QiJune It seems that it does not work. I got an `ind` with shape `(0)`, while the correct shape should be `(T)`, because no matter what `T` is, the range length is `T - 1 - (-1) = T`. I'd appreciate it if you could help me with this; it has bothered me for a long time.
@AsakusaRinne It seems that `arange` does not support -1; you need to set the end value explicitly.
I also tried start=-1 and end=T-1 last night and got the same result. Does `arange` just not support negative numbers as input?
@AsakusaRinne Yes, `arange` does not support negative numbers.
@QiJune I tried `ind = arange(concat([0]), T, self.dtype)` but it still does not seem to work.
I saw the following error printed:
[TRT] [E] 4: [fillNode.cpp::lowerParams::75] Error Code 4: Internal Error ((Unnamed Layer* 233) [Fill]: LINSPACE requires that input 1 have rank 0)
[TRT] [E] 4: [graphShapeAnalyzer.cpp::needTypeAndDimensions::2235] Error Code 4: Internal Error (RwkvForCausalLM/layers/0/attention/FILL_0: output shape can not be computed)
If I print the shape of `ind`, I get `(0)`.
Besides, I noticed that if I use `ws = pow(w, T)`, the result is just the same.
How about `ind = arange(0, T, self.dtype)`?
I'll get an assertion error:
File "/home/rinne/TensorRT-LLM/tensorrt_llm/models/rwkv/model.py", line 104, in forward
ind = arange(0, T, self.dtype)
File "/home/rinne/TensorRT-LLM/tensorrt_llm/functional.py", line 1131, in arange
assert isinstance(end, int)
AssertionError
We have a test case for the arange function: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/functional/test_arange.py#L70
It should be `ind = arange(np.array(0, dtype=np.int32), T, self.dtype)`.
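The rank requirement behind the earlier LINSPACE error can be illustrated with NumPy (illustrative only; the actual check happens inside TensorRT's Fill layer, not in NumPy):

```python
import numpy as np

# The Fill (LINSPACE) layer requires its start input to have rank 0
# (a scalar). That is why np.array(0, dtype=np.int32) works while
# concat([0]), which yields a rank-1 tensor, triggered
# "LINSPACE requires that input 1 have rank 0".
start_ok = np.array(0, dtype=np.int32)     # rank 0: a scalar
start_bad = np.array([0], dtype=np.int32)  # rank 1: a length-1 vector

print(start_ok.ndim)   # 0 -> accepted
print(start_bad.ndim)  # 1 -> rejected by the Fill layer
```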
Any update? When will RWKV be ready in TRT-LLM?
As more and more new models enter the market, we have prepared comprehensive instructions for TRT-LLM developers on adapting to new models of interest. We encourage our community developers to expand the range of supported models, fostering an open ecosystem with rapid iterations.
Please try following these instructions and let us know if you encounter any issues during the adaptation process. We greatly appreciate your dedication.
Hi, do you still have further issues or questions now? If not, we'll close this soon.
RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
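The state-passing idea described above can be sketched as a toy recurrence (this is not RWKV's actual update rule; the weights and the tanh step are illustrative stand-ins):

```python
import numpy as np

# Toy sketch of RNN-mode inference: each step needs only the previous
# hidden state and the current input, so per-token cost is independent
# of sequence length.

rng = np.random.default_rng(0)
d = 4                                   # hidden size (illustrative)
W = rng.standard_normal((d, d)) * 0.1   # stand-in mixing weights

def step(h_t, x_t):
    # Hidden state at position t+1 from state at t and input at t.
    return np.tanh(h_t @ W + x_t)

h = np.zeros(d)
for x in rng.standard_normal((3, d)):   # three tokens, one step each
    h = step(h, x)

print(h.shape)  # (4,)
```

The "GPT" mode mentioned above would instead compute all positions in parallel during training, then hand the final hidden state to this step-by-step mode for generation.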
Project Homepage: https://github.com/BlinkDL/RWKV-LM
Does TensorRT-LLM support such projects?