NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Any support for RWKV plz? #47

Open Pevernow opened 8 months ago

Pevernow commented 8 months ago

RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

So it combines the best of RNN and transformer: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
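For concreteness, here is a minimal, numerically naive sketch of the RNN-mode recurrence (an RWKV-4-style "WKV" time-mixing step; the names are illustrative, and the real implementation carries an extra max-exponent term for numerical stability):

import numpy as np

def wkv_rnn_step(a, b, w, u, k_t, v_t):
    # a, b: running numerator / denominator state carried over from the previous position
    # w, u: per-channel decay and "bonus" parameters
    # k_t, v_t: key / value vectors at the current position
    e_uk = np.exp(u + k_t)
    out = (a + e_uk * v_t) / (b + e_uk)      # output at the current position
    a = np.exp(-w) * a + np.exp(k_t) * v_t   # state handed to position t+1
    b = np.exp(-w) * b + np.exp(k_t)
    return out, a, b

Only (a, b) has to be carried from step to step, which is why inference needs no attention cache that grows with context length; the "GPT" mode computes the same quantities for all positions in parallel during training.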

Project Homepage: https://github.com/BlinkDL/RWKV-LM

Does TensorRT-LLM support such projects?

jdemouth-nvidia commented 8 months ago

Hi @Pevernow , thanks for your message. For the moment, RWKV is not on our roadmap. However, we welcome external contributions and if you are willing to contribute an implementation of RWKV, we could evaluate it and, eventually, merge it into TensorRT-LLM. Would you be interested in contributing?

Pevernow commented 8 months ago

That might be a little difficult for me, but I'll try to find another developer to do it.

AsakusaRinne commented 8 months ago

Hi, I'd like to work on it. Should I open an issue for a proposal before starting?

Pevernow commented 8 months ago

> Hi, I'd like to work on it. Should I open an issue for a proposal before starting?

Of course, it depends on your preference. Thank you for your contribution to the community.

AsakusaRinne commented 7 months ago

Hey, I need help with RWKV support in #384. I would appreciate it if anyone could help me.

In the model forward, ind = arange(T-1, -1, self.dtype) is needed, where T is a variable that depends on the input shape. When building the model, T is deduced as -1, so the build fails. Any idea how to deal with this case? @byshiue @jdemouth-nvidia

QiJune commented 6 months ago

@AsakusaRinne For dynamic shapes, you should use shape(x, -1) instead of x.shape[-1] to get a dimension of a tensor.

Please try:

T = shape(q, -1)
xxx
ind = arange(T-1, -1, self.dtype)
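The difference matters at build time: for a dynamic dimension, the Python-level x.shape[-1] is just the placeholder -1 described above, while shape(x, -1) adds a shape operation to the network and returns a tensor that resolves to the real dimension at runtime, so later ops can consume it. Roughly (reusing q from the snippet above):

T_static = q.shape[-1]   # build time: a dynamic dim is reported as the literal -1
T = shape(q, -1)         # graph op: yields the actual sequence length at runtime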

AsakusaRinne commented 6 months ago

> @AsakusaRinne For dynamic shapes, you should use shape(x, -1) instead of x.shape[-1] to get a dimension of a tensor.
>
> Please try:
>
> T = shape(q, -1)
> xxx
> ind = arange(T-1, -1, self.dtype)

I'll have a try. Thank you very much!

AsakusaRinne commented 6 months ago

@QiJune It seems that it does not work. I got an ind with shape (0), while the correct shape should be (T): whatever T is, the length of the range is (T - 1) - (-1) = T. I'd appreciate it if you could help me with this; it has really bothered me for a while.

QiJune commented 6 months ago

@AsakusaRinne It seems that arange does not support -1; you need to set the end value explicitly.

AsakusaRinne commented 6 months ago

> @AsakusaRinne It seems that arange does not support -1; you need to set the end value explicitly.

I also tried start=-1 and end=T-1 last night and got the same result. Does arange just not support negative numbers as input?

QiJune commented 6 months ago

@AsakusaRinne Yes, arange does not support negative numbers.

AsakusaRinne commented 6 months ago

@QiJune I tried ind = arange(concat([0]), T, self.dtype) but it still doesn't seem to work.

I saw the following error printed:

[TRT] [E] 4: [fillNode.cpp::lowerParams::75] Error Code 4: Internal Error ((Unnamed Layer* 233) [Fill]: LINSPACE requires that input 1 have rank 0)
[TRT] [E] 4: [graphShapeAnalyzer.cpp::needTypeAndDimensions::2235] Error Code 4: Internal Error (RwkvForCausalLM/layers/0/attention/FILL_0: output shape can not be computed)

If I print the shape of ind, I get (0).

Besides, I noticed that if I use ws = pow(w, T), the result is just the same.

QiJune commented 6 months ago

How about ind = arange(0, T, self.dtype)

AsakusaRinne commented 6 months ago

> How about ind = arange(0, T, self.dtype)

I get an assertion error:

  File "/home/rinne/TensorRT-LLM/tensorrt_llm/models/rwkv/model.py", line 104, in forward
    ind = arange(0, T, self.dtype)
  File "/home/rinne/TensorRT-LLM/tensorrt_llm/functional.py", line 1131, in arange
    assert isinstance(end, int)
AssertionError

QiJune commented 6 months ago

We have a test case for the arange function: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/functional/test_arange.py#L70

It should be ind = arange(np.array(0, dtype=np.int32), T, self.dtype)
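Putting the two suggestions together, the forward would presumably end up with something like the following (q and self.dtype stand in for whatever the RWKV model code actually uses):

import numpy as np
from tensorrt_llm.functional import arange, shape

T = shape(q, -1)                                            # dynamic sequence length
ind = arange(np.array(0, dtype=np.int32), T, self.dtype)    # 0, 1, ..., T-1

If the descending order of the original arange(T-1, -1, self.dtype) is still needed, it can presumably be recovered by subtracting ind from T - 1 (with an appropriate cast) instead of asking arange to count down.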

wujinzhong commented 5 months ago

Any update? When will RWKV be ready in TRT-LLM?