NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Any support for RWKV plz? #47

Open Pevernow opened 1 year ago

Pevernow commented 1 year ago

RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
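The dual "GPT mode" / "RNN mode" idea can be illustrated with a toy sketch: a minimal linear state recurrence in NumPy, not the actual RWKV WKV kernel (all names and the decay formulation here are illustrative, chosen only to show the equivalence between sequential and parallel evaluation):

```python
import numpy as np

# Toy linear recurrence standing in for RWKV's time-mixing state:
#   s[t] = decay * s[t-1] + x[t]

def rnn_mode(x, decay):
    """'RNN mode': process tokens one at a time, carrying only the hidden state."""
    s = 0.0
    states = []
    for xt in x:
        s = decay * s + xt
        states.append(s)
    return np.array(states)

def gpt_mode(x, decay):
    """'GPT mode': compute all states at once with a weighted cumulative sum,
    which is parallelizable over the sequence dimension."""
    T = len(x)
    t = np.arange(T)
    # weights[t, k] = decay**(t - k) for k <= t, else 0
    weights = np.tril(decay ** (t[:, None] - t[None, :]))
    return weights @ x

x = np.array([1.0, 2.0, 3.0, 4.0])
decay = 0.5
# Both modes produce identical hidden states for every position.
assert np.allclose(rnn_mode(x, decay), gpt_mode(x, decay))
```

The RNN mode needs only the previous state to step forward (constant memory at inference time), while the GPT mode computes the same states in parallel, which is what makes training efficient.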

Project Homepage: https://github.com/BlinkDL/RWKV-LM

Does TensorRT-LLM support such projects?

jdemouth-nvidia commented 1 year ago

Hi @Pevernow , thanks for your message. For the moment, RWKV is not on our roadmap. However, we welcome external contributions and if you are willing to contribute an implementation of RWKV, we could evaluate it and, eventually, merge it into TensorRT-LLM. Would you be interested in contributing?

Pevernow commented 1 year ago

Maybe this is a little difficult for me. But I'll try to find another developer to do it.

AsakusaRinne commented 1 year ago

Hi, I'd like to work on it. Should I open an issue for proposal before starting it?

Pevernow commented 1 year ago

> Hi, I'd like to work on it. Should I open an issue for proposal before starting it?

Of course, it depends on your preference. Thank you for your contribution to the community.

AsakusaRinne commented 12 months ago

Hey, I need help with RWKV support in #384. I would appreciate it if anyone could help me.

In the model forward, `ind = arange(T-1, -1, self.dtype)` is necessary, where `T` is a variable depending on the input shape. When building the model, `T` is deduced as -1, so the build fails. Any idea how to deal with this case? @byshiue @jdemouth-nvidia

QiJune commented 11 months ago

@AsakusaRinne For dynamic shape, you should use `shape(x, -1)`, instead of `x.shape[-1]`, to get a dim of a tensor.

Please try:

```python
T = shape(q, -1)
xxx
ind = arange(T-1, -1, self.dtype)
```
AsakusaRinne commented 11 months ago

> @AsakusaRinne For dynamic shape, you should use `shape(x, -1)`, instead of `x.shape[-1]`, to get a dim of a tensor.
>
> Please try:
>
> ```python
> T = shape(q, -1)
> xxx
> ind = arange(T-1, -1, self.dtype)
> ```

I'll have a try. Thank you very much!

AsakusaRinne commented 11 months ago

@QiJune It seems that it does not work. I got an `ind` with shape (0), while the correct shape should be (T): no matter what number `T` is, the range length is T - 1 - (-1) = T. I'd appreciate it if you could help me with it. It has really bothered me for a long time.

QiJune commented 11 months ago

@AsakusaRinne It seems that `arange` does not support -1; you need to set the end value explicitly.

AsakusaRinne commented 11 months ago

> @AsakusaRinne It seems that arange does not support -1, you need to set the end value explicitly

I also tried start=-1 and end=T-1 last night and got the same result. Does `arange` just not support negative numbers as input?

QiJune commented 11 months ago

@AsakusaRinne Yes, `arange` does not support negative numbers.
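Since negative bounds are rejected, the descending index vector the model wants, [T-1, ..., 0], has to be expressed with nonnegative endpoints only, e.g. by building [0, ..., T-1] and reflecting it. The identity is easy to check in plain NumPy (an illustrative analogue, not TensorRT-LLM code):

```python
import numpy as np

T = 5
# Desired descending indices, which would need a negative end bound:
desired = np.arange(T - 1, -1, -1)      # [4, 3, 2, 1, 0]
# Same vector built from nonnegative bounds only, then reflected:
rewritten = (T - 1) - np.arange(0, T)   # [4, 3, 2, 1, 0]
assert np.array_equal(desired, rewritten)
```

In the TensorRT-LLM graph, the subtraction would be an elementwise op on the tensor produced by `arange(0, T, ...)`, so no negative constant ever reaches `arange` itself.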

AsakusaRinne commented 11 months ago

@QiJune I tried `ind = arange(concat([0]), T, self.dtype)` but it still does not seem to work.

I saw the following error printed:

```
[TRT] [E] 4: [fillNode.cpp::lowerParams::75] Error Code 4: Internal Error ((Unnamed Layer* 233) [Fill]: LINSPACE requires that input 1 have rank 0)
[TRT] [E] 4: [graphShapeAnalyzer.cpp::needTypeAndDimensions::2235] Error Code 4: Internal Error (RwkvForCausalLM/layers/0/attention/FILL_0: output shape can not be computed)
```

If I print the shape of `ind`, I get (0).

Besides, I noticed that if I use `ws = pow(w, T)`, the result is just the same.

QiJune commented 11 months ago

How about `ind = arange(0, T, self.dtype)`?

AsakusaRinne commented 11 months ago

> How about ind = arange(0, T, self.dtype)

I get an assertion error:

```
  File "/home/rinne/TensorRT-LLM/tensorrt_llm/models/rwkv/model.py", line 104, in forward
    ind = arange(0, T, self.dtype)
  File "/home/rinne/TensorRT-LLM/tensorrt_llm/functional.py", line 1131, in arange
    assert isinstance(end, int)
AssertionError
```
QiJune commented 11 months ago

We have a test case for the arange function: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/functional/test_arange.py#L70

It should be `ind = arange(np.array(0, dtype=np.int32), T, self.dtype)`.
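The earlier error message ("LINSPACE requires that input 1 have rank 0") hints at why this form works: TensorRT's Fill layer expects the start value as a rank-0 (scalar) tensor. `np.array(0, dtype=np.int32)` is exactly that, whereas `concat([0])` produces a rank-1 tensor and a plain Python int trips the `isinstance(end, int)` style checks on the other argument. The rank difference is easy to verify in NumPy (illustrative only; the TensorRT-LLM builder converts these arrays into constants):

```python
import numpy as np

start_scalar = np.array(0, dtype=np.int32)    # 0-d array: rank 0, what Fill/LINSPACE accepts
start_vector = np.array([0], dtype=np.int32)  # 1-d array: rank 1, rejected with the error above
assert start_scalar.ndim == 0
assert start_vector.ndim == 1
```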

wujinzhong commented 10 months ago

Any update? When will RWKV be ready in TRT-LLM?

AdamzNV commented 4 weeks ago

As more and more new models enter the market, we have prepared comprehensive instructions for TRT-LLM developers on adapting to new models of interest. We encourage our community developers to expand the range of supported models, fostering an open ecosystem with rapid iterations.

Please try following these instructions and let us know if you encounter any issues during the adaptation process. We greatly appreciate your dedication.

nv-guomingz commented 1 week ago

Hi, do you still have any further issues or questions? If not, we'll close this issue soon.