Model produces nonsense output. Problem with tokenization?

Hello, I've downloaded a recent RWKV-4-World model from Hugging Face. The code runs, but the output is nonsense.

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS
import const
model = RWKV(model=str(const.DEFAULT_MODEL_DIR / "RWKV-4-World-3B-v1-20230619-ctx4096"), strategy="cuda fp16")
pipeline = PIPELINE(model, str(const.DEFAULT_MODEL_DIR / "rwkv_20B_tokenizer.json")) # 20B_tokenizer.json is in https://github.com/BlinkDL/ChatRWKV

def my_print(s):
    print(s, end='', flush=True)

args = PIPELINE_ARGS(temperature = 0.1, top_p = 0.5, top_k = 100, # top_k = 0 then ignore
                    alpha_frequency = 0.25,
                    alpha_presence = 0.25,
                    token_ban = [0], # ban the generation of some tokens
                    token_stop = [], # stop generation whenever you see any token here
                    chunk_len = 256) # split input into chunks to save VRAM (shorter -> slower)

ctx = "\nName three colors."
print(ctx, end='')

out = pipeline.generate(ctx, token_count=200, args=args, callback=my_print)
print(out)

Do you know what the problem is?

I've manually played around with the tokenizer and model.forward() but couldn't decode the model's output from the produced tensor.

# this works
encoded = tokenizer.encode(ctx)
print(encoded)
print(encoded.ids)
print(encoded.tokens)
# print(tokenizer.decode(encoded.ids))

# this doesn't work
out, state = model.forward(encoded.ids, state=None)
print(out)
tokenizer.decode(out.detach().cpu().numpy())
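
A note on the snippet above: as far as I understand the rwkv package, model.forward() returns a vector of logits (one score per vocabulary entry) together with the state, not a list of token ids, so that tensor cannot be passed to tokenizer.decode() directly. Below is a minimal sketch of the missing step. It uses a toy vocabulary and made-up logits in place of the real model and tokenizer; the names toy_vocab, toy_logits, greedy_token and softmax are all hypothetical and only for illustration.

```python
import math


def greedy_token(logits):
    """Return the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-ins: a 4-entry vocabulary and one logits vector,
# playing the role of the tensor returned by model.forward().
toy_vocab = ["red", "green", "blue", "."]
toy_logits = [2.0, 0.5, -1.0, 0.1]

token_id = greedy_token(toy_logits)   # pick ONE token id from the scores
probs = softmax(toy_logits)           # optional: inspect the distribution

# Only the chosen id is looked up in the vocabulary, never the raw logits.
print(token_id, toy_vocab[token_id])
```

With the real objects, the equivalent would presumably be to take the argmax of out (or sample from it), get a single token id, and decode that id. The pipeline's generate() samples with temperature and top_p rather than using a plain argmax.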

On another note, what is your connection to the repo at https://huggingface.co/sgugger/rwkv-7b-pile ?
