RWKV / rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
MIT License

Phase out very verbose element_count functions #95

Closed · LoganDark closed this 1 year ago

LoganDark commented 1 year ago

This could have been done better.

I figured out the secret to declaring backwards-compatibility functions that are no longer in the header file. This doesn't work:

RWKV_API extern "C" uint32_t rwkv_get_state_buffer_element_count(const struct rwkv_context * ctx) {

This works:

extern "C" RWKV_API uint32_t rwkv_get_state_buffer_element_count(const struct rwkv_context * ctx) {

So now you know.
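Spelled out, a full backwards-compatibility shim using the working order might look like the sketch below; the forwarding body and the cast are my guess at how such a shim would be written, not a quote from the actual diff:

// Kept out of rwkv.h; extern "C" must precede RWKV_API for this to build correctly.
extern "C" RWKV_API uint32_t rwkv_get_state_buffer_element_count(const struct rwkv_context * ctx) {
    // Assumed body: the deprecated name just forwards to the new one.
    // The cast covers the case where rwkv_get_state_len returns size_t.
    return (uint32_t) rwkv_get_state_len(ctx);
}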

For the lulz, I left the Python scripts using the old functions for now, just to show that they still work. Let me know if you want those all updated in one go, though I'd prefer to just rework the Python scripts completely (I have new bindings that behave a lot better wrt errors).

Anyway, this PR removes rwkv_get_state_buffer_element_count and rwkv_get_logits_buffer_element_count in favor of rwkv_get_state_len and rwkv_get_logits_len, respectively. It also adds new functions rwkv_get_n_vocab, rwkv_get_n_embed, and rwkv_get_n_layer for reading those model parameters, for anyone who is interested, plus rwkv_init_state.
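For reference, the new public surface would be declared roughly as follows in rwkv.h; this is a sketch based on the function names above, and the exact return types (e.g. uint32_t vs. size_t) are my assumption rather than a quote from the diff:

// Assumed declarations, not copied from the PR.
RWKV_API size_t rwkv_get_n_vocab(const struct rwkv_context * ctx);
RWKV_API size_t rwkv_get_n_embed(const struct rwkv_context * ctx);
RWKV_API size_t rwkv_get_n_layer(const struct rwkv_context * ctx);
RWKV_API size_t rwkv_get_state_len(const struct rwkv_context * ctx);
RWKV_API size_t rwkv_get_logits_len(const struct rwkv_context * ctx);
RWKV_API void rwkv_init_state(const struct rwkv_context * ctx, float * state);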

Specifically, rwkv_get_n_vocab will help people automatically pick which tokenizer to use (50277 -> BPE, 65535 -> World); rwkv_get_n_layer will help with GPU offloading (figuring out how many layers are on or off the GPU); and rwkv_get_n_embed will help people poke at the RWKV hidden state, which some want to do for e.g. embeddings.
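As an illustration of the tokenizer-selection case, a caller could do something like the sketch below; the enum and helper function are hypothetical, and I'm assuming rwkv_get_n_vocab simply returns the model's vocabulary size as an integer:

// Hypothetical helper: pick a tokenizer from the model's vocabulary size.
enum class Tokenizer { BPE, World };

Tokenizer pick_tokenizer(const struct rwkv_context * ctx) {
    // 50277 -> 20B BPE tokenizer; otherwise (e.g. 65535) assume the World tokenizer.
    return rwkv_get_n_vocab(ctx) == 50277 ? Tokenizer::BPE : Tokenizer::World;
}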

There is also a new rwkv_init_state, which initializes a state buffer to the default (zero-token) state. It is useful for people who can't or don't want to track how many times they have called rwkv_eval in order to know when to pass NULL.
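A minimal sketch of that usage, assuming rwkv_init_state fills a caller-allocated buffer of rwkv_get_state_len floats with the default state, and that rwkv_eval accepts the same buffer for state_in and state_out (both assumptions on my part, not guarantees from this PR):

#include <cstdint>
#include <vector>
#include "rwkv.h"

// Hypothetical caller: initialize the state up front and never pass NULL to rwkv_eval.
// In real code the state buffer would be kept around and reused across calls.
void eval_from_default_state(struct rwkv_context * ctx, uint32_t token) {
    std::vector<float> state(rwkv_get_state_len(ctx));
    std::vector<float> logits(rwkv_get_logits_len(ctx));
    rwkv_init_state(ctx, state.data());  // default / zero-token state
    rwkv_eval(ctx, token, state.data(), state.data(), logits.data());
}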

All in all, this family of functions seems like a very useful and much-needed addition, and I don't think there would be any problems with its inclusion.

saharNooby commented 1 year ago

Let me know if you want those all updated in one go

I'm sorta planning on updating the Python part of the library, specifically to support sequence mode; so I'm okay with the Python part not being updated, since it will be updated later anyway.

though I'd prefer to just rework the Python scripts completely (I have new bindings that behave a lot better wrt errors)

If you already have a good Python wrapper, then maybe it's better for you to contribute it, so we don't do duplicate work :) Do you have such plans?

LoganDark commented 1 year ago

If you already have a good Python wrapper, then maybe it's better for you to contribute it, so we don't do duplicate work :) Do you have such plans?

I do actually :3 I also have a much faster World tokenizer than the official one that I'm already using personally.

saharNooby commented 1 year ago

I do actually :3

Great! Then I will not touch the Python wrapper (library wrapper and model class) until you update it.