UbiquitousLearning / mllm

Fast Multimodal LLM on Mobile Devices
https://ubiquitouslearning.github.io/mllm_website
MIT License

feat: topk/topp sampling #105

Closed chenghuaWang closed 3 months ago

chenghuaWang commented 3 months ago

This PR adds greedy search, top-k sampling, and top-p (nucleus) sampling for language generation. See ref: https://huggingface.co/blog/how-to-generate

Note: the tensor provided to the top-p generator must sum to 1, so a softmax should be applied to the logits first.

// Construct a generator: top-k sampling with k = 50 and temperature 0.3
// (p is presumably only consulted by the top-p generator type).
LlmTextGenerator gen(LLmTextGeneratorType::kTopkSampling, /*k*/ 50, /*temperature*/ 0.3, /*p*/ 0.92);
auto result = model(...);                             // forward pass over the prompt
auto out_token = gen.generate(result[0]);             // sample one token id
auto out_string = tokenizer.detokenize({out_token});  // map the id back to text
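For intuition, here is a minimal, self-contained sketch of what top-k and top-p filtering do to a probability vector before the final draw. It is illustrative only, not mllm's implementation: sampleTopKTopP is a hypothetical helper, and the input is assumed to already sum to 1, per the note above.

#include <algorithm>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// sampleTopKTopP is a hypothetical helper for illustration only.
// `probs` must already sum to 1 (softmax applied), per the note above.
unsigned int sampleTopKTopP(std::vector<float> probs, int k, float p) {
    // Sort (probability, token id) pairs in descending order of probability.
    std::vector<std::pair<float, unsigned int>> cand;
    cand.reserve(probs.size());
    for (unsigned int i = 0; i < probs.size(); ++i) cand.emplace_back(probs[i], i);
    std::sort(cand.begin(), cand.end(), std::greater<>());

    // Top-k: keep at most the k most likely tokens.
    if (k > 0 && cand.size() > static_cast<size_t>(k)) cand.resize(k);

    // Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches p.
    if (p > 0.f) {
        float cum = 0.f;
        size_t cut = cand.size();
        for (size_t i = 0; i < cand.size(); ++i) {
            cum += cand[i].first;
            if (cum >= p) { cut = i + 1; break; }
        }
        cand.resize(cut);
    }

    // Draw one surviving token proportionally to its (renormalized) probability.
    std::vector<float> weights;
    weights.reserve(cand.size());
    for (const auto &c : cand) weights.push_back(c.first);
    static std::mt19937 rng{std::random_device{}()};
    std::discrete_distribution<size_t> dist(weights.begin(), weights.end());
    return cand[dist(rng)].second;
}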
chenghuaWang commented 3 months ago

If you want to get all tokens at once, please use the callback function to avoid copying the entire vector. Here are two examples.

Chat:

for (int i = 0; i < in_strs.size(); ++i) {
    auto in_str = in_strs[i];
    auto input_tensor = tokenizer.tokenize(in_str, i);
    std::cout << "[Q] " << in_str << std::endl;
    std::cout << "[A] " << std::flush;

    LlmTextGeneratorOpts opt{
        .max_new_tokens = 100,
        .do_sample = true,
        .temperature = 0.3f,
        .top_k = 50,
        .top_p = 0.f,
    };
    // The callback fires once per generated token.
    model.generate(input_tensor, opt, [&](unsigned int out_token) -> bool {
        auto out_string = tokenizer.detokenize({out_token});
        auto [isOk, print_string] = processOutput(out_string);
        if (isOk) {
            std::cout << print_string << std::flush;
        } else {
            return false; // stop generating
        }
        return true; // ask for the next token
    });
    printf("\n");
}
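Note that the callback's return value controls generation: returning true requests the next token, while returning false (as the chat loop does once processOutput stops reporting printable text) ends generation early, before max_new_tokens is reached.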

Get all tokens:

for (int i = 0; i < in_strs.size(); ++i) {
    auto in_str = in_strs[i];
    auto input_tensor = tokenizer.tokenize(in_str, i);

    LlmTextGeneratorOpts opt{
        .max_new_tokens = 100,
        .do_sample = true,
        .temperature = 0.3f,
        .top_k = 50,
        .top_p = 0.f,
    };
    std::vector<unsigned int> tokens; // collected token ids
    model.generate(input_tensor, opt, [&](unsigned int out_token) -> bool {
        tokens.emplace_back(out_token); // keep every token
        return true;                    // keep generating
    });
    auto out_string = tokenizer.detokenize(tokens); // detokenize the whole sequence once
}
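A side benefit of collecting the ids and detokenizing them in one call is that some tokenizers only form valid characters across several token boundaries, so one-shot detokenization can produce cleaner text than printing token by token.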