EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License

How to use UQFF File locally without sending requests to Hugging Face? #821

Open solaoi opened 1 week ago

solaoi commented 1 week ago

Describe the bug

I'm trying to use a UQFF file in a local-only environment, but my sample code still sends requests to Hugging Face. How can I prevent these external requests and load the UQFF file entirely from local disk?

Sample Code

use std::{env::current_dir, sync::Arc};
use tokio::sync::mpsc::channel;

use mistralrs::{
    Constraint, DefaultSchedulerMethod, Device, DeviceMapMetadata, IsqType, MistralRs,
    MistralRsBuilder, ModelDType, NormalLoaderBuilder, NormalRequest, NormalSpecificConfig,
    Request, RequestMessage, ResponseOk, SamplingParams, SchedulerConfig, TokenSource,
};

fn setup() -> anyhow::Result<Arc<MistralRs>> {
    let path_buf = current_dir()?;
    let loader = NormalLoaderBuilder::new(
        NormalSpecificConfig {
            use_flash_attn: false,
            prompt_batchsize: None,
            topology: None,
            organization: Default::default(),
            write_uqff: None,
            from_uqff: Some(path_buf.join("honyaku13B/Honyaku-13b-q4_0.uqff")),
        },
        Some("honyaku13B/llama2.json".to_string()),
        None,
        Some("aixsatoshi/Honyaku-13b".to_string()),
    )
    .build(None)?;

    let pipeline = loader.load_model_from_hf(
        None,
        TokenSource::None,
        &ModelDType::Auto,
        &Device::new_metal(0)?,
        false,
        DeviceMapMetadata::dummy(),
        Some(IsqType::Q4_0),
        None,
    )?;

    Ok(MistralRsBuilder::new(
        pipeline,
        SchedulerConfig::DefaultScheduler {
            method: DefaultSchedulerMethod::Fixed(5.try_into().unwrap()),
        },
    )
    .build())
}

fn main() -> anyhow::Result<()> {
    let mistralrs = setup()?;

    let (tx, mut rx) = channel(10_000);
    let text = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "Hello world!".to_string());
    let prompt = format!("<english>: {} <NL>\n\n<japanese>: ", text);
    let request = Request::Normal(NormalRequest {
        messages: RequestMessage::Completion {
            text: prompt,
            echo_prompt: false,
            best_of: 1,
        },
        sampling_params: SamplingParams::deterministic(),
        response: tx,
        return_logprobs: false,
        is_streaming: false,
        id: 0,
        constraint: Constraint::None,
        suffix: None,
        adapters: None,
        tools: None,
        tool_choice: None,
        logits_processors: None,
    });
    mistralrs.get_sender()?.blocking_send(request)?;

    let response = rx.blocking_recv().unwrap().as_result().unwrap();
    match response {
        ResponseOk::CompletionDone(c) => println!("Text: {}", c.choices[0].text),
        _ => unreachable!(),
    }
    Ok(())
}

I tried changing the model_id from aixsatoshi/Honyaku-13b to ./honyaku13B/Honyaku-13b-q4_0.uqff, but this resulted in the following error:

File "tokenizer.json" not found at model id "./honyaku13B/Honyaku-13b-q4_0.uqff"

Latest commit or version

https://github.com/EricLBuehler/mistral.rs/commit/329e0e8c5a8403ed50ab829317df79c4823be80a

Oracuda commented 4 days ago

Hey, I'm not working with UQFF, so I can't say for sure, but I think this might be possible. I'm also completely new to this project and haven't worked with it in code yet, so forgive my naivety. Here's the working command-line syntax that I used:

--quantized-filename "my_modelgguf" --quantized-model-id "drive:\mymodels\"

So try changing model-id to ./honyaku13B/, or to whichever folder contains the .uqff file.
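The suggestion above amounts to passing the containing folder as the model id and the bare file name separately. A minimal sketch of that split, using only the Rust standard library, is shown below. The helper name split_uqff_path is hypothetical and not part of mistral.rs, and nothing in this thread confirms that passing the folder as the model id stops the Hugging Face requests.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of mistral.rs): split a path to a `.uqff`
/// file into the folder that contains it and the bare file name.
/// Returns `None` if the path does not end in a `.uqff` file name.
fn split_uqff_path(uqff: &Path) -> Option<(PathBuf, String)> {
    // Only accept paths whose extension is exactly `uqff`.
    if uqff.extension()?.to_str()? != "uqff" {
        return None;
    }
    let file_name = uqff.file_name()?.to_str()?.to_string();
    // `parent()` yields an empty path for a bare file name; treat that as ".".
    let dir = match uqff.parent() {
        Some(p) if !p.as_os_str().is_empty() => p.to_path_buf(),
        _ => PathBuf::from("."),
    };
    Some((dir, file_name))
}
```

For the path in the original post, this would give ./honyaku13B as the candidate model id and Honyaku-13b-q4_0.uqff as the file name.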

solaoi commented 2 days ago

@Oracuda Thank you for your advice. When I write the code the same way as for GGUF, it still requires the original config.json, tokenizer.json, and safetensors files.

It seems I'm encountering a problem similar to these issues:
https://github.com/EricLBuehler/mistral.rs/issues/828
https://github.com/EricLBuehler/mistral.rs/issues/836

Alternatively, my understanding of UQFF might be incorrect to begin with.
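
Since the loader reportedly asks for config.json and tokenizer.json, it can help to check which of those files actually exist in the local folder before pointing the model id at it. The sketch below does that with the standard library only. The helper name missing_files is hypothetical, the caller supplies the list of required names, and this thread does not confirm exactly which files mistral.rs needs for UQFF.

```rust
use std::path::Path;

/// Hypothetical helper: return the names from `required` that are not
/// regular files directly inside `dir`.
fn missing_files(dir: &Path, required: &[&str]) -> Vec<String> {
    required
        .iter()
        .copied()
        // Keep only the names that do not resolve to an existing file.
        .filter(|name| !dir.join(name).is_file())
        .map(String::from)
        .collect()
}
```

Calling missing_files(Path::new("./honyaku13B"), &["config.json", "tokenizer.json"]) would list whichever of the two files is absent from that folder. An empty result only means the files are present, not that loading will stay offline.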