kdcyberdude closed this issue 4 months ago.
Maybe use the forum instead, and give more details, like the YAML content: https://forum.opennmt.net/latest
My inference.yaml config file content:
transforms: [sentencepiece]
src_subword_model: "Starling-LM-7B-alpha-AWQ-onmt/tokenizer.model"
tgt_subword_model: "Starling-LM-7B-alpha-AWQ-onmt/tokenizer.model"
model: "Starling-LM-7B-alpha-AWQ-onmt/Starling-LM-7B-alpha-AWQ-onmt.pt"
seed: 13
max_length: 256
gpu: 0
batch_type: sents
batch_size: 60
world_size: 1
gpu_ranks: [0]
precision: fp16
beam_size: 1
n_best: 1
profile: false
report_time: true
src: None
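Before digging into the CUDA error, it can be worth ruling out a missing file: a nonexistent tokenizer or checkpoint path can surface as a confusing runtime failure. The helper below is just a sketch (not part of OpenNMT-py) that scans the flat key/value pairs of an inference.yaml like the one above and reports any path that does not exist on disk.

```python
import os

# Keys in the config above that are expected to point at files on disk.
PATH_KEYS = ("src_subword_model", "tgt_subword_model", "model")

def check_paths(config_file: str) -> list:
    """Return the configured paths that do not exist on disk.

    Only handles flat `key: value` lines, which is enough for the
    simple inference.yaml shown in this issue.
    """
    missing = []
    with open(config_file) as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep and key.strip() in PATH_KEYS:
                path = value.strip().strip('"')
                if path and not os.path.exists(path):
                    missing.append(path)
    return missing

# Usage:
# print(check_paths("./Starling-LM-7B-alpha-AWQ-onmt/inference.yaml"))
```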
I have also posted the topic on the forum: https://forum.opennmt.net/t/device-side-assert-triggered-on-awq-mistral-converted-model/5656
I have converted the TheBloke/Starling-LM-7B-alpha-AWQ model using the following command:
python tools/convert_HF.py --model_dir TheBloke/Starling-LM-7B-alpha-AWQ --output ./Starling-LM-7B-alpha-AWQ-onmt/ --format pytorch --nshards 1
I am not able to run inference on the converted model; it fails with a device-side assert error. The command I am using:
python translate.py --config ./Starling-LM-7B-alpha-AWQ-onmt/inference.yaml --src ./input_prompt.txt --output ./output.txt
input_prompt.txt content:
GPT-4 User: How do you manage stress?<|end_of_turn|>GPT4 Assistant:
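One general CUDA debugging step (an assumption on my side, not OpenNMT-specific advice): device-side asserts are reported asynchronously, so the Python traceback often points at the wrong call. Re-running the same command with `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous and usually surfaces the failing op. A minimal wrapper:

```python
import os
import subprocess

def rerun_with_sync_kernels(cmd):
    """Re-run a command with CUDA_LAUNCH_BLOCKING=1 so a device-side
    assert is raised at the exact failing kernel launch."""
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(cmd, env=env, check=False)

# Usage, with the paths from this issue:
# rerun_with_sync_kernels([
#     "python", "translate.py",
#     "--config", "./Starling-LM-7B-alpha-AWQ-onmt/inference.yaml",
#     "--src", "./input_prompt.txt",
#     "--output", "./output.txt",
# ])
```

Equivalently, prefixing the shell command with `CUDA_LAUNCH_BLOCKING=1` has the same effect.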
One more question: I am not able to understand the example prompts provided for the Mistral model, in particular the tokens used there, i.e. ⦅newline⦆. I'd appreciate an explanation or a documentation link for this.
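For what it's worth, my understanding (an assumption worth confirming against the OpenNMT-py docs) is that inference reads one example per line of the src file, so a multi-line prompt has to be flattened, with literal newlines replaced by the placeholder token ⦅newline⦆ and restored after decoding. A small sketch of that encode/decode round trip:

```python
# Assumed placeholder used by OpenNMT-py example prompts for line breaks.
NEWLINE_TOKEN = "\uff5fnewline\uff60"  # ⦅newline⦆

def to_single_line(prompt: str) -> str:
    """Flatten a multi-line prompt into one src line."""
    return prompt.replace("\n", NEWLINE_TOKEN)

def from_single_line(line: str) -> str:
    """Restore real line breaks in the model output."""
    return line.replace(NEWLINE_TOKEN, "\n")

prompt = "GPT-4 User: How do you manage stress?\nBe concise.<|end_of_turn|>GPT4 Assistant:"
encoded = to_single_line(prompt)
assert "\n" not in encoded
```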