Closed by vince62s 5 months ago
- Fix ensemble decoding when using flash-attn.
- Fix "\n" tokenization: newlines were stripped out, which led to lower MMLU scores.
- Patch to tokenize "\n\n" into "\n" "\n" when using GPT-2 BPE.
- Adapt the phi-2 converter to the new layer names.
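For illustration only, the "\n\n" into "\n" "\n" split could look roughly like the sketch below. It is not the code from this PR: the function name `split_newline_runs` and its `newline` parameter are hypothetical, and it assumes tokens are already in GPT-2 byte-level form, where the newline byte is rendered as "Ċ" and a leading space as "Ġ". It also generalizes slightly by splitting any run of newlines, not just a pair.

```python
from typing import List


def split_newline_runs(tokens: List[str], newline: str = "Ċ") -> List[str]:
    """Split any token made only of repeated newline symbols into single ones.

    Hypothetical helper: `newline` defaults to "Ċ", the GPT-2 byte-level
    rendering of "\n" (assumption about the token representation).
    """
    out: List[str] = []
    for tok in tokens:
        # A merged token such as "ĊĊ" consists solely of newline symbols.
        if len(tok) > 1 and set(tok) == {newline}:
            out.extend([newline] * len(tok))
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    # "ĊĊ" is emitted as two separate "Ċ" tokens; other tokens are untouched.
    print(split_newline_runs(["Question", ":", "ĊĊ", "Answer", "Ċ"]))
```

Keeping each newline as its own token means a prompt that separates sections with blank lines is encoded the same way whether or not the BPE vocabulary happens to contain a merged double-newline entry.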