BBC-Esq / VectorDB-Plugin-for-LM-Studio

Plugin that lets you ask questions about your documents, including audio and video files.
https://www.youtube.com/@AI_For_Lawyers
284 stars, 35 forks

possibly incorporate new minitron chat models #271

Closed by BBC-Esq 2 months ago

BBC-Esq commented 2 months ago

https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base
https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Depth-Base
https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base

Model Comparison: Llama-3.1 Minitron and Mistral-NeMo Minitron

| Feature | Llama-3.1-Minitron-4B-Depth-Base | Llama-3.1-Minitron-4B-Width-Base | Mistral-NeMo-Minitron-8B-Base |
| --- | --- | --- | --- |
| Base Model | Llama-3.1-8B | Llama-3.1-8B | Mistral-NeMo 12B |
| Pruning Method | Reduced transformer blocks | Reduced embedding size and MLP dimension | Reduced embedding size and MLP dimension |
| Parameters | 4.54B | 4.51B | 8.41B |
| Layers | 16 | 32 | 40 |
| Embedding Size | 4096 | 3072 | 4096 |
| MLP Intermediate Dimension | 14336 | 9216 | 11520 |
| Attention Heads | 32 | 32 | 32 |
| Training Period | July 29-Aug 3, 2024 | July 29-Aug 3, 2024 | July 24-Aug 10, 2024 |
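The parameter counts in the comparison can be sanity-checked from the architecture figures. The sketch below is a rough estimate, not NVIDIA's accounting. It assumes a Llama-style decoder with untied input/output embeddings, 8 key/value heads, a per-head dimension of 128, and vocabulary sizes of 128256 (Llama-3.1) and 131072 (Mistral-NeMo). It also assumes the depth-pruned model keeps 16 transformer blocks. None of these values are stated in this thread, and `DecoderConfig` and `estimate_params` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    name: str
    layers: int
    hidden: int          # embedding size
    mlp: int             # MLP intermediate dimension
    heads: int           # attention (query) heads
    kv_heads: int = 8    # assumption: grouped-query attention with 8 KV heads
    head_dim: int = 128  # assumption: per-head dimension is fixed at 128
    vocab: int = 128256  # assumption: Llama-3.1 vocabulary size


def estimate_params(cfg: DecoderConfig) -> int:
    """Rough parameter count for a Llama-style decoder without biases."""
    q_out = cfg.heads * cfg.head_dim
    kv_out = cfg.kv_heads * cfg.head_dim
    # q_proj and o_proj, plus k_proj and v_proj
    attention = 2 * cfg.hidden * q_out + 2 * cfg.hidden * kv_out
    # gate, up, and down projections
    mlp = 3 * cfg.hidden * cfg.mlp
    # two RMSNorm weight vectors per block
    norms = 2 * cfg.hidden
    per_layer = attention + mlp + norms
    # assumption: input embeddings and output head are separate matrices
    embeddings = 2 * cfg.vocab * cfg.hidden
    final_norm = cfg.hidden
    return cfg.layers * per_layer + embeddings + final_norm


CONFIGS = [
    # assumption: depth pruning leaves 16 of the base model's 32 blocks
    DecoderConfig("Llama-3.1-Minitron-4B-Depth-Base", 16, 4096, 14336, 32),
    DecoderConfig("Llama-3.1-Minitron-4B-Width-Base", 32, 3072, 9216, 32),
    DecoderConfig("Mistral-NeMo-Minitron-8B-Base", 40, 4096, 11520, 32,
                  vocab=131072),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name}: {estimate_params(cfg) / 1e9:.2f}B")
```

Under these assumptions the estimates round to 4.54B, 4.51B, and 8.41B, which agrees with the reported parameter counts.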

Performance Metrics

| Metric | Llama-3.1-Minitron-4B-Depth-Base | Llama-3.1-Minitron-4B-Width-Base | Mistral-NeMo-Minitron-8B-Base |
| --- | --- | --- | --- |
| MMLU (5-shot) | 58.7 | 60.5 | 69.5 |
| GSM8K (zero-shot) | 16.8 | 41.2 | 58.5 |

Common Features

Key Differences

BBC-Esq commented 2 months ago

Not currently viable because of a tensor shape mismatch. Will re-create this issue when the models are corrected.
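The thread does not say which model triggers the mismatch or why. One plausible cause, offered only as an assumption, is that a loader derives the per-head dimension as embedding size divided by attention heads, while a width-pruned checkpoint keeps a fixed per-head dimension. The sketch below uses the Width-Base figures (embedding size 3072, 32 heads) and an assumed stored head dimension of 128. `q_proj_shape` is a hypothetical helper, not part of any library.

```python
from typing import Optional, Tuple


def q_proj_shape(hidden: int, heads: int,
                 head_dim: Optional[int] = None) -> Tuple[int, int]:
    """Return (out_features, in_features) of the query projection weight."""
    if head_dim is None:
        # Common fallback when a config has no explicit head dimension.
        head_dim = hidden // heads
    return (heads * head_dim, hidden)


# Shape a loader would allocate if it infers head_dim = 3072 // 32 = 96.
expected_by_loader = q_proj_shape(hidden=3072, heads=32)

# Shape stored in the checkpoint if head_dim stays at 128 (assumption).
stored_in_checkpoint = q_proj_shape(hidden=3072, heads=32, head_dim=128)

if __name__ == "__main__":
    print("loader expects:", expected_by_loader)
    print("checkpoint has:", stored_in_checkpoint)
    print("mismatch:", expected_by_loader != stored_in_checkpoint)
```

If this is the cause, the fix would lie in the model configuration or the loading library rather than in the plugin.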