As a researcher
I need to compile a list of major LLMs currently available
So that we can identify which ones to include in our evaluation program
Details and Assumptions
The list should include LLMs from major providers like OpenAI, Google, and Meta
We need to consider both open-source and proprietary models
The list should include basic information about each LLM's capabilities
Acceptance Criteria
Given I have access to information about current LLMs
When I compile the list of major LLMs
Then the list should include at least 10 different models
And each entry should have the model name, provider, and key features
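The acceptance criteria above can be checked mechanically once the list is drafted. The following is a minimal sketch, not part of the story itself: the `ModelEntry` type, its field names, and the `check_acceptance` helper are hypothetical names chosen for illustration, and the threshold of 10 comes from the criteria.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of the LLM list. Field names are illustrative assumptions."""
    name: str
    provider: str
    key_features: tuple[str, ...] = ()


def check_acceptance(entries, minimum=10):
    """Return a list of problems; an empty list means the criteria are met."""
    problems = []

    # Criterion: at least `minimum` different models (case-insensitive names).
    names = {e.name.strip().lower() for e in entries if e.name.strip()}
    if len(names) < minimum:
        problems.append(
            f"only {len(names)} distinct models listed; need at least {minimum}"
        )

    # Criterion: every entry has a model name, a provider, and key features.
    for i, e in enumerate(entries, start=1):
        label = e.name.strip() or f"entry {i}"
        if not e.name.strip():
            problems.append(f"{label}: missing model name")
        if not e.provider.strip():
            problems.append(f"{label}: missing provider")
        if not any(f.strip() for f in e.key_features):
            problems.append(f"{label}: missing key features")

    return problems


if __name__ == "__main__":
    # A few entries taken from the list in this story, for demonstration only.
    draft = [
        ModelEntry("Gemini 1.5", "Google DeepMind", ("multimodal", "coding")),
        ModelEntry("Claude 3", "Anthropic", ("safety-focused",)),
        ModelEntry("LLaMA 3", "Meta", ("coding", "multilingual")),
    ]
    for problem in check_acceptance(draft):
        print(problem)
```

Returning a list of problems instead of a single pass/fail flag makes it easier to see which entries still need a provider or feature summary before the story can be closed.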
Capabilities: Strong in text generation, summarization, coding, translation, and multi-turn dialogue. Highly versatile across domains and known for its reasoning and instruction-following capabilities.
Use Cases: Integrated into various applications like chatbots, coding assistants, and content generation tools.
Access: Available via OpenAI's API or platforms like ChatGPT.
Gemini 1.5 (Google DeepMind)
Capabilities: Advanced multimodal abilities (understands both text and images), with strong reasoning and coding skills. Competitive with GPT-4 in instruction following and question answering.
Use Cases: Ideal for developing AI systems that handle both text and visual inputs.
Access: Available via Google Cloud services.
Claude 3 (Anthropic)
Capabilities: Safety-focused, designed to avoid harmful or biased outputs. It excels at understanding instructions and at answering in ways that prioritize user safety.
Use Cases: Useful in environments needing highly controlled outputs, such as educational tools or customer service.
Access: Available through Anthropic’s API.
Gemma 2 (Google)
Capabilities: Developed with an emphasis on safe and ethical use, including safeguards against misuse for criminal purposes. It performs strongly in text generation and summarization.
Use Cases: AI tools for businesses and developers.
Access: Available via Google AI services; the model weights are also openly downloadable (for example, from Hugging Face and Kaggle).
Open-Source Models:
LLaMA 3 (Meta)
Capabilities: Excellent at creative tasks, coding, logical reasoning, and multilingual work. It can handle summarization and follow detailed instructions.
Use Cases: Great for developers needing customizable models for research and application building.
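Access: Available on Hugging Face, with variants optimized for different tasks.
License: Meta Llama 3 Community License. Commercial use is permitted, subject to conditions (for example, services with very large user bases need a separate license from Meta).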
BLOOM (BigScience/Hugging Face)
Capabilities: Strong in multilingual tasks (supports 46 natural languages and 13 programming languages), text generation, summarization, translation, and code generation. It is known for its open-access nature and responsible AI development.
Use Cases: Open research, multilingual content generation, and academic projects.
Access: Openly available for download and fine-tuning on Hugging Face, under the BigScience RAIL license (which includes use-based restrictions).
MPT-7B (MosaicML)
Capabilities: Designed for commercial applications, with fine-tuned variants such as MPT-7B-Instruct (instruction following), MPT-7B-Chat (dialogue), and MPT-7B-StoryWriter (long-form writing). It excels in predictive analytics and decision-making tasks.
Use Cases: Businesses needing a customizable model for commercial uses.
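Access: Open-source. The base model is released under the Apache 2.0 license, which permits commercial use; license terms differ for some fine-tuned variants.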