MoM: Chris opens the meeting at 1700 BST.
SCER membership. Sean suggests involving academia in the Linux Foundation to increase adoption. Chris and Sean discuss opportunities for University of Michigan students to join the Linux Foundation and gain experience in the field. Sean provides an update on Salesforce, including a problem with a rider agreement.
Standardizing AI model development and deployment. Sean and Chris discuss Hugging Face and responsible AI, noting the lack of a standardized platform for models; Chris mentions the potential for a platform like SCER to fill this gap and explains the importance of standardizing processes for open-source projects.
Standardizing lifecycle model categorization and benchmarking. Chris outlines a standardized framework for categorizing and benchmarking AI models, explains how Hugging Face defines and benchmarks the energy efficiency of large language models, and explains the SCER standard for labelling and routing in the context of different model needs.
ISO certification and methodology for SCI implementation. The group discusses the potential for certification and the need for due diligence to ensure compliance with the standard. Chris shares his screen to provide an overview of the approach for the LLM.
Standardizing carbon emissions assessment for large language models. Developing a specification for evaluating the sustainability of large language models. The software categorization in this study focuses on large language models, with categories including model size, application type, and pre-training/fine-tuning. The methodology described in the transcript is applicable across all types of applications in general AI, including text generation, translation, summarization, and more.
Benchmarking large language models for energy efficiency and reproducibility. Chris discusses benchmarks for large language models and their carbon emissions, using Hugging Face's CodeCarbon estimation tool. Benchmarking infrastructure includes GPU hardware (e.g., RTX 4090, 180 gigabit) and software (Optimum-Benchmark, the Language Model Evaluation Harness).
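For illustration, a minimal sketch of the kind of measurement step described here, wrapping a placeholder inference workload with the open-source CodeCarbon tracker; the model, prompt, and project name below are assumptions, not the group's actual benchmark setup:

```python
# Minimal sketch, assuming the CodeCarbon package and a small placeholder model.
from codecarbon import EmissionsTracker
from transformers import pipeline

tracker = EmissionsTracker(project_name="llm-benchmark-sketch")
tracker.start()

generator = pipeline("text-generation", model="gpt2")  # placeholder model
outputs = generator("A short benchmark prompt.", max_new_tokens=64)

emissions_kg = tracker.stop()  # estimated kg CO2e for this run
print(f"Estimated emissions for the run: {emissions_kg:.6f} kg CO2e")
```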
Evaluating large language models' energy efficiency using six benchmarks. Chris presents a framework for evaluating large language models on multiple tasks using a range of efficiency metrics, and observes a trade-off between energy efficiency and performance: larger models typically outperform smaller ones but consume more energy. Chris explains how to calculate efficiency scores for AI models using six open-source benchmark projects, and proposes developing a rating scale for the energy efficiency of Hugging Face models.
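The minutes do not record the exact scoring formula, so the sketch below assumes a simple accuracy-per-energy ratio averaged across benchmark tasks; the task names and numbers are illustrative only:

```python
# Hypothetical efficiency score: mean of (task score / energy consumed)
# across benchmark tasks. All names and values below are illustrative.
from statistics import mean

results = [
    ("task_a", 0.72, 15.0),  # (task, accuracy 0-1, energy in Wh)
    ("task_b", 0.65, 12.5),
    ("task_c", 0.80, 18.0),
]

efficiency_per_task = [score / energy_wh for _, score, energy_wh in results]
efficiency_score = mean(efficiency_per_task)
print(f"Illustrative efficiency score: {efficiency_score:.4f} per Wh")
```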
Standardizing language model benchmarking for improved efficiency and accuracy. Researchers propose simplifying large language model benchmarks to aid decision-making. Chris presents a new labeling system for cloud services, similar to Nutri-Score, to help consumers make informed purchasing decisions. The system defines four components: specification, methodology, benchmarking, and categorization, allowing for customization and competitive advantage.
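As a hedged sketch of how a Nutri-Score-style label could map onto a numeric efficiency score, the band thresholds below are invented for the example and are not the SCER bands:

```python
# Illustrative A-E banding in the spirit of a Nutri-Score-style label.
# Thresholds are invented for the example, not taken from the SCER spec.
def efficiency_label(score: float) -> str:
    """Map a normalized efficiency score (0-1, higher is better) to a band."""
    bands = [(0.8, "A"), (0.6, "B"), (0.4, "C"), (0.2, "D")]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "E"

print(efficiency_label(0.73))  # -> "B"
```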
Measuring energy consumption of AI models in a controlled environment. Sean discusses challenges in benchmarking large language models (LLMs) for energy efficiency, including the lack of control over the hardware and software used in Hugging Face models. Sean suggests moving to a lab setting to measure the energy consumption of LLMs, but notes difficulties in accessing the hardware and software details of open-source models. Chris Xie expresses frustration with the lack of information on the models used in labs. Chris encounters difficulty finding details on the models used in experiments, including CSS scores.
Developing a standardized process for benchmarking AI models. Chris highlights a gap: the AI industry provides information on performance and usage but gives little attention to energy efficiency and carbon emissions. Chris discusses the nonprofit's efforts to make an impact in the AI industry despite limited resources, and proposes a standardized process for benchmarking AI models.
Standardizing benchmarks for AI models' energy efficiency. Chris highlights the challenges of comparing models across different hardware and software environments. Experts discuss the need for standardized tests to evaluate inference performance across multiple models. Tammy asks why consumers need to know the energy consumption of AI models during different stages. Chris suggests standardizing processes and guidelines to make it easier for others to do lab tests. Tammy raises the question of how to compare the efficiency of different AI models, and Chris suggests using standardized tests to achieve cross-comparative ratings.
Carbon emissions measurement and benchmarking for AI hardware. Chris proposes measuring carbon efficiency by emissions per 1000 tokens and defines a standard workload benchmark for AI hardware.
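A minimal sketch of the "emissions per 1,000 tokens" metric mentioned above; the input numbers are placeholders, and in practice the emissions figure would come from a measurement tool and the token count from the benchmark harness:

```python
# Carbon efficiency as grams of CO2e per 1,000 generated tokens.
# Example values are placeholders only.
def gco2e_per_1000_tokens(total_gco2e: float, total_tokens: int) -> float:
    return total_gco2e / total_tokens * 1000

print(gco2e_per_1000_tokens(total_gco2e=4.2, total_tokens=120_000))  # 0.035
```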
Standardizing carbon efficiency rating for large language models. Chris explains how to define the efficiency range for carbon output in computing. Chris highlights the limitations of small language models in energy efficiency. Chris and the team aim to standardize language models for more efficient consumption.
Standardizing AI model development and adoption. Sean and Chris discussed merging a base document into a ledger book and circulating it for review. They plan to break down the document into smaller portions and get feedback from the working group. Sean circulated a PR for approval. They also talked about standardizing the development of models in Salesforce, with a focus on readability and informed decision-making.
Standardizing carbon emissions measurement for the AI industry. Tammy and Sean discussed the intersection between the green AI committee and the standard working group. Sean provided feedback from their previous meeting, and Chris helped clarify the misunderstandings. Chris also explained how to read a carbon number and standardize a formula for computing software carbon intensity. Tammy and Chris discussed possibly including other industry standards for benchmark definitions, like SCI.
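For reference, the GSF Software Carbon Intensity formula referred to here is SCI = ((E × I) + M) per R. The sketch below applies it with an assumed functional unit of 1,000 tokens and placeholder values:

```python
# SCI = ((E * I) + M) per R, where E is operational energy (kWh),
# I is grid carbon intensity (gCO2e/kWh), M is embodied emissions (gCO2e),
# and R is the functional unit (assumed here: thousands of tokens served).
def sci(energy_kwh: float, intensity_gco2e_per_kwh: float,
        embodied_gco2e: float, functional_units: float) -> float:
    return (energy_kwh * intensity_gco2e_per_kwh + embodied_gco2e) / functional_units

# Placeholder values for illustration only.
print(sci(energy_kwh=0.5, intensity_gco2e_per_kwh=400.0,
          embodied_gco2e=20.0, functional_units=120.0))  # gCO2e per 1,000 tokens
```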
Standardizing benchmarking for energy efficiency in data centres. Tammy and Chris talked about the difficulties of rating against multiple standards and benchmarks for sustainability, with Tammy expressing concerns about consistency and accuracy. Chris explained that the rating system is based on GSF standards, and that they are working on a use case in which multiple standards can be applied and consistently rated. They also discussed standardization for workload efficiency in AI development and the challenges of creating a standardized benchmarking process for AI models, aiming to develop a framework for agreeing on such a process.
Action Items
2024.05.01 Agenda/Minutes
Time: Bi-weekly @ 1700 (GMT) - See the time in your timezone
Antitrust Policy
Joint Development Foundation meetings may involve participation by industry competitors, and the Joint Development Foundation intends to conduct all of its activities in accordance with applicable antitrust and competition laws. It is, therefore, extremely important that attendees adhere to meeting agendas and be aware of and not participate in any activities that are prohibited under applicable US state, federal or foreign antitrust and competition laws.
If you have questions about these matters, please contact your company counsel or counsel to the Joint Development Foundation, DLA Piper.
Recordings
WG agreed to record all Meetings. This meeting recording will be available until the next scheduled meeting. Meeting recording link
Roll Call
Please add 'Attended' to this issue during the meeting to denote attendance.
Any untracked attendees will be added by the GSF team below:
Agenda
Introductions
PR Reviews
Project Dashboard
AOB
Future meeting Agenda submissions
Previous meeting action items (AIs)
Next Meeting
Adjourn
[ ] Motion to adjourn
Meeting Action Items / Standing Agenda / Future Agenda submissions