MoM: Chris opens the meeting at 1700 BST.
SCER membership. Sean suggests involving academia in the Linux Foundation to increase adoption. Chris and Sean discuss opportunities for University of Michigan students to join the Linux Foundation and gain experience in the field. Sean provides an update on Salesforce, including a problem with a rider agreement.
Standardizing AI model development and deployment. Sean and Chris discuss Hugging Face and responsible AI, noting the lack of a standardized platform for models; Chris mentions the potential for a platform like SCER to fill this gap and explains the importance of standardizing processes for open-source projects.
Standardizing lifecycle model categorization and benchmarking. Chris outlines a standardized framework for categorizing and benchmarking AI models, explains how Hugging Face defines and benchmarks the energy efficiency of large language models, and explains the SCER standard for labelling and routing in the context of different model needs.
ISO certification and methodology for SCI implementation. The group discusses the potential for certification and the need for due diligence to ensure compliance with the standard. Chris shares his screen to provide an overview of the approach for the LLM.
Standardizing carbon emissions assessment for large language models. Developing a specification for evaluating the sustainability of large language models. The software categorization in this study focuses on large language models, with categories including model size, application type, and pre-training/fine-tuning. The methodology described in the transcript is applicable across all types of applications in general AI, including text generation, translation, summarization, and more.
Benchmarking large language models for energy efficiency and reproducibility. Chris discusses benchmarks for large language models and their carbon emissions, using Hugging Face's CodeCarbon estimation tool. Benchmarking infrastructure includes GPU hardware (e.g., RTX 4090, 180 gigabit) and software (Optimum-Benchmark, the Language Model Evaluation Harness).
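For illustration, a minimal sketch of the kind of measurement step described here, wrapping a placeholder inference workload with the open-source CodeCarbon tracker; the model, prompt, and project name below are assumptions, not the group's actual benchmark setup:

```python
# Minimal sketch, assuming the CodeCarbon package and a small placeholder model.
from codecarbon import EmissionsTracker
from transformers import pipeline

tracker = EmissionsTracker(project_name="llm-benchmark-sketch")
tracker.start()

generator = pipeline("text-generation", model="gpt2")  # placeholder model
outputs = generator("A short benchmark prompt.", max_new_tokens=64)

emissions_kg = tracker.stop()  # estimated kg CO2e for this run
print(f"Estimated emissions for the run: {emissions_kg:.6f} kg CO2e")
```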
Evaluating large language models' energy efficiency using six benchmarks. Chris presents a framework for evaluating large language models on multiple tasks using a range of efficiency metrics, and observes a trade-off between energy efficiency and performance: larger models typically outperform smaller ones but consume more energy. Chris explains how to calculate efficiency scores for AI models using six open-source benchmark projects, and proposes developing a rating scale for the energy efficiency of Hugging Face models.
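The minutes do not record the exact scoring formula, so the sketch below assumes a simple accuracy-per-energy ratio averaged across benchmark tasks; the task names and numbers are illustrative only:

```python
# Hypothetical efficiency score: mean of (task score / energy consumed)
# across benchmark tasks. All names and values below are illustrative.
from statistics import mean

results = [
    ("task_a", 0.72, 15.0),  # (task, accuracy 0-1, energy in Wh)
    ("task_b", 0.65, 12.5),
    ("task_c", 0.80, 18.0),
]

efficiency_per_task = [score / energy_wh for _, score, energy_wh in results]
efficiency_score = mean(efficiency_per_task)
print(f"Illustrative efficiency score: {efficiency_score:.4f} per Wh")
```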
Standardizing language model benchmarking for improved efficiency and accuracy. Researchers propose simplifying large language model benchmarks to aid decision-making. Chris presents a new labeling system for cloud services, similar to Nutri-Score, to help consumers make informed purchasing decisions. The system defines four components: specification, methodology, benchmarking, and categorization, allowing for customization and competitive advantage.
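As a hedged sketch of how a Nutri-Score-style label could map onto a numeric efficiency score, the band thresholds below are invented for the example and are not the SCER bands:

```python
# Illustrative A-E banding in the spirit of a Nutri-Score-style label.
# Thresholds are invented for the example, not taken from the SCER spec.
def efficiency_label(score: float) -> str:
    """Map a normalized efficiency score (0-1, higher is better) to a band."""
    bands = [(0.8, "A"), (0.6, "B"), (0.4, "C"), (0.2, "D")]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "E"

print(efficiency_label(0.73))  # -> "B"
```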
Measuring energy consumption of AI models in a controlled environment. Sean discusses challenges in benchmarking large language models (LLMs) for energy efficiency, including the lack of control over the hardware and software used in Hugging Face models. Sean suggests moving to a lab setting to measure the energy consumption of LLMs, but notes difficulties in accessing the hardware and software details of open-source models. Chris Xie expresses frustration with the lack of information on the models used in labs. Chris encounters difficulty finding details on the models used in experiments, including CSS scores.
Developing a standardized process for benchmarking AI models. Chris highlights a gap: the AI industry provides information on performance and usage but gives little attention to energy efficiency and carbon emissions. Chris discusses the nonprofit's efforts to make an impact in the AI industry despite limited resources, and proposes a standardized process for benchmarking AI models.
Standardizing benchmarks for AI models' energy efficiency. Chris highlights the challenges of comparing models across different hardware and software environments. Experts discuss the need for standardized tests to evaluate inference performance across multiple models. Tammy asks why consumers need to know the energy consumption of AI models during different stages. Chris suggests standardizing processes and guidelines to make it easier for others to do lab tests. Tammy raises the question of how to compare the efficiency of different AI models, and Chris suggests using standardized tests to achieve cross-comparative ratings.
Carbon emissions measurement and benchmarking for AI hardware. Chris proposes measuring carbon efficiency by emissions per 1000 tokens and defines a standard workload benchmark for AI hardware.
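A minimal sketch of the "emissions per 1,000 tokens" metric mentioned above; the input numbers are placeholders, and in practice the emissions figure would come from a measurement tool and the token count from the benchmark harness:

```python
# Carbon efficiency as grams of CO2e per 1,000 generated tokens.
# Example values are placeholders only.
def gco2e_per_1000_tokens(total_gco2e: float, total_tokens: int) -> float:
    return total_gco2e / total_tokens * 1000

print(gco2e_per_1000_tokens(total_gco2e=4.2, total_tokens=120_000))  # 0.035
```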
Standardizing carbon efficiency rating for large language models. Chris explains how to define the efficiency range for carbon output in computing. Chris highlights the limitations of small language models in energy efficiency. Chris and the team aim to standardize language models for more efficient consumption.
Standardizing AI model development and adoption. Sean and Chris discussed merging a base document into a ledger book and circulating it for review. They plan to break down the document into smaller portions and get feedback from the working group. Sean circulated a PR for approval. They also talked about standardizing the development of models in Salesforce, with a focus on readability and informed decision-making.
Standardizing carbon emissions measurement for the AI industry. Tammy and Sean discussed the intersection between the green AI committee and the standard working group. Sean provided feedback from their previous meeting, and Chris helped clarify the misunderstandings. Chris also explained how to read a carbon number and standardize a formula for computing software carbon intensity. Tammy and Chris discussed possibly including other industry standards for benchmark definitions, like SCI.
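For reference, the GSF Software Carbon Intensity formula referred to here is SCI = ((E × I) + M) per R. The sketch below applies it with an assumed functional unit of 1,000 tokens and placeholder values:

```python
# SCI = ((E * I) + M) per R, where E is operational energy (kWh),
# I is grid carbon intensity (gCO2e/kWh), M is embodied emissions (gCO2e),
# and R is the functional unit (assumed here: thousands of tokens served).
def sci(energy_kwh: float, intensity_gco2e_per_kwh: float,
        embodied_gco2e: float, functional_units: float) -> float:
    return (energy_kwh * intensity_gco2e_per_kwh + embodied_gco2e) / functional_units

# Placeholder values for illustration only.
print(sci(energy_kwh=0.5, intensity_gco2e_per_kwh=400.0,
          embodied_gco2e=20.0, functional_units=120.0))  # gCO2e per 1,000 tokens
```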
Standardizing benchmarking for energy efficiency in data centres. Tammy and Chris talked about the difficulties of rating against multiple standards and benchmarks for sustainability, with Tammy expressing concerns about consistency and accuracy. Chris explained that the rating system is based on GSF standards, and that they are working on a use case in which multiple standards can be applied and consistently rated. They also discussed standardization for workload efficiency in AI development and the challenges of creating a standardized benchmarking process for AI models, aiming to develop a framework for agreeing on such a process.
Action Items
2024.05.01 Agenda/Minutes
Time: Bi-weekly @ 1700 (GMT) - See the time in your timezone
Antitrust Policy
Joint Development Foundation meetings may involve participation by industry competitors, and the Joint Development Foundation intends to conduct all of its activities in accordance with applicable antitrust and competition laws. It is, therefore, extremely important that attendees adhere to meeting agendas and be aware of and not participate in any activities that are prohibited under applicable US state, federal or foreign antitrust and competition laws.
If you have questions about these matters, please contact your company counsel or counsel to the Joint Development Foundation, DLA Piper.
Recordings
WG agreed to record all Meetings. This meeting recording will be available until the next scheduled meeting. Meeting recording link
Roll Call
Please add 'Attended' to this issue during the meeting to denote attendance.
Any untracked attendees will be added by the GSF team below:
Agenda
Introductions
PR Reviews
Project Dashboard
AOB
Future meeting Agenda submissions
Previous meeting action items (AIs)
Next Meeting
Adjourn
[ ] Motion to adjourn
Meeting Action Items / Standing Agenda / Future Agenda submissions