Green-Software-Foundation / scer

Software Carbon Efficiency Rating

Proposed Roadmap for SCER #73

Open jawache opened 2 months ago

jawache commented 2 months ago

We need to show several things about the SCER on its road to being published as a specification and endorsed by the GSF.

To put it simply, this means that if concerns are raised, they must be resolved; we can't ignore them.

To get adopted as an ISO standard we need to show:

We need to engineer interactions as much as possible so we can show evidence of organizations being involved, for instance by forcing communications to happen through GitHub issues. More generally, it's something we need to think about and plan now: this won't get into ISO if just two people wrote the whole spec.

To get adopted by policy makers and governments and regulators we need to show:

We can keep internal conversations informal, but we need to formalize the feedback process now that this project is being discussed externally.

Challenges

There have been several meetings with the standards working group, and the feedback provided indicates this is of great interest, but there isn't consensus on:

Interestingly, the area where there seems to be more consensus (at least, no one has vocally raised any concerns so far) is the labeling component (minus ratings, of course).

There has also been some feedback that this proposed specification is too large to reasonably expect all our organizations to examine and reach consensus on. We'll need to break things down into smaller pieces; otherwise we risk the default answer being to object, since it's safer to say no to something you don't understand.

Proposed Roadmap

It's good to start off with a baseline that everyone agrees with and iterate from there. I propose that we formally split this project into several milestones: make the first milestone just the concepts that seem to have consensus (labeling), and move the areas that need much more work to reach consensus into future milestones. Then we can seriously work on getting consensus for the first milestone and have a path to ISO, while still working on future versions with more functionality.

Milestone 1: Disclosure

Scope: Strip out everything else and work only on a labeling system (minus ratings and categorization). Essentially, this is a mechanism of disclosure, just like food ingredient labeling: it's not a statement about how healthy the food is, just a disclosure of what's in it.

Milestone 2: Categorization

Scope: Get consensus on the proposed categorization mechanism. Ratings are a function of categorization: if we can't agree on how to categorize, there won't be agreement on how to perform ratings.

Milestone 3: Benchmarking

Scope: Get consensus on how to perform benchmarking, or on whether benchmarking is part of the underlying specification (like SCI). NOTE: This might actually make more sense as a separate OSS project.

Milestone 4: Ratings

Scope: Get consensus on a ratings mechanism. This is going to be the hard one!

chrisxie-fw commented 2 months ago

Thank you @jawache for the very meaningful suggestion! All agreed! We need more contributors! In terms of work, maybe we should start with the definitions of the 4 steps: Categorization, Benchmarking, Rating, Labelling. Initial attempts to define them are already in the base spec, with concrete examples explaining the definitions. The same thought process is applied in SCER for LLMs, for instance. In software engineering terms, the base spec (i.e. SCER) is like a base class, and SCER for LLMs, for example, is an implementation of the SCER base class. My point is that while the abstractions are being worked on, examples/implementations are illustrated to back them up. This helps people understand the spec and improves readability.

Using SCER for LLMs as an example, in the context of carbon efficiency, categorization is based primarily on the LLM's size and type, and the spec uses Hugging Face as an example to illustrate the point:

[Image: Hugging Face example illustrating LLM categorization]

The whole spec follows this thought process. Therefore, I would encourage contributors to use this as a reference, go through the rest of the document, and work out how to take it through to GSF endorsement and an ISO standard.
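To make the base-class analogy concrete, here is a minimal Python sketch. Everything in it is hypothetical and not taken from the spec: the class names `ScerBase` and `ScerForLLMs`, the method names, the `Label` fields, the `"type/size"` category string and the A/B/C thresholds are all illustrative assumptions. The only ideas borrowed from the discussion are the four steps (Categorization, Benchmarking, Rating, Labelling) and categorizing LLMs by type and size.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Hypothetical sketch: names, categories and thresholds are illustrative,
# not defined by the SCER spec.


@dataclass
class Label:
    category: str      # 1. category
    rating: str        # 2. rating (relative value)
    gco2e: float       # 3. gCO2e number (absolute value)
    details_url: str   # 4. what the QR code would point to


class ScerBase(ABC):
    """Stand-in for the SCER base spec: it fixes the four steps."""

    @abstractmethod
    def categorize(self, software: dict) -> str: ...

    @abstractmethod
    def benchmark(self, software: dict) -> float: ...

    @abstractmethod
    def rate(self, category: str, gco2e: float) -> str: ...

    def label(self, software: dict) -> Label:
        # The base class owns the order of the steps; each domain-specific
        # spec (e.g. SCER for LLMs) only fills in the individual steps.
        category = self.categorize(software)
        gco2e = self.benchmark(software)
        rating = self.rate(category, gco2e)
        return Label(category, rating, gco2e, software["details_url"])


class ScerForLLMs(ScerBase):
    """Stand-in for SCER for LLMs as an 'implementation' of the base spec."""

    def __init__(self, thresholds: dict[str, tuple[float, float]]):
        # Per-category upper bounds (gCO2e) for ratings A and B; placeholders.
        self.thresholds = thresholds

    def categorize(self, software: dict) -> str:
        # Categorize by model type and size.
        return f"{software['type']}/{software['size']}"

    def benchmark(self, software: dict) -> float:
        # Placeholder: assume a benchmark run already produced this figure.
        return software["measured_gco2e"]

    def rate(self, category: str, gco2e: float) -> str:
        # Rate only against thresholds for the same category.
        a_max, b_max = self.thresholds[category]
        if gco2e <= a_max:
            return "A"
        return "B" if gco2e <= b_max else "C"


spec = ScerForLLMs({"text-generation/7B": (1.0, 2.0)})
lbl = spec.label({"type": "text-generation", "size": "7B",
                  "measured_gco2e": 1.5,
                  "details_url": "https://example.org/details"})
print(lbl.rating)  # B
```

The design point is that the base class owns the sequence of steps while each domain-specific spec supplies its own categorization, benchmarking and rating rules, which mirrors how SCER for LLMs builds on the base spec.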

chrisxie-fw commented 1 month ago

In order for SCER to get GSF internal consensus, @jawache is suggesting splitting the SCER spec into 4 mini specs and starting with a SCER-Labeling spec, because labeling (for transparency) is perceived to be the least contentious topic. The visuals/labelling section of the current SCER for LLMs spec includes information for:

  1. category
  2. rating (relative value)
  3. gCO2e numbers (absolute values)
  4. QR code that explains how these ratings and numbers are derived.

Refer to Section 4 of the SCER for LLMs spec.

The questions are:

For example, when it comes to AI models, if categorization information is missing from the label, people might ask, "What type of AI model does this label refer to?" Is it fair to compare the carbon emissions of large language models with those of small language models? Would comparing apples to oranges be reasonable in these cases?

It looks like @jawache's suggestion is that the label only includes the information in items 3 and 4 above.
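One way to picture the Milestone 1 proposal is a disclosure-only label that is a strict subset of the current label. The Python sketch below is purely illustrative: the class names, field names and the `to_disclosure` helper are hypothetical, the sample values are made up, and the QR code is represented only by the URL it would encode.

```python
from dataclasses import asdict, dataclass

# Hypothetical sketch: class and field names are illustrative, not from the spec.


@dataclass(frozen=True)
class DisclosureLabel:
    """Milestone 1 label: only items 3 and 4."""
    gco2e: float       # 3. gCO2e number (absolute value)
    details_url: str   # 4. what the QR code would encode


@dataclass(frozen=True)
class FullLabel(DisclosureLabel):
    """Current SCER for LLMs label: adds items 1 and 2 to the disclosure."""
    category: str      # 1. category
    rating: str        # 2. rating (relative value)

    def to_disclosure(self) -> DisclosureLabel:
        # Drop the fields that still lack consensus (category and rating).
        return DisclosureLabel(self.gco2e, self.details_url)


full = FullLabel(1.5, "https://example.org/details", "text-generation/7B", "B")
print(asdict(full.to_disclosure()))
# {'gco2e': 1.5, 'details_url': 'https://example.org/details'}
```

Because the disclosure label is a subset, later milestones could add category and rating without changing the disclosed fields. The trade-off, as the apples-to-oranges question above points out, is that a disclosure-only label does not tell readers which comparisons are meaningful.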