CycloneDX / specification

OWASP CycloneDX is a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction. SBOM, SaaSBOM, HBOM, AI/ML-BOM, CBOM, OBOM, MBOM, VDR, and VEX
https://cyclonedx.org/
Apache License 2.0

Propose new environmental consideration information for ML models #396

Closed mrutkows closed 6 months ago

mrutkows commented 6 months ago

see https://github.com/CycloneDX/specification/issues/396#issuecomment-1992596992

As an AI producer or operator, I want the ability to represent environmental concerns, including energy consumption and CO2 emissions, throughout the lifecycle of a model: from data acquisition, through training and fine-tuning, to MLOps (including inference). I want to use CycloneDX to help my organization comply with the environmental transparency requirements in the AI Act.


It has come to the forefront that datasets used to train AI models are increasingly large, and that models take an enormous amount of energy (and indirectly produce large CO2 emissions) to develop, train, and run. This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.

Background:

many more from any search engine...

jkowalleck commented 6 months ago

I don't understand the issue.

> This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.

This description in no way describes the actual problem; it only gives a reason why some problem should be solved.

jkowalleck commented 6 months ago

@stevespringett can you help me here? I don't see a reason for putting these values in an ML-BOM. Does putting self-proclaimed side data in a BOM actually help anybody? Is there a write-up or video recording from the CycloneDX ML working group on this topic? With my current understanding of the topic, this all looks like an abuse of the BOM for bragging purposes (like: look how large/low my numbers are... and my numbers are better than yours...).

stevespringett commented 6 months ago

@jkowalleck The energy crisis for AI was just starting to happen when the AI/ML workgroup was operational. Over the last year, the crisis has grown exponentially. Organizations previously were talking about being carbon neutral. With the energy demands of AI, that likely is not possible. This reality is captured in the text of the AI Act. The energy considerations can also be combined with CDXA so that organizations can attest to the data in the model card.

The environment consideration support that Matt is working on will help CycloneDX adopters meet requirements in the AI Act.

According to the text adopted by the European Parliament, the AI Act sets out requirements for so-called "high-risk AI systems." These systems must be designed and developed with logging capabilities that enable the recording of energy consumption, the measurement or calculation of resource use, and the environmental impact throughout the system's lifecycle. These requirements primarily focus on transparency, ensuring that stakeholders have access to data on energy consumption. However, it is important to note that, in this case, the AI Act does not compel measures to reduce the energy consumption of AI systems.

Source: https://www.techpolicy.press/addressing-ai-energy-consumption-why-the-eu-must-embrace-ecodesign-for-software/

This is the use case that Matt is trying to achieve with this feature.

stevespringett commented 6 months ago

To frame this in a use case:

As an AI producer or operator, I want the ability to represent environmental concerns, including energy consumption and CO2 emissions, throughout the lifecycle of a model: from data acquisition, through training and fine-tuning, to MLOps (including inference). I want to use CycloneDX to help my organization comply with the environmental transparency requirements in the AI Act.
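To make the use case concrete, here is a minimal Python sketch of the kind of data it implies: one record per lifecycle activity, each carrying an energy figure and a CO2 figure, plus totals across the lifecycle. Everything in it is an assumption for illustration only: the class name `EnergyRecord`, the field names, the units, and the list of activities are hypothetical and are not the schema proposed for CycloneDX.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical lifecycle stages, taken from the wording of the use case.
# This is NOT a CycloneDX enum.
ACTIVITIES = ("data-acquisition", "training", "fine-tuning", "inference")


@dataclass
class EnergyRecord:
    """Hypothetical record of the environmental cost of one lifecycle activity."""
    activity: str      # one of ACTIVITIES
    energy_kwh: float  # energy consumed by the activity, in kWh (assumed unit)
    co2_tonnes: float  # estimated emissions, in tonnes CO2-equivalent (assumed unit)

    def __post_init__(self):
        # Reject records that cannot be meaningful.
        if self.activity not in ACTIVITIES:
            raise ValueError(f"unknown activity: {self.activity}")
        if self.energy_kwh < 0 or self.co2_tonnes < 0:
            raise ValueError("energy and CO2 values must be non-negative")


def summarize(records):
    """Return the records as plain dicts together with lifecycle totals."""
    return {
        "records": [asdict(r) for r in records],
        "total_energy_kwh": sum(r.energy_kwh for r in records),
        "total_co2_tonnes": sum(r.co2_tonnes for r in records),
    }


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    example = [
        EnergyRecord("training", 1000.0, 0.5),
        EnergyRecord("inference", 250.0, 0.125),
    ]
    print(json.dumps(summarize(example), indent=2))
```

Keeping one record per activity, rather than a single total, is what would let a consumer compare training cost against ongoing inference cost.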

jkowalleck commented 6 months ago

Environmental costs for ML-BOM are just one aspect. Would you also add cost for SaaSBOM - how much does it cost to run the service? Would you also add time cost for SBOM - like how many hours went into the development of a component? Would you also add health/medical costs for HBOM - how many people suffered while mining the materials used in a component?

Thing is, all these "costs" are currently (in the real world) priced in money (taxes, operational costs, R&D, etc.). If we wanted to add environmental costs specifically, then I would argue that we should add costs in general - for every component/service/...

stevespringett commented 6 months ago

> If we wanted to add environmental costs specifically, then I would argue that we should add costs in general - for every component/service/...

Valid point. However, the same logic could be applied to the majority of the model card, including performance metrics and biases, and that's not where the industry is currently at. But in the proposed design, we could reuse this data outside of just the model card in a generic sense and make it available to every component and service.

jkowalleck commented 6 months ago

> But in the proposed design, we could reuse this data outside of just the model card in a generic sense and make it available to every component and service.

That sounds good: finding a generalized solution that can be reused :+1:

PS: here are others asking for a generic approach

jkowalleck commented 6 months ago

Existing work/art in the field: Green Software Foundation - Impact Framework - see https://if.greensoftware.foundation/

jkowalleck commented 6 months ago

A follow-up will be https://github.com/CycloneDX/specification/issues/406