axolotl-ai-cloud / axolotl

Include eval metrics on metadata card #1020

Open ittailup opened 8 months ago

ittailup commented 8 months ago

āš ļø Please check that this feature request hasn't been suggested before.

🔖 Feature description

We already have `base_model`, `tags`, and `datasets` added automatically as metadata on the model card. The Hub supports a few more fields:

*(screenshot of the supported model card metadata fields)*
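
For reference, recent versions of `huggingface_hub` expose a `ModelCardData` class that renders these fields as README front matter. A minimal sketch of the metadata axolotl already writes, with hypothetical model and dataset IDs:

```python
from huggingface_hub import ModelCardData

# Hypothetical values for illustration; axolotl fills these in automatically
# from the training config.
card_data = ModelCardData(
    base_model="meta-llama/Llama-2-7b-hf",
    tags=["axolotl"],
    datasets=["tatsu-lab/alpaca"],
)
print(card_data.to_yaml())  # renders the front-matter fields shown above
```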

āœ”ļø Solution

We should include eval metric results as well, following the Hub's documented `model-index` schema:

```yaml
# Optional. Add this if you want to encode your eval results in a structured way.
model-index:
- name: {model_id}
  results:
  - task:
      type: {task_type}             # Required. Example: automatic-speech-recognition
      name: {task_name}             # Optional. Example: Speech Recognition
    dataset:
      type: {dataset_type}          # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
      name: {dataset_name}          # Required. A pretty name for the dataset. Example: Common Voice (French)
      config: {dataset_config}      # Optional. The name of the dataset configuration used in `load_dataset()`. Example: fr in `load_dataset("common_voice", "fr")`. See the `datasets` docs for more info: https://huggingface.co/docs/datasets/package_reference/loading_methods#datasets.load_dataset.name
      split: {dataset_split}        # Optional. Example: test
      revision: {dataset_revision}  # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb
      args:
        {arg_0}: {value_0}          # Optional. Additional arguments to `load_dataset()`. Example for wikipedia: language: en
        {arg_1}: {value_1}          # Optional. Example for wikipedia: date: 20220301
    metrics:
      - type: {metric_type}         # Required. Example: wer. Use metric id from https://hf.co/metrics
        value: {metric_value}       # Required. Example: 20.90
        name: {metric_name}         # Optional. Example: Test WER
        config: {metric_config}     # Optional. The name of the metric configuration used in `load_metric()`. Example: bleurt-large-512 in `load_metric("bleurt", "bleurt-large-512")`. See the `datasets` docs for more info: https://huggingface.co/docs/datasets/v2.1.0/en/loading#load-configurations
        args:
          {arg_0}: {value_0}        # Optional. The arguments passed during `Metric.compute()`. Example for `bleu`: max_order: 4
        verifyToken: {verify_token} # Optional. If present, this is a signature that can be used to prove that evaluation was generated by Hugging Face (vs. self-reported).
    source:                         # Optional. The source for this result.
      name: {source_name}           # Optional. The name of the source. Example: Open LLM Leaderboard.
      url: {source_url}             # Required if source is provided. A link to the source. Example: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.
---
```
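
One way to implement this would be via `huggingface_hub`'s `EvalResult`, which serializes to exactly this `model-index` schema. A minimal sketch, with hypothetical task, dataset, and metric values:

```python
from huggingface_hub import EvalResult, ModelCardData

# Hypothetical values; in axolotl these would come from the trainer's
# evaluation output (e.g. eval_loss) and the training config.
eval_results = [
    EvalResult(
        task_type="text-generation",
        dataset_type="tatsu-lab/alpaca",  # dataset id on the Hub
        dataset_name="Alpaca",            # pretty name for the dataset
        dataset_split="test",
        metric_type="loss",
        metric_value=1.23,
        metric_name="Eval Loss",
    )
]

card_data = ModelCardData(
    model_name="my-finetuned-model",  # required when eval_results is set
    eval_results=eval_results,
)
print(card_data.to_yaml())  # emits a `model-index:` block like the one above
```

The rendered YAML could then be merged into the README front matter alongside the fields axolotl already writes.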

ā“ Alternatives

No response

šŸ“ Additional Context

No response


lingchensanwen commented 5 months ago

I'm wondering if there are any updates on this.