Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

[QUESTION] Train my own Metric: #115

Closed: sdlmw closed this issue 1 year ago

sdlmw commented 1 year ago

Hi,

I downloaded the experiment file and tried to train the model myself, but I always get the error below. I could not figure out what is causing it. What is the problem?

Code

ranking_metric:
  class_path: comet.models.RankingMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 5.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-base
    pool: avg
    layer: mix
    dropout: 0.1
    batch_size: 4
    train_data: 
      - /MT-work/COMET/data/apequest/train.csv
    validation_data:
      - /MT-work/COMET/data/apequest/test.csv      
trainer: /MT-work/COMET/configs/trainer.yaml
early_stopping: /MT-work/COMET/configs/early_stopping.yaml
model_checkpoint: /MT-work/COMET/configs/model_checkpoint.yaml

comet-train: error: Parser key "ranking_metric": Problem with given class_path "comet.models.RankingMetric":
  - Parser key "train_data": Value "['/MT-work/COMET/data/apequest/train.csv']" does not validate against any of the types in typing.Union[str, NoneType]:
    - Expected a <class 'str'> but got "['/MT-work/COMET/data/apequest/train.csv']"
    - Expected a <class 'NoneType'> but got "['/MT-work/COMET/data/apequest/train.csv']"


ricardorei commented 1 year ago

There is a mismatch between unbabel-comet==1.1.3 and the current master branch.

If you are using version 1.1.3 you can't pass a list of training files; the config is just:

ranking_metric:
  class_path: comet.models.RankingMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 5.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-base
    pool: avg
    layer: mix
    dropout: 0.1
    batch_size: 4
    train_data: /MT-work/COMET/data/apequest/train.csv
    validation_data:
      - /MT-work/COMET/data/apequest/test.csv      
trainer: /MT-work/COMET/configs/trainer.yaml
early_stopping: /MT-work/COMET/configs/early_stopping.yaml
model_checkpoint: /MT-work/COMET/configs/model_checkpoint.yaml
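
If you are not sure which version is installed, a quick check (assuming a pip install) is:

pip show unbabel-comet
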
sdlmw commented 1 year ago

Hi @ricardorei

Thanks for the explanation.

I just pulled the latest version.

git clone https://github.com/Unbabel/COMET

The error has not changed.

ricardorei commented 1 year ago

Hi @sdlmw I just tested the code on master and everything is working fine.
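
If you cloned the repo, also make sure the checkout is actually installed in place of the pip-installed 1.1.3; cloning alone does not replace an existing install. A minimal sketch, assuming an editable pip install builds this repo:

cd COMET
pip install -e .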

Here are my configs:

ranking_metric:
  class_path: comet.models.RankingMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-base
    pool: avg
    layer: mix
    layer_transformation: sparsemax
    layer_norm: False
    dropout: 0.1
    batch_size: 4
    train_data: 
      - tests/data/ranking_data.csv
    validation_data:
      - tests/data/ranking_data.csv

trainer: ../trainer.yaml
early_stopping: ../early_stopping.yaml
model_checkpoint: ../model_checkpoint.yaml

and for the trainer.yaml:

class_path: pytorch_lightning.trainer.trainer.Trainer
init_args:
  accelerator: gpu
  devices: 1
  accumulate_grad_batches: 4
  amp_backend: native
  amp_level: null
  auto_lr_find: False
  auto_scale_batch_size: False
  auto_select_gpus: False
  benchmark: null
  check_val_every_n_epoch: 1
  default_root_dir: null
  deterministic: False
  fast_dev_run: False
  gradient_clip_val: 1.0
  gradient_clip_algorithm: norm
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  log_every_n_steps: 50
  profiler: null
  overfit_batches: 0
  plugins: null
  precision: 16
  max_epochs: 4
  min_epochs: 1
  max_steps: -1
  min_steps: null
  max_time: null
  num_nodes: 1
  num_sanity_val_steps: 10
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: True
  sync_batchnorm: False
  detect_anomaly: False
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0
  enable_model_summary: True
  move_metrics_to_cpu: True
  multiple_trainloader_mode: max_size_cycle
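
With both files in place, training is launched through the comet-train CLI; the config path below is illustrative:

comet-train --cfg configs/models/ranking_metric.yaml
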
ricardorei commented 1 year ago

Note that the data I am using is in the tests folder. Make sure the data you are using for the ranking model has the same shape.
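
For reference, the ranking models expect a CSV with the columns src, ref, pos, neg, where pos is a translation judged better than neg for the same source and reference. A made-up illustrative row:

src,ref,pos,neg
"Das ist ein Test.","This is a test.","This is a test.","This is test."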