There is a mismatch between unbabel-comet==1.1.3 and the current master branch.
If you are using version 1.1.3, you can't pass a list of training files; train_data takes a single path, so the config is just:
ranking_metric:
  class_path: comet.models.RankingMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 5.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-base
    pool: avg
    layer: mix
    dropout: 0.1
    batch_size: 4
    train_data: /MT-work/COMET/data/apequest/train.csv
    validation_data:
      - /MT-work/COMET/data/apequest/test.csv
trainer: /MT-work/COMET/configs/trainer.yaml
early_stopping: /MT-work/COMET/configs/early_stopping.yaml
model_checkpoint: /MT-work/COMET/configs/model_checkpoint.yaml
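To make the difference concrete, here is a minimal sketch of how one might check the installed package and put train_data into the form a given version expects. The helper names (installed_comet_version, normalize_train_data) and the accepts_list flag are hypothetical and not part of COMET; you decide whether your version accepts a list (1.1.3 does not, master does).

```python
from importlib import metadata


def installed_comet_version():
    """Return the installed unbabel-comet version string, or None if absent."""
    try:
        return metadata.version("unbabel-comet")
    except metadata.PackageNotFoundError:
        return None


def normalize_train_data(value, accepts_list):
    """Return train_data as a list of paths or a single path.

    accepts_list is supplied by the caller: True for a COMET version whose
    config takes a list of training files, False for one that takes one path.
    """
    paths = [value] if isinstance(value, str) else list(value)
    if accepts_list:
        return paths
    if len(paths) != 1:
        raise ValueError(
            "this version takes a single training file; merge the CSVs first"
        )
    return paths[0]


if __name__ == "__main__":
    print("unbabel-comet version:", installed_comet_version())
    # A one-element list collapses to the single path that 1.1.3 expects.
    print(normalize_train_data(["data/train.csv"], accepts_list=False))
```

If the printed version is 1.1.3 but your config was copied from master, that alone would explain the mismatch described above.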
Hi @ricardorei
Thanks for the explanation.
I just pulled the latest version:
git clone https://github.com/Unbabel/COMET
The error has not changed.
Hi @sdlmw, I just tested the code on master and everything is working fine.
Here are my configs:
ranking_metric:
  class_path: comet.models.RankingMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-base
    pool: avg
    layer: mix
    layer_transformation: sparsemax
    layer_norm: False
    dropout: 0.1
    batch_size: 4
    train_data:
      - tests/data/ranking_data.csv
    validation_data:
      - tests/data/ranking_data.csv
trainer: ../trainer.yaml
early_stopping: ../early_stopping.yaml
model_checkpoint: ../model_checkpoint.yaml
and for the trainer.yaml:
class_path: pytorch_lightning.trainer.trainer.Trainer
init_args:
  accelerator: gpu
  devices: 1
  accumulate_grad_batches: 4
  amp_backend: native
  amp_level: null
  auto_lr_find: False
  auto_scale_batch_size: False
  auto_select_gpus: False
  benchmark: null
  check_val_every_n_epoch: 1
  default_root_dir: null
  deterministic: False
  fast_dev_run: False
  gradient_clip_val: 1.0
  gradient_clip_algorithm: norm
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  log_every_n_steps: 50
  profiler: null
  overfit_batches: 0
  plugins: null
  precision: 16
  max_epochs: 4
  min_epochs: 1
  max_steps: -1
  min_steps: null
  max_time: null
  num_nodes: 1
  num_sanity_val_steps: 10
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: True
  sync_batchnorm: False
  detect_anomaly: False
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0
  enable_model_summary: True
  move_metrics_to_cpu: True
  multiple_trainloader_mode: max_size_cycle
Note that the data I am using is in the tests folder. Make sure that the data you are using for the ranking model has the same format.
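A quick way to compare formats is to check the CSV header before training. The sketch below is only illustrative: missing_columns is a hypothetical helper, and the column set (src, pos, neg, ref) is an assumption about what the ranking model reads, so confirm it against the header of tests/data/ranking_data.csv first.

```python
import csv

# Assumed column set for ranking data; confirm against tests/data/ranking_data.csv.
REQUIRED_COLUMNS = ("src", "pos", "neg", "ref")


def missing_columns(lines, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row.

    lines is an open text file or any iterable of CSV lines.
    """
    header = next(csv.reader(lines), [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

For a file on disk, something like `with open("train.csv", newline="") as f: print(missing_columns(f))` lists whatever is missing; an empty list means the header matches the assumed layout.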
Hi,
I downloaded the experiment file and tried to train the model myself, but I always get the error below.
I could not find the reason. What is causing this problem?