[QUESTION] Train Your Own Metric

ecroxford commented 8 months ago

What is your question?

I am trying to train my own metric. However, I keep running into some version of the same problem (sometimes just with a different validation key). I do not know if this is a bug, a dependency issue, or something else. I followed the instructions exactly as stated, in a brand-new virtual environment created only for this. My input YAML file is shown below. No other files were changed.

Code

Yaml file:

regression_metric:
  class_path: comet.models.RegressionMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-large
    pool: avg
    layer: mix
    layer_transformation: sparsemax
    layer_norm: False
    loss: mse
    dropout: 0.1
    batch_size: 16
    train_data:
      - data/train_mimic.csv
    validation_data:
      - data/validate_mimic.csv
    hidden_sizes:
      - 2048
      - 1024
    activations: Tanh

trainer: ../trainer.yaml
early_stopping: ../early_stopping.yaml
model_checkpoint: ../model_checkpoint.yaml

Terminal Line: comet-train --cfg configs/models/MIMIC_summ.yaml

Terminal Output:

usage: comet-train [-h] [--seed_everything SEED_EVERYTHING] [--cfg CFG] [--print_config[=flags]]
  [--regression_metric.help CLASS_PATH_OR_NAME]
  [--regression_metric CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--referenceless_regression_metric.help CLASS_PATH_OR_NAME]
  [--referenceless_regression_metric CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--ranking_metric.help CLASS_PATH_OR_NAME]
  [--ranking_metric CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--unified_metric.help CLASS_PATH_OR_NAME]
  [--unified_metric CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--early_stopping.help CLASS_PATH_OR_NAME]
  [--early_stopping CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--model_checkpoint.help CLASS_PATH_OR_NAME]
  [--model_checkpoint CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--trainer.help CLASS_PATH_OR_NAME]
  [--trainer CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--load_from_checkpoint LOAD_FROM_CHECKPOINT] [--strict_load]
error: Parser key "trainer": Problem with given class_path 'pytorch_lightning.Trainer': Validation failed: No action for key "use_distributed_sampler" to check its value.

What's your environment?

iOS, pip 23.3.1, python 3.11

ricardorei commented 8 months ago

This seems like a problem with your Trainer yaml. This sometimes happens when you are using a different pytorch-lightning version in which the Trainer class has different init args. Please check which version of lightning you have installed and make sure the yaml does not contain any argument that is not in the Trainer class.
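The check suggested above can be sketched in a few lines. This is only an illustration, not part of COMET: `unknown_init_args` is a hypothetical helper, and `OldTrainer` is a stand-in class so that the snippet runs without Lightning installed. The key `use_distributed_sampler` is taken from the error message in the question.

```python
import inspect


def unknown_init_args(cls, config_keys):
    """Return the config keys that cls.__init__ does not accept by name."""
    params = inspect.signature(cls.__init__).parameters
    # If __init__ takes **kwargs, any key is accepted, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(k for k in config_keys if k not in params)


# Stand-in for a Trainer class from an older release (hypothetical, illustration only).
class OldTrainer:
    def __init__(self, accelerator="auto", devices=1, max_epochs=None):
        pass


# Keys as they might appear under init_args in a trainer yaml (assumed example).
yaml_keys = ["accelerator", "devices", "max_epochs", "use_distributed_sampler"]
print(unknown_init_args(OldTrainer, yaml_keys))  # ['use_distributed_sampler']
```

To run the same check against the real class, pass `pytorch_lightning.Trainer` instead of `OldTrainer` together with the keys from your own trainer yaml. Any key it reports is one the installed version does not know about.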

ecroxford commented 8 months ago

Hi @ricardorei,

I was using pytorch-lightning 1.9.5, so I updated it to 2.1.0.

After upgrading to 2.1.0, I get this error instead:

AttributeError: Can't pickle local object 'CometModel.val_dataloader.<locals>.<lambda>'

I saw a similar conversation around this error in issue #159, so I checked my torchmetrics package as well. It is 0.10.3, as recommended.
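Since the thread keeps coming back to package versions, a small script that prints them in one go may help when reporting. This is a minimal sketch using only the standard library. The distribution names listed are assumptions about what is installed in the environment, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names assumed to be relevant here; adjust to your environment.
for name in ("pytorch-lightning", "torchmetrics", "unbabel-comet"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```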

It seems like the problem comes from the fact that I am using a different accelerator. I have tried both cpu and gpu (which is actually mps, since I am on an M1 Mac) and got the same error each time.
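For context on the error itself: Python's pickle cannot serialize a lambda that is created inside a function or method. This matters whenever objects have to be sent to worker processes, which is the case under the "spawn" start method that macOS uses by default. The sketch below reproduces that behaviour with the standard library only. `Model`, `pad_batch` and `can_pickle` are hypothetical names for illustration and are not COMET code. Whether this is exactly what happens inside `CometModel.val_dataloader` is an assumption based on the error message.

```python
import pickle
from functools import partial


def pad_batch(batch, pad_value):
    """Module-level function: pickle can reference it by its qualified name."""
    width = max(len(x) for x in batch)
    return [x + [pad_value] * (width - len(x)) for x in batch]


class Model:
    def local_collate(self):
        # Lambda created inside a method: its qualified name contains
        # "<locals>", so pickle cannot look it up by name.
        return lambda batch: pad_batch(batch, 0)

    def picklable_collate(self):
        # functools.partial over a module-level function behaves the same
        # way when called, but is built from picklable parts.
        return partial(pad_batch, pad_value=0)


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


m = Model()
print(can_pickle(m.local_collate()))          # False
print(m.picklable_collate()([[1], [2, 3]]))   # [[1, 0], [2, 3]]
```

If this is the cause, the accelerator setting (cpu or mps) would not matter, which would match the observation that both give the same error.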

Unbabel / COMET