Open ecroxford opened 8 months ago
This looks like a problem with your Trainer yaml. It sometimes happens when you are using a different pytorch-lightning version in which the Trainer class has new init args. Please check which version of lightning you have installed and whether the yaml contains any argument that is not in the Trainer class.
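The check described above can be sketched in a few lines of Python. This is only an illustration: `installed_version`, `unknown_init_args`, and the `OldTrainer` stand-in class are hypothetical names introduced here, not part of COMET or Lightning.

```python
import inspect
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def unknown_init_args(cls, config_keys):
    """Return the config keys that `cls.__init__` does not accept."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter means any key would be accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(k for k in config_keys if k not in params)


# Hypothetical stand-in for a Trainer whose __init__ lacks a newer argument.
class OldTrainer:
    def __init__(self, accelerator="auto", devices="auto", max_epochs=None):
        pass


if __name__ == "__main__":
    print("pytorch-lightning:", installed_version("pytorch-lightning"))
    yaml_keys = ["accelerator", "devices", "use_distributed_sampler"]
    print(unknown_init_args(OldTrainer, yaml_keys))
```

In a real environment one would pass `pytorch_lightning.Trainer` instead of the stand-in, together with the argument names listed in the trainer yaml (assuming they can be loaded as a plain mapping). Any name the function returns is an argument the installed Trainer does not know about.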
Hi @ricardorei,
I was using pytorch-lightning 1.9.5, so I updated it to 2.1.0.
After upgrading to 2.1.0, I get this error instead:
AttributeError: Can't pickle local object 'CometModel.val_dataloader.<locals>.<lambda>'
I saw a similar conversation about this error in issue #159 and checked my torchmetrics package as well; it is 0.10.3, as recommended.
The problem may be related to the accelerator I am using. I have tried both cpu and gpu (mps in my case, since I am on an M1 Mac) and got the same error each time.
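For context, the pickling error is what Python raises when an object that has to be serialized is a function defined inside another function or method. One plausible trigger, offered as an assumption rather than a confirmed diagnosis, is a data loader that starts worker processes: on macOS, multiprocessing uses the spawn start method by default, so anything handed to a worker must be picklable. A minimal sketch of the failure, with the hypothetical names `Model`, `collate`, and `can_pickle`:

```python
import pickle


class Model:
    """Hypothetical stand-in for a class that builds a callable inside a method."""

    def local_callable(self):
        # Defined inside a method, so its qualified name contains "<locals>"
        # and pickle cannot look it up by name.
        return lambda batch: batch


def collate(batch):
    """Module-level equivalent: importable by name, hence picklable."""
    return batch


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    fn = Model().local_callable()
    print(fn.__qualname__)      # Model.local_callable.<locals>.<lambda>
    print(can_pickle(fn))       # False
    print(can_pickle(collate))  # True
```

The usual remedies are to move the callable to module level or to avoid worker processes altogether (for a PyTorch DataLoader, `num_workers=0`). Whether either applies to this training setup is not established in this thread.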
What is your question?
I am trying to train my own metric. However, I keep running into some version of the same problem (sometimes just with a different validation key). I do not know whether this is a bug, a dependency issue, or something else. I followed the instructions exactly as stated, in a brand-new virtual environment created only for this. My input yaml file is shown below. No other files were changed.
Code
Yaml file:
regression_metric:
  class_path: comet.models.RegressionMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: xlm-roberta-large
    pool: avg
    layer: mix
    layer_transformation: sparsemax
    layer_norm: False
    loss: mse
    dropout: 0.1
    batch_size: 16
    train_data:
trainer: ../trainer.yaml
early_stopping: ../early_stopping.yaml
model_checkpoint: ../model_checkpoint.yaml
Terminal Line: comet-train --cfg configs/models/MIMIC_summ.yaml
Terminal Output:
usage: comet-train [-h] [--seed_everything SEED_EVERYTHING] [--cfg CFG] [--print_config[=flags]]
  [--regression_metric.help CLASS_PATH_OR_NAME]
  [--regression_metric CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--referenceless_regression_metric.help CLASS_PATH_OR_NAME]
  [--referenceless_regression_metric CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--ranking_metric.help CLASS_PATH_OR_NAME]
  [--ranking_metric CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--unified_metric.help CLASS_PATH_OR_NAME]
  [--unified_metric CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--early_stopping.help CLASS_PATH_OR_NAME]
  [--early_stopping CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--model_checkpoint.help CLASS_PATH_OR_NAME]
  [--model_checkpoint CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--trainer.help CLASS_PATH_OR_NAME]
  [--trainer CONFIG | CLASS_PATH_OR_NAME | .INIT_ARG_NAME VALUE]
  [--load_from_checkpoint LOAD_FROM_CHECKPOINT] [--strict_load]
error: Parser key "trainer": Problem with given class_path 'pytorch_lightning.Trainer': Validation failed: No action for key "use_distributed_sampler" to check its value.
What's your environment?
macOS (M1 chip), pip 23.3.1, Python 3.11