ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0

Binary Classification failing with microsoft/deberta-v3-base #1476

Open vitthal-bhandari opened 1 year ago

vitthal-bhandari commented 1 year ago

Describe the bug

Binary classification with simpletransformers fails for microsoft/deberta-v3-base when "deberta" is passed as the model type. The failure occurs during fine-tuning.

RuntimeError: Error(s) in loading state_dict for DebertaForSequenceClassification: size mismatch for deberta.encoder.rel_embeddings.weight: copying a param with shape torch.Size([512, 768]) from checkpoint, the shape in current model is torch.Size([1024, 768]).
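
The two shapes in the error can be reproduced with a little arithmetic. The sketch below is an approximation of how the DeBERTa (v1) and DeBERTa-v2 encoders in Hugging Face transformers size their relative-position embedding table, not the library code itself. The function names are hypothetical, and the config values (max_position_embeddings=512, max_relative_positions=-1, position_buckets=256) are assumed for microsoft/deberta-v3-base and should be checked against the checkpoint's config.json.

```python
def rel_embedding_rows_v1(max_position_embeddings, max_relative_positions=-1):
    """Rows of rel_embeddings as a DeBERTa (v1) style encoder would allocate them."""
    # v1 falls back to max_position_embeddings when no explicit limit is set.
    if max_relative_positions < 1:
        max_relative_positions = max_position_embeddings
    return max_relative_positions * 2


def rel_embedding_rows_v2(max_position_embeddings, max_relative_positions=-1,
                          position_buckets=-1):
    """Rows of rel_embeddings as a DeBERTa-v2 style encoder would allocate them."""
    if max_relative_positions < 1:
        max_relative_positions = max_position_embeddings
    rows = max_relative_positions * 2
    # v2/v3 checkpoints bucket relative positions, which shrinks the table.
    if position_buckets > 0:
        rows = position_buckets * 2
    return rows


# Assumed config values for microsoft/deberta-v3-base (verify in config.json).
cfg = {
    "max_position_embeddings": 512,
    "max_relative_positions": -1,
    "position_buckets": 256,
}

v1_rows = rel_embedding_rows_v1(cfg["max_position_embeddings"],
                                cfg["max_relative_positions"])
v2_rows = rel_embedding_rows_v2(**cfg)

print("model built as deberta (v1):", v1_rows)  # shape in current model
print("checkpoint (deberta-v2):   ", v2_rows)   # shape in checkpoint
```

Under these assumptions the v1 model class allocates 1024 rows while the checkpoint stores 512, which matches the size mismatch reported for deberta.encoder.rel_embeddings.weight.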

To Reproduce

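from simpletransformers.classification import ClassificationModel

train_args = {
    "reprocess_input_data": True,
    "output_dir": "output_14k",
    "overwrite_output_dir": True,
    "manual_seed": 42,
    "use_multiprocessing": False,
    "use_multiprocessing_for_evaluation": False,
    "train_batch_size": 16,
    "labels_list": ["not sexist", "sexist"],
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
}

# The "deberta" model type is paired with a DeBERTa-v3 checkpoint here;
# this combination triggers the error above.
model = ClassificationModel(
    "deberta",
    "microsoft/deberta-v3-base",
    use_cuda=True,
    args=train_args,
)

# `dataset` is the training data; its construction is not shown in this report.
model.train_model(dataset)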

Expected behavior

Fine-tuning should complete without errors.

GeorgiPachov commented 1 year ago

I'm having the same issue with deberta-v3-large. It seems simpletransformers cannot work with the v3 DeBERTa checkpoints, which is a bummer. It looks like something small (configuration, etc.), but I don't know how to fix it.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

edanhauon-openweb commented 10 months ago

Is there a plan to add this to the main repo?

ziqizhang commented 10 months ago

Please address this. I have the same problem. It looks like deberta-v3 is not currently supported:

You are using a model of type deberta-v2 to instantiate a model of type deberta. This is not supported for all configurations of models and can yield errors.
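
The warning compares the model_type recorded in the checkpoint's config.json with the model type that was requested. A minimal sketch of that comparison is below. The helper find_model_type_mismatch is hypothetical and is not part of simpletransformers or transformers. The example config is trimmed to the two fields that appear in this thread (model_type from the warning, hidden_size from the error shapes).

```python
import json


def find_model_type_mismatch(config_json, requested_type):
    """Return a description of the mismatch, or None if the types agree.

    Hypothetical helper: it only compares strings and does not load any model.
    """
    config = json.loads(config_json)
    checkpoint_type = config.get("model_type")
    if checkpoint_type is None or checkpoint_type == requested_type:
        return None
    return (
        f"checkpoint declares model_type '{checkpoint_type}' "
        f"but '{requested_type}' was requested"
    )


# Trimmed stand-in for the checkpoint's config.json (assumed, not the full file).
EXAMPLE_CONFIG = '{"model_type": "deberta-v2", "hidden_size": 768}'

message = find_model_type_mismatch(EXAMPLE_CONFIG, "deberta")
print(message)
```

A check like this would flag the configuration in this issue before any weights are loaded. Which model type string simpletransformers expects for DeBERTa-v2/v3 checkpoints is not confirmed in this thread.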