deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0
1.74k stars 247 forks

SentencePairRegression Multitask Learning #826

Closed: TharinduDR closed this issue 2 years ago

TharinduDR commented 3 years ago

Question

Thank you for the wonderful repository. I am trying to do multitask learning on two sentence pair regression tasks using the latest FARM version (farm 0.8.0), and I am running into a few issues.

My dataset looks like this.

| text | text_b | label_a | label_b |
| --- | --- | --- | --- |
| how many times have real madrid won the champions league in a row | They have also won the competition the most times in a row , winning it five times from 1956 to 1960 . | 1 | 1 |
| when did new york stop using the electric chair | Following the U.S. Supreme Court 's ruling declaring existing capital punishment statutes unconstitutional in Furman v. Georgia ( 1972 ) , New York was without a death penalty until 1995 , when then - Governor George Pataki signed a new statute into law , which provided for execution by lethal injection . | 1 | 1 |
| songs on 4 your eyez only j cole | \`\` Neighbors '' Cole 3 : 36 8 . | 2 | 2 |
| how many seasons of the blacklist are there on netflix | Retrieved March 27 , 2018 . | 0 | 1 |

I am using the following code to perform multitask learning.

```python
from pathlib import Path

from farm.data_handler.data_silo import DataSilo
from farm.data_handler.processor import TextPairRegressionProcessor
from farm.evaluation.metrics import register_metrics
from farm.infer import Inferencer
from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.optimization import initialize_optimizer
from farm.modeling.prediction_head import RegressionHead
from farm.modeling.tokenization import Tokenizer
from farm.train import Trainer
from farm.utils import initialize_device_settings, set_all_seeds

# NOTE: pearson_corr is a metric function whose definition was not included in the post.

# Basic setup
set_all_seeds(seed=42)
device, n_gpu = initialize_device_settings(use_cuda=True)
n_epochs = 1
batch_size = 5
evaluate_every = 2
lang_model = "microsoft/MiniLM-L12-H384-uncased"

tokenizer = Tokenizer.load(pretrained_model_name_or_path=lang_model)

# Make the custom metric available under the name used in add_task()
register_metrics(name="pearson_corr", implementation=pearson_corr)

processor = TextPairRegressionProcessor(tokenizer=tokenizer,
                                            label_list=None,
                                            max_seq_len=128,
                                            train_filename="sample_1.tsv",
                                            dev_filename="sample_1.tsv",
                                            test_filename=None,
                                            data_dir=Path("samples/text_pair"),
                                            delimiter="\t")

# One task per label column
processor.add_task(name="da",
                   metric="pearson_corr",
                   label_column_name="label_a",
                   label_list=[])

processor.add_task(name="hter",
                   metric="pearson_corr",
                   label_column_name="label_b",
                   label_list=[])

data_silo = DataSilo(
    processor=processor,
    batch_size=batch_size)

language_model = LanguageModel.load(lang_model)
# Not used below; only the two task-specific heads are passed to the model
prediction_head = RegressionHead()

da_head = RegressionHead(task_name="da")
hter_head = RegressionHead(task_name="hter")

model = AdaptiveModel(
    language_model=language_model,
    prediction_heads=[da_head, hter_head],
    embeds_dropout_prob=0.1,
    lm_output_types=["per_sequence_continuous"],
    device=device)

model, optimizer, lr_schedule = initialize_optimizer(
    model=model,
    learning_rate=5e-5,
    device=device,
    n_batches=len(data_silo.loaders["train"]),
    n_epochs=n_epochs)

trainer = Trainer(
    model=model,
    optimizer=optimizer,
    data_silo=data_silo,
    epochs=n_epochs,
    n_gpu=n_gpu,
    lr_schedule=lr_schedule,
    evaluate_every=evaluate_every,
    device=device)

trainer.train()

# Save the trained model and processor, then reload them for inference
save_dir = Path("testsave/text_pair_regression_model")
model.save(save_dir)
processor.save(save_dir)

basic_texts = [
    {"text": ("how many times have real madrid won the champions league in a row", "They have also won the competition the most times in a row, winning it five times from 1956 to 1960")},
    {"text": ("how many seasons of the blacklist are there on netflix", "Retrieved March 27 , 2018 .")},
]

model = Inferencer.load(save_dir)
result = model.inference_from_dicts(dicts=basic_texts)
```
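The `pearson_corr` function passed to `register_metrics` is not shown in the post. For readers who want to reproduce the setup, here is a minimal pure-Python sketch of what such a metric might look like. It is an assumption, not the author's implementation: the `(preds, labels)` signature and the dict return value follow the pattern of FARM's custom-metric example, and the dict key `"pearson_corr"` is chosen here for illustration.

```python
import math


def pearson_corr(preds, labels):
    """Pearson correlation between predictions and gold labels.

    Assumed interface: called with two equal-length sequences of numbers,
    returns a dict mapping a metric name to its value.
    """
    preds = [float(p) for p in preds]
    labels = [float(l) for l in labels]
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    n = len(preds)
    if n < 2:
        raise ValueError("at least two samples are needed")

    mean_p = sum(preds) / n
    mean_l = sum(labels) / n
    cov = sum((p - mean_p) * (l - mean_l) for p, l in zip(preds, labels))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    denom = math.sqrt(var_p * var_l)

    # The correlation is undefined when either input is constant; report NaN then.
    r = cov / denom if denom > 0 else float("nan")
    return {"pearson_corr": r}
```

In practice `scipy.stats.pearsonr` would likely be the more usual choice; the hand-written version above only avoids an extra dependency.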

Running the multitask training script gives me the following error:

image

I am not sure why it looks for the `label` column when I have added tasks that specify `label_column_name`. However, when I amended my dataset to include a dummy column named `label`, as shown below, this error went away.

| text | text_b | label_a | label_b | label |
| --- | --- | --- | --- | --- |
| how many times have real madrid won the champions league in a row | They have also won the competition the most times in a row , winning it five times from 1956 to 1960 . | 1 | 1 | 1 |
| when did new york stop using the electric chair | Following the U.S. Supreme Court 's ruling declaring existing capital punishment statutes unconstitutional in Furman v. Georgia ( 1972 ) , New York was without a death penalty until 1995 , when then - Governor George Pataki signed a new statute into law , which provided for execution by lethal injection . | 1 | 1 | 1 |
| songs on 4 your eyez only j cole | \`\` Neighbors '' Cole 3 : 36 8 . | 2 | 2 | 2 |
| how many seasons of the blacklist are there on netflix | Retrieved March 27 , 2018 . | 0 | 1 | 1 |
| how many books are in the one piece series | The series spans over 800 chapters and more than 80 tankōbon volumes . | 1 | 2 | 1 |
| central idea of poem lines from the deserted village | It is a work of social commentary , and condemns rural depopulation and the pursuit of excessive wealth . | 1 | 1 | 1 |
| who shot first in the shot heard around the world | The North Bridge skirmish did see the first shots by Americans acting under orders , the first organized volley by Americans , the first British fatalities , and the first British retreat . | 1 | 1 | 1 |
| who is beauty and the beast written by | Beauty and the Beast ( French : La Belle et la Bête ) is a traditional fairy tale written by French novelist Gabrielle - Suzanne Barbot de Villeneuve and published in 1740 in La Jeune Américaine et les contes marins ( The Young American and Marine Tales ) . | 1 | 1 | 1 |
| what episode does eleven come in season 1 | Deep South Mag . | 2 | 2 | 2 |

Is there any clean way to do this?
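For anyone reproducing the dummy-column workaround, it can be scripted instead of edited by hand. The sketch below uses only the Python standard library and is not part of FARM. The function name `add_placeholder_label_column`, the choice to copy values from an existing column, and the default column names are all assumptions made for illustration.

```python
from pathlib import Path


def add_placeholder_label_column(src, dst, source_column="label_a", new_column="label"):
    """Copy a tab-separated file, appending `new_column` filled from `source_column`.

    Fields are split on tabs only, so quote characters inside the text are
    left untouched.
    """
    text = Path(src).read_text(encoding="utf-8")
    rows = [line.split("\t") for line in text.split("\n") if line.strip()]
    if not rows:
        raise ValueError(f"{src} is empty")

    header, body = rows[0], rows[1:]
    if source_column not in header:
        raise ValueError(f"column {source_column!r} not found in header {header}")
    if new_column in header:
        raise ValueError(f"column {new_column!r} already exists")

    idx = header.index(source_column)
    out_rows = [header + [new_column]] + [row + [row[idx]] for row in body]
    Path(dst).write_text(
        "\n".join("\t".join(row) for row in out_rows) + "\n", encoding="utf-8"
    )
```

A call such as `add_placeholder_label_column("samples/text_pair/sample_1.tsv", "samples/text_pair/sample_1_with_label.tsv")` would write the amended file; the output file name is hypothetical. This only automates the workaround and does not explain why the `label` column is looked up in the first place.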

Also, even with this amended dataset, I got another error.

image

I guess I am getting this because I passed an empty list for `label_list` when adding the task.

```python
processor.add_task(name="hter",
                   metric="pearson_corr",
                   label_column_name="label_b",
                   label_list=[])
```

I tried omitting the argument and passing `None` instead, but FARM does not allow either. What should I pass for `label_list` when working on a regression task?

I am sorry if I am overlooking something. Thank you.

TharinduDR commented 3 years ago

Any help on this?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.