Question
Thank you for the wonderful repository. I am trying to do multitask learning on two sentence-pair regression tasks with the latest FARM version (farm 0.8.0), and I am running into a few issues.
My dataset looks like this:
| text | text_b | label_a | label_b |
| --- | --- | --- | --- |
| how many times have real madrid won the champions league in a row | They have also won the competition the most times in a row , winning it five times from 1956 to 1960 . | 1 | 1 |
| when did new york stop using the electric chair | Following the U.S. Supreme Court 's ruling declaring existing capital punishment statutes unconstitutional in Furman v. Georgia ( 1972 ) , New York was without a death penalty until 1995 , when then - Governor George Pataki signed a new statute into law , which provided for execution by lethal injection . | 1 | 1 |
| songs on 4 your eyez only j cole | `` Neighbors '' Cole 3 : 36 8 . | 2 | 2 |
| how many seasons of the blacklist are there on netflix | Retrieved March 27 , 2018 . | 0 | 1 |
I am using this code to perform multitask learning:
tokenizer = Tokenizer.load(pretrained_model_name_or_path=lang_model)
register_metrics(name="pearson_corr", implementation=pearson_corr)

processor = TextPairRegressionProcessor(tokenizer=tokenizer,
                                        label_list=None,
                                        max_seq_len=128,
                                        train_filename="sample_1.tsv",
                                        dev_filename="sample_1.tsv",
                                        test_filename=None,
                                        data_dir=Path("samples/text_pair"),
                                        delimiter="\t")

processor.add_task(name="da",
                   metric="pearson_corr",
                   label_column_name="label_a",
                   label_list=[])
processor.add_task(name="hter",
                   metric="pearson_corr",
                   label_column_name="label_b",
                   label_list=[])

data_silo = DataSilo(
    processor=processor,
    batch_size=batch_size)

language_model = LanguageModel.load(lang_model)
prediction_head = RegressionHead()
da_head = RegressionHead(task_name="da")
hter_head = RegressionHead(task_name="hter")

model = AdaptiveModel(
    language_model=language_model,
    prediction_heads=[da_head, hter_head],
    embeds_dropout_prob=0.1,
    lm_output_types=["per_sequence_continuous"],
    device=device)

model, optimizer, lr_schedule = initialize_optimizer(
    model=model,
    learning_rate=5e-5,
    device=device,
    n_batches=len(data_silo.loaders["train"]),
    n_epochs=n_epochs)

trainer = Trainer(
    model=model,
    optimizer=optimizer,
    data_silo=data_silo,
    epochs=n_epochs,
    n_gpu=n_gpu,
    lr_schedule=lr_schedule,
    evaluate_every=evaluate_every,
    device=device)

trainer.train()

save_dir = Path("testsave/text_pair_regression_model")
model.save(save_dir)
processor.save(save_dir)

basic_texts = [
    {"text": ("how many times have real madrid won the champions league in a row", "They have also won the competition the most times in a row, winning it five times from 1956 to 1960")},
    {"text": ("how many seasons of the blacklist are there on netflix", "Retrieved March 27 , 2018 .")},
]

model = Inferencer.load(save_dir)
result = model.inference_from_dicts(dicts=basic_texts)
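The code registers a metric named `pearson_corr` but does not show how that function is defined. A minimal sketch of what it could look like follows. It assumes the registered implementation is called with the predictions and the gold labels and returns a dict mapping a metric name to a value; that calling convention is an assumption here, not something shown in this issue. It uses only the standard library, and `scipy.stats.pearsonr` would be a natural alternative.

```python
import math
from typing import Dict, Sequence


def pearson_corr(preds: Sequence[float], labels: Sequence[float]) -> Dict[str, float]:
    """Pearson correlation between predictions and gold labels.

    Assumption: the metric is invoked as implementation(preds, labels)
    and is expected to return a dict of metric name -> value.
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    n = len(preds)
    if n < 2:
        raise ValueError("need at least two samples to compute a correlation")

    # Convert to plain floats so that string or numpy scalar inputs also work.
    p = [float(x) for x in preds]
    y = [float(x) for x in labels]

    mean_p = sum(p) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_p) * (b - mean_y) for a, b in zip(p, y))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_y = sum((b - mean_y) ** 2 for b in y)

    if var_p == 0 or var_y == 0:
        # The correlation is undefined when either input is constant.
        return {"pearson_corr": float("nan")}
    return {"pearson_corr": cov / math.sqrt(var_p * var_y)}
```

With only a handful of rows, as in the sample above, a constant prediction vector is quite possible, which is why the sketch returns NaN in that case instead of dividing by zero.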
It gives me the following error:
I am not sure why it looks for the 'label' column when I have added tasks that specify label_column_name. However, when I amended my dataset to include a dummy column named 'label', as below, that error went away.
| text | text_b | label_a | label_b | label |
| --- | --- | --- | --- | --- |
| how many times have real madrid won the champions league in a row | They have also won the competition the most times in a row , winning it five times from 1956 to 1960 . | 1 | 1 | 1 |
| when did new york stop using the electric chair | Following the U.S. Supreme Court 's ruling declaring existing capital punishment statutes unconstitutional in Furman v. Georgia ( 1972 ) , New York was without a death penalty until 1995 , when then - Governor George Pataki signed a new statute into law , which provided for execution by lethal injection . | 1 | 1 | 1 |
| songs on 4 your eyez only j cole | `` Neighbors '' Cole 3 : 36 8 . | 2 | 2 | 2 |
| how many seasons of the blacklist are there on netflix | Retrieved March 27 , 2018 . | 0 | 1 | 1 |
| how many books are in the one piece series | The series spans over 800 chapters and more than 80 tankōbon volumes . | 1 | 2 | 1 |
| central idea of poem lines from the deserted village | It is a work of social commentary , and condemns rural depopulation and the pursuit of excessive wealth . | 1 | 1 | 1 |
| who shot first in the shot heard around the world | The North Bridge skirmish did see the first shots by Americans acting under orders , the first organized volley by Americans , the first British fatalities , and the first British retreat . | 1 | 1 | 1 |
| who is beauty and the beast written by | Beauty and the Beast ( French : La Belle et la Bête ) is a traditional fairy tale written by French novelist Gabrielle - Suzanne Barbot de Villeneuve and published in 1740 in La Jeune Américaine et les contes marins ( The Young American and Marine Tales ) . | 1 | 1 | 1 |
| what episode does eleven come in season 1 | Deep South Mag . | 2 | 2 | 2 |
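For anyone who wants to reproduce the dummy-column workaround, here is a rough sketch of how the extra `label` column could be appended to a tab-separated file. The helper name `add_dummy_label_column` and the placeholder value `"0"` are made up for illustration, and this only automates the workaround; it is not a claim about how FARM is meant to be configured.

```python
from pathlib import Path


def add_dummy_label_column(src: Path, dst: Path, fill_value: str = "0") -> None:
    """Copy a tab-separated file and append a placeholder `label` column.

    Hypothetical helper: the column is filled with a constant because it
    only exists to satisfy the lookup of a column named `label`.
    """
    with src.open(encoding="utf-8") as fin:
        # Drop blank lines; keep all other content exactly as it is.
        lines = [line.rstrip("\n") for line in fin if line.strip()]
    if not lines:
        raise ValueError(f"{src} is empty")

    header = lines[0].split("\t")
    if "label" in header:
        raise ValueError("the file already has a 'label' column")

    out = ["\t".join(header + ["label"])]
    for line in lines[1:]:
        # Plain string splitting leaves quote-like characters in the text untouched.
        out.append(line + "\t" + fill_value)

    dst.write_text("\n".join(out) + "\n", encoding="utf-8")
```

It could be called as `add_dummy_label_column(Path("samples/text_pair/sample_1.tsv"), Path("samples/text_pair/sample_1_with_label.tsv"))`, where the output filename is again only an example.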
Is there a cleaner way to do this?
Also, even with this amended dataset, I got another error.
I guess I am getting this because I passed an empty list as label_list when adding the tasks.
I tried omitting label_list or passing None instead, but FARM does not let me do that. What should I pass as label_list when I am working on a regression task?
I am sorry if I am overlooking something. Thank you.
For reference, this setup runs before the code above:
set_all_seeds(seed=42)
device, n_gpu = initialize_device_settings(use_cuda=True)
n_epochs = 1
batch_size = 5
evaluate_every = 2
lang_model = "microsoft/MiniLM-L12-H384-uncased"