deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

Multi Task Learning implementation and example #778

Closed johann-petrak closed 3 years ago

johann-petrak commented 3 years ago

This fixes #408 for me. I did some testing with two classification heads so far and almost everything works fine: I am getting proper evaluations for each head during training, and the inferencer provides a list of two predictions for each instance.

Still to check:

Help with these remaining checks, and with how best to test this in general, would be appreciated.

Timoeller edit: fixes #742

Timoeller commented 3 years ago

Incredible. You are doing a little sweep through all of FARM and it is much cleaner than before : ) Thanks a lot @johann-petrak

tholor commented 3 years ago

I had a quick look at your code and I think your changes make sense. I haven't checked whether they break anything around NaturalQuestions, but the NQ inference part was already broken before, and we will need to refactor it anyway.

> why do we not get the proper task name for the predictions from the inference?

I guess you mean "task":"text_classification" in the dict returned by the inferencer? This is set in the head itself and reflects the task type rather than the task name: https://github.com/deepset-ai/FARM/blob/master/farm/modeling/prediction_head.py#L413

I believe the order of the predictions should reflect the order of your prediction heads. However, I can totally see that for multiple classification heads (especially with very similar predictions) you want a clear identification in the results. We could probably add a field "task_name" to the dict at the above location in the code by reading from the prediction head's self.task_name.
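The idea above can be sketched in plain Python. This is a hypothetical illustration, not FARM's actual code: the `format_prediction` helper and the field layout are assumptions, showing how a formatted prediction could carry both the task type ("task") and the head's own name ("task_name") so that two classification heads stay distinguishable.

```python
# Hypothetical sketch (not FARM's actual implementation): a prediction
# head's formatted output carrying both the task *type* ("task") and the
# head's own task name ("task_name").

def format_prediction(head_task_name, label, probability):
    """Build one formatted prediction dict per head.

    Field names mirror the discussion above; the structure is illustrative.
    """
    return {
        "task": "text_classification",   # the task type, as set in the head
        "task_name": head_task_name,     # proposed field: the head's own name
        "predictions": [{"label": label, "probability": probability}],
    }

# Two classification heads on the same input: without "task_name", the two
# dicts would be hard to tell apart, since both report the same "task".
results = [
    format_prediction("sentiment", "positive", 0.91),
    format_prediction("topic", "sports", 0.87),
]
for r in results:
    print(r["task_name"], r["predictions"][0]["label"])
```

With such a field, a consumer no longer has to rely on the ordering of the prediction heads to map each result back to its head.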

johann-petrak commented 3 years ago

> I believe the order of the predictions should reflect the order of your prediction heads. However, I can totally see that for multiple classification heads (especially with very similar predictions) you want a clear identification in the results. We could probably add a field "task_name" to the dict at the above location in the code by reading from the prediction head's self.task_name.

My main concern is that the way the keys/parameter names "task", "task_type", and "task_name" are used is a bit confusing. Ideally it would always follow the convention that "task" is a task object, "task_name" is the name of a task, and "task_type" is the type, but this is not always the case. (I know I can be a bit too OCD about such things.)

Of course it is hard to change now, as it would break backwards compatibility, so I have no strong opinion :)

Adding a field "task_name" to the prediction would definitely help readers immediately realize that "task" is not the task name. Should we make this part of this PR, or open a separate issue for it?

Timoeller commented 3 years ago

Hey @johann-petrak, I agree that the task setup could be improved. Inconsistent naming is very confusing to users (and also to us, looking at code we wrote a year ago and just shaking our heads), so we appreciate rigorous improvement on this front.

In general, tasks is a nested dict that looks like:

self.tasks[name] = {
    "label_list": label_list,
    "metric": metric,
    "label_tensor_name": label_tensor_name,
    "label_name": label_name,
    "label_column_name": label_column_name,
    "text_column_name": text_column_name,
    "task_type": task_type
}

and I like the suggestion to add the task's name as "task_name" to the prediction heads' output. People might want to have n classification heads that would only be distinguishable through this task_name.

So let's add "task_name" in this PR and improve the general naming of tasks in a separate PR?
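As a runnable illustration of why the task name matters, the snippet below builds the nested tasks dict from the comment above for two classification tasks. The `add_task` helper is a simplified stand-in (not FARM's actual method); the derived tensor and label names are assumptions. The point is that both entries share the same task_type, so the dict key (the task name) is the only thing that distinguishes them.

```python
# Simplified stand-in for task registration (not FARM's actual method);
# field names follow the nested-dict snippet above, derived names are
# illustrative assumptions.

def add_task(tasks, name, label_list, metric, label_column_name, task_type):
    tasks[name] = {
        "label_list": label_list,
        "metric": metric,
        "label_tensor_name": f"{name}_label_ids",
        "label_name": f"{name}_label",
        "label_column_name": label_column_name,
        "text_column_name": "text",
        "task_type": task_type,
    }

tasks = {}
add_task(tasks, "sentiment", ["negative", "positive"], "acc",
         "sentiment", "classification")
add_task(tasks, "topic", ["sports", "politics", "tech"], "acc",
         "topic", "classification")

# Both tasks have task_type "classification"; only the name tells them apart.
print(sorted(tasks))                               # ['sentiment', 'topic']
print({t["task_type"] for t in tasks.values()})    # {'classification'}
```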

Timoeller commented 3 years ago

Hey @johann-petrak any progress on the task_name field?

I think the PR is already in good shape and could be merged soon.

johann-petrak commented 3 years ago

> Hey @johann-petrak any progress on the task_name field?
>
> I think the PR is already in good shape and could be merged soon.

Sorry, I did not forget about this. I have been crazy busy with other stuff and plan to get back to it in the next few days.

johann-petrak commented 3 years ago

OK, this is now ready from my side. The prediction heads for text classification, regression, and a few others now include the key "task_name" in the formatted prediction result; the key "task" contains the type, as before. There is also an example script illustrating the use of two text classification heads.

This does not address or test using multiple heads of different types, e.g. classification and NER. It is probably better to open or re-use a separate issue for that, as it will also need a good dataset for testing.
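On the consumer side, the new field makes it straightforward to pick out one head's results from a multi-head inferencer's output. The sketch below is hypothetical: the exact result structure and the `predictions_for` helper are assumptions based on the per-head dicts discussed in this thread.

```python
# Hypothetical consumer-side sketch: given one prediction dict per head
# (structure assumed from the discussion above), select the results
# belonging to one head by its "task_name".

def predictions_for(results, task_name):
    """Return only the prediction dicts produced by the named head."""
    return [r for r in results if r.get("task_name") == task_name]

results = [
    {"task": "text_classification", "task_name": "sentiment",
     "predictions": [{"label": "positive"}]},
    {"task": "text_classification", "task_name": "topic",
     "predictions": [{"label": "sports"}]},
]

sentiment = predictions_for(results, "sentiment")
print(sentiment[0]["predictions"][0]["label"])
```

Before this change, both dicts would have carried only "task":"text_classification", so this kind of lookup had to rely on the ordering of the prediction heads.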

johann-petrak commented 3 years ago

Sorry, that was my mistake; that file should not have been added. Thanks for spotting it! Removed now.

julian-risch commented 3 years ago

Had some problems executing the benchmark, but the results are here now: https://github.com/deepset-ai/FARM/pull/797#commitcomment-51938273 -> Ready to merge

Benchmarks: QA Accuracy

:heavy_check_mark: QA Accuracy Benchmark Passed

| run | f1_change | em_change | tnacc_change | elapsed_change | f1 | em | tnacc | elapsed | f1_gold | em_gold | tnacc_gold | elapsed_gold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 FARM internal evaluation | 0.017 | 0.0001 | 0.0169 | -0.5025 | 82.6841 | 78.4722 | 84.3763 | 39.4975 | 82.6671 | 78.4721 | 84.3594 | 40 |
| 1 outside eval script | -0.0045 | -0.0077 | - | 0.0436 | 82.9125 | 79.8703 | - | 27.0436 | 82.917 | 79.878 | - | 27 |
| 2 train evaluation | 0.4809 | -0.2106 | 0.1685 | -20.0116 | 82.6359 | 78.4469 | 97.5406 | 1114.99 | 82.155 | 78.6575 | 97.3721 | 1135 |
