allenai / natural-instructions

Expanding natural instructions
https://instructions.apps.allenai.org/
Apache License 2.0
938 stars 187 forks

Tasks 1285-1307: kpa dataset #626

Open yeganehkordi opened 2 years ago

yeganehkordi commented 2 years ago

These tasks have the same definitions with different domains. @ashok-arjun, can you change the definitions? For example, we can have summary generation, argument generation, etc.

mishra-sid commented 2 years ago

Hi, since it requires just updating the task definitions, I can work on this.

yeganehkordi commented 2 years ago

Hi, it's not just the definition; the whole task needs to be changed. These tasks are the same, and we need to change some of them to have new different tasks. It would be great if you could work on this.

danyaljj commented 2 years ago

@yeganehkordi Could you elaborate on what the issue is?

yeganehkordi commented 2 years ago

Yeah, the kpa dataset consists of arguments and their summaries on different topics (gathered from many sources). Each of these tasks is to verify the summary for a given topic, and the only difference between the tasks is their topic. Actually, I think the contributor didn't need to separate the topics into different tasks, because knowing the topic doesn't make any difference in summary verification. My suggestion is to provide new tasks based on this dataset or to drop some of the tasks. The new tasks can be summary generation, argument generation, topic verification, etc.
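
To make the suggested new tasks concrete, here is a minimal sketch of how summary generation, argument generation, and topic verification instances could be derived from the same records. It assumes each record carries `topic`, `argument`, `key_point`, and `label` fields (hypothetical names; the actual KPA files may differ) and that an instance is a dict with an `input` string and an `output` list. The function names and the example records are invented for illustration.

```python
from typing import Dict, List

# Hypothetical record layout: the real KPA files may name these fields differently.
# label == 1 is assumed to mean "the key point correctly summarizes the argument".
Record = Dict[str, object]
Instance = Dict[str, object]


def summary_generation(records: List[Record]) -> List[Instance]:
    """Argument in, key point out. Only matching pairs can serve as targets."""
    return [
        {"input": f"Topic: {r['topic']}\nArgument: {r['argument']}",
         "output": [r["key_point"]]}
        for r in records
        if r["label"] == 1
    ]


def argument_generation(records: List[Record]) -> List[Instance]:
    """Key point in, argument out. Again restricted to matching pairs."""
    return [
        {"input": f"Topic: {r['topic']}\nKey point: {r['key_point']}",
         "output": [r["argument"]]}
        for r in records
        if r["label"] == 1
    ]


def topic_verification(records: List[Record]) -> List[Instance]:
    """Pair each argument with its own topic (Yes) and one other topic (No)."""
    topics = sorted({str(r["topic"]) for r in records})
    instances: List[Instance] = []
    for r in records:
        own = str(r["topic"])
        instances.append(
            {"input": f"Topic: {own}\nArgument: {r['argument']}", "output": ["Yes"]}
        )
        if len(topics) > 1:
            # Deterministic negative: the next topic in sorted order.
            other = topics[(topics.index(own) + 1) % len(topics)]
            instances.append(
                {"input": f"Topic: {other}\nArgument: {r['argument']}", "output": ["No"]}
            )
    return instances


# Invented records, for illustration only.
example_records: List[Record] = [
    {"topic": "A", "argument": "a1", "key_point": "k1", "label": 1},
    {"topic": "A", "argument": "a2", "key_point": "k1", "label": 0},
    {"topic": "B", "argument": "b1", "key_point": "k2", "label": 1},
]
```

Unlike the existing verification tasks, these three need different skills (generation in two directions, plus a classification where the topic actually matters), which is the kind of variety suggested above.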

mishra-sid commented 2 years ago

Makes sense, thanks for the clarification. Maybe I can drop all the existing datasets on kpa, aggregate the instances from those datasets, and make the new suggested tasks?

yeganehkordi commented 2 years ago

I agree! I guess it would be preferable to provide as many different tasks as you can.

danyaljj commented 2 years ago

Yeah, if it's too difficult to fix a task, it's better to drop it completely.

mishra-sid commented 2 years ago

@danyaljj Do you think it's worth spending time creating new tasks using this data (maybe just the simple ones?) or should I just create a PR to drop the existing tasks?

danyaljj commented 2 years ago

Let's drop it.

danyaljj commented 2 years ago

Actually, looking through the conversation, I'm not sure why I suggested dropping the tasks. How about we go ahead with @yeganehkordi's suggestion of merging the tasks (if I understood it correctly)? @Palipoor do you have any suggestions here?

ashok-arjun commented 2 years ago

Sorry, I have been busy over the past few months and couldn't be active here.

Actually, I think that having the topic (e.g. Assisted suicide should be a criminal offence) makes a lot of difference in a specific task, as it covers a part of the question for all the instances. I feel it is similar to having "translate from en-->fr" as the topic and then having instances under the task.

On the other hand, merging everything would be similar to having all translation pairs (en-->fr, fr-->en and practically all pairs) together in one task, just because the main task (translation) is the same for all of them.

I see that the same notion has also been used in some other tasks, for example:

  1. task1488_sarcasmdetection_headline_classification, task1489_sarcasmdetection_tweet_classification
  2. task1490_bengali_personal_hate_speech_binary_classification, task1491_bengali_political_hate_speech_binary_classification, task1492_bengali_religious_hate_speech_binary_classification, task1493_bengali_geopolitical_hate_speech_binary_classification
  3. task1448_disease_entity_extraction_ncbi_dataset, task1449_disease_entity_extraction_bc5cdr_dataset

In each of the above, the main objective (e.g. classification or entity extraction) remains the same for all tasks, but they are separated based on the characteristics of the instances. I feel that the KPA dataset tasks are similar: they are separated based on the characteristics of the instances, and each task as a whole is self-contained with respect to that characteristic, as the question specifies.

Kindly correct me if I am wrong about this. @danyaljj @yeganehkordi

yeganehkordi commented 2 years ago

I think having different topics in summary verification tasks won't make much difference in solving the tasks, and they don't need different information or skills. It just slightly changes the domain. In contrast, understanding a language and writing in that language require different skills; someone might be able to translate a sentence from English to French but not from French to English.

task1490_bengali_personal_hate_speech_binary_classification, task1491_bengali_political_hate_speech_binary_classification, task1492_bengali_religious_hate_speech_binary_classification, task1493_bengali_geopolitical_hate_speech_binary_classification

In the same manner, in hate speech detection tasks a sentence might have different types of toxicity, and in each task, we need to distinguish a specific inappropriate language usage. So, the same instance may have different outputs in these tasks.

task1448_disease_entity_extraction_ncbi_dataset, task1449_disease_entity_extraction_bc5cdr_dataset, task1488_sarcasmdetection_headline_classification, task1489_sarcasmdetection_tweet_classification

I agree with you on these tasks. They are the same. However, they are from different datasets, and I think there is value in having a large variety of datasets. It's somewhat like having the same question answering task from different datasets.

I still think it's better to merge instances and have different tasks based on this dataset. If it's not possible, I guess we can evaluate generalization over domains using this version of the tasks and I think we can keep them.
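
If the per-topic tasks are kept, generalization over domains could be measured with a leave-one-topic-out split. The sketch below is only an illustration under assumptions: `instances_by_topic` is a hypothetical mapping from topic name to that task's instances, and `leave_one_topic_out` is an invented helper, not something from the repository.

```python
from typing import Dict, Iterator, List, Tuple

Instance = Dict[str, object]


def leave_one_topic_out(
    instances_by_topic: Dict[str, List[Instance]],
) -> Iterator[Tuple[str, List[Instance], List[Instance]]]:
    """Yield (held_out_topic, train_instances, test_instances) for every topic."""
    for held_out in sorted(instances_by_topic):
        # Train on every topic except the held-out one.
        train = [
            inst
            for topic in sorted(instances_by_topic)
            if topic != held_out
            for inst in instances_by_topic[topic]
        ]
        # Test only on the unseen topic.
        test = list(instances_by_topic[held_out])
        yield held_out, train, test


# Invented toy data, for illustration only.
toy = {
    "topic_1": [{"input": "x1", "output": ["Yes"]}],
    "topic_2": [{"input": "x2", "output": ["No"]}, {"input": "x3", "output": ["Yes"]}],
    "topic_3": [{"input": "x4", "output": ["No"]}],
}
splits = list(leave_one_topic_out(toy))
```

A small gap between in-topic and held-out-topic scores would support the view that the topic adds little to summary verification.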

Palipoor commented 2 years ago

I think different translation datasets that are extracted from different sources should be in different tasks. Translating a tweet from English to French is a lot different from translating a Wikipedia page, for instance.
However, in the kpa case, I don't think the tasks are that different from each other. In addition to this, I don't even think it's necessary to include the topics in the task description. Simply "Does the given keypoint correctly summarize the argument?" suffices.

Palipoor commented 2 years ago

I think we can easily merge them into one task. I only dropped them because I thought it was agreed upon.
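
A rough sketch of what the merge could look like, assuming each task is a dict with a `Definition` list and an `Instances` list whose items have `input` and `output` fields. The helper name `merge_tasks` and the toy tasks are hypothetical; the topic-free definition is the one proposed above.

```python
from typing import Dict, Iterable, List, Set, Tuple

Task = Dict[str, object]


def merge_tasks(tasks: Iterable[Task], definition: str) -> Task:
    """Combine the instances of several tasks under one shared definition.

    Exact duplicates (same input and same outputs) are kept only once.
    """
    merged_instances: List[Dict[str, object]] = []
    seen: Set[Tuple[str, Tuple[str, ...]]] = set()
    for task in tasks:
        for inst in task["Instances"]:
            key = (inst["input"], tuple(inst["output"]))
            if key in seen:
                continue
            seen.add(key)
            merged_instances.append(
                {"input": inst["input"], "output": list(inst["output"])}
            )
    return {"Definition": [definition], "Instances": merged_instances}


# Invented toy tasks, for illustration only.
task_a = {
    "Definition": ["Topic-specific definition A"],
    "Instances": [
        {"input": "arg 1 / keypoint 1", "output": ["True"]},
        {"input": "arg 2 / keypoint 1", "output": ["False"]},
    ],
}
task_b = {
    "Definition": ["Topic-specific definition B"],
    "Instances": [
        {"input": "arg 2 / keypoint 1", "output": ["False"]},  # duplicate of task_a
        {"input": "arg 3 / keypoint 2", "output": ["True"]},
    ],
}
merged = merge_tasks(
    [task_a, task_b],
    "Does the given keypoint correctly summarize the argument?",
)
```

In practice the task dicts would be loaded from the individual task files with `json.load`, and any other metadata fields would need to be reconciled by hand.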