allenai / natural-instructions

Expanding natural instructions
https://instructions.apps.allenai.org/
Apache License 2.0
958 stars 189 forks

ATOMIC #217

Closed danyaljj closed 2 years ago

danyaljj commented 3 years ago

One can create tasks based on the ATOMIC data, where the task definitions describe the edges/relations in the dataset: https://allenai.org/data/atomic-2020

swarooprm commented 3 years ago

This one is huge and has a lot of variety; many tasks can be built from it. The same is true for ConceptNet.

yeganehkordi commented 3 years ago

I looked at this dataset. I thought of defining the edges/relations task and several classification tasks based on the meaning (Physical-Entity Commonsense, Social-Interaction Commonsense, and Event-Centered Commonsense) and the relations. I might be able to define 27 tasks based on this dataset. How many tasks and which tasks are desirable?

swarooprm commented 3 years ago

You are right! Different relations produce different tasks. Re: how many tasks: no such estimate, the more the better :)

ashok-arjun commented 3 years ago

@yeganehkordi If you're still working on this and need help, I can collaborate and take up some tasks

yeganehkordi commented 3 years ago

@ashok-arjun Thanks! I haven't started working on the classification tasks based on the meaning (Physical-Entity Commonsense, Social-Interaction Commonsense, and Event-Centered Commonsense). Feel free to take them.

ashok-arjun commented 3 years ago

Can you explain what you mean by classification tasks? @yeganehkordi Because all I see in the dataset are such tasks?

Does this mean you've already worked on some tasks from the dataset?

ashok-arjun commented 3 years ago

@swarooprm @danyaljj Would physical-entity tasks defined in the following way be acceptable? I'm not sure, since the input isn't a sentence.

Task: Predict what the entity is made up of

Input: Bread
Output: Dough, wheat
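To make the proposal above concrete, here is a minimal sketch of how such a task could be assembled from (head, relation, tail) triples. The triples are made-up illustrations rather than rows from the dataset, and the `Definition`/`Instances`/`input`/`output` field names are assumptions about the task-file layout, not something confirmed in this thread.

```python
import json

# Hypothetical (head, relation, tail) triples in the style of ATOMIC-2020.
TRIPLES = [
    ("bread", "MadeUpOf", "dough"),
    ("bread", "MadeUpOf", "wheat"),
    ("car", "MadeUpOf", "engine"),
    ("bread", "AtLocation", "bakery"),
]


def build_made_up_of_task(triples):
    """Group MadeUpOf tails by head entity and wrap them as task instances."""
    answers = {}
    for head, relation, tail in triples:
        if relation == "MadeUpOf":
            answers.setdefault(head, []).append(tail)
    return {
        # Field names below are assumed, not taken from the repository schema.
        "Definition": "Given an entity, name something it is made up of.",
        "Instances": [
            {"input": head, "output": tails} for head, tails in answers.items()
        ],
    }


task = build_made_up_of_task(TRIPLES)
print(json.dumps(task, indent=2))
```

Keeping every valid tail as an accepted output means a single-word input such as "bread" can still be scored against several correct answers.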

danyaljj commented 3 years ago

> @swarooprm @danyaljj Would physical-entity tasks defined in the following way be acceptable? I'm not sure, since the input isn't a sentence.
>
> Task: Predict what the entity is made up of
>
> Input: Bread
> Output: Dough, wheat

Yeah I think that should be good!

danyaljj commented 3 years ago

Just keep in mind that a small amount of high-quality data is better than a large amount of low-quality data.

ashok-arjun commented 3 years ago

Sure - I'll hand-pick about 15-20 required tasks and work on them.

yeganehkordi commented 3 years ago

> Can you explain what you mean by classification tasks? @yeganehkordi Because all I see in the dataset are such tasks?
>
> Does this mean you've already worked on some tasks from the dataset?

Yes, I created binary classification tasks based on the relations. For example: determine whether the given sentences have an oEffect relation. I have also created one task that, given the two nodes of an edge, asks for the relation between them. Besides those, you can create classification tasks based on the meaning. For example: decide whether two nodes have a Social-Interaction Commonsense relation or not.
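As a rough illustration of the binary relation tasks described above, the sketch below labels head/tail pairs "Yes" when they come from the target relation and samples "No" pairs from other relations. The triples, the `binary_instances` helper, and the input wording are all hypothetical; treating pairs from other relations as negatives is a simplifying assumption, since such a pair could in principle also satisfy the target relation.

```python
import random

# Hypothetical triples in the style of ATOMIC-2020 (not real dataset rows).
TRIPLES = [
    ("PersonX gives PersonY a gift", "oEffect", "PersonY says thank you"),
    ("PersonX tells a joke", "oEffect", "PersonY laughs"),
    ("PersonX goes to the store", "xNeed", "to get in the car"),
    ("PersonX eats dinner", "xWant", "to wash the dishes"),
]


def binary_instances(triples, relation, seed=0):
    """Build Yes/No instances asking whether a head/tail pair has `relation`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    positives = [(h, t) for h, r, t in triples if r == relation]
    others = [(h, t) for h, r, t in triples if r != relation]
    # Assumption: pairs drawn from other relations serve as negatives.
    negatives = rng.sample(others, k=min(len(positives), len(others)))

    instances = [
        {"input": f"Head: {h}. Tail: {t}.", "output": "Yes"} for h, t in positives
    ]
    instances += [
        {"input": f"Head: {h}. Tail: {t}.", "output": "No"} for h, t in negatives
    ]
    rng.shuffle(instances)
    return instances


o_effect_instances = binary_instances(TRIPLES, "oEffect")
for instance in o_effect_instances:
    print(instance)
```

Balancing the two labels is a design choice here; in line with the "small high-quality data" advice earlier in the thread, the sampled negatives would still need a manual check.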

danyaljj commented 3 years ago

> Yes, I created binary classification tasks based on the relations. For example: determine if given sentences have oEffect relation.

Are these tasks merged in? Don't remember seeing them ...

yeganehkordi commented 3 years ago

> Are these tasks merged in? Don't remember seeing them ...

No, actually I was a bit busy. I'll create a pull request soon.

danyaljj commented 3 years ago

Got it, take your time!

ashok-arjun commented 3 years ago

> Are these tasks merged in? Don't remember seeing them ...
>
> No, actually I was a bit busy. I'll create a pull request soon.

@yeganehkordi I just finished working on positive and negative examples and instances for about 20 tasks of this kind. Could you please let me open the PR instead?

yeganehkordi commented 3 years ago

Well, I don't know exactly what to do in this case. @danyaljj Is it okay if we keep both tasks? I guess we had the same problem with stereoset tasks before.

danyaljj commented 3 years ago

I am not quite sure. Send whatever PRs you have. We'll resolve the issue based on the PR content.