Open zhunyoung opened 2 years ago
Hi @zhunyoung! Thanks for pointing this out! We already have a project underway to clean up these issues - most of them stem from the use of Amazon Mechanical Turk templates, some of which were not as clean as we hoped. Our next version, which will fix most of these issues, will be released in the next couple of months. Thanks for your interest! :)
Hi authors, thanks for putting together this dataset! Just wanted to follow up on whether there's any update on fixing the dataset errors. If not, would it be possible to at least have a subset of the dataset that is known to be error-free? This would be very useful for comparing different models.
Thanks in advance for any help!
Seconded - it would be quite unfortunate if such a great dataset contained so many errors that it became unusable. Looking forward to the fix asap!
Hi @zhunyoung @veronica320 @zharry29, thanks for your interest in the dataset, and apologies for the delayed response. As mentioned earlier, these issues stem from the Amazon Mechanical Turk templates. Specifically, the problem is role swapping: annotators swapped the roles of entities, which leads to hierarchically opposite kinship relations.
Since there are close to 5,000 templates, manually re-annotating them is extremely time consuming, and I don't have the bandwidth for it. However, I spent some time figuring out how to extract the relations automatically so that we can at least filter out the logically incorrect templates. It turns out this is a hard problem, but I have been able to set up a process to do so. I have released the new templates in the develop branch, where you can find templates annotated by two models, Flan T5 and GPT3, both of which are surprisingly good at extracting the relation from the templates! Using their annotations you can now filter the templates during dataset generation using the code on the develop branch (CLUTRR v1.3).
I have documented the whole process in this blog post if you are curious to know more / want to explore alternative methods. Please feel free to provide feedback in this thread, and also let me know if you face any issues generating data using the code on the develop branch!
Thanks for reading, and thanks for pointing out this issue in the first place. I'll pin this thread so that future users can be aware of this.
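For reference, a rough sketch of the workflow (the exact setup steps may differ - please follow the README on the develop branch):

```bash
# Rough sketch only - see the README on the develop branch for the exact setup steps
git clone https://github.com/facebookresearch/clutrr.git   # this repository
cd clutrr
git checkout develop    # templates here carry the Flan T5 / GPT3 relation annotations
./generate.sh           # generation can now filter out logically incorrect templates
```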
Hi @koustuvsinha,
Thanks for your detailed explanations, your code, and your post!
I followed the steps to install the develop branch in a new conda environment. Then, after installing the sklearn package with conda install -c conda-forge scikit-learn, I could successfully run the data generation script ./generate.sh.
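Concretely, the commands were roughly the following (the conda environment name and Python version are arbitrary choices on my side):

```bash
# Rough reproduction of my setup (env name / Python version chosen arbitrarily)
conda create -n clutrr-dev python=3.8
conda activate clutrr-dev
# ...install the develop branch of CLUTRR as described in the repo...
conda install -c conda-forge scikit-learn
./generate.sh
```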
I noticed that the generated story is super long. Below is a copy of the story in the first test example generated using ./generate.sh.
"[Shelton] and his daughter [Louie] took a day off school to go to the zoo. [Louie] and her uncle [Nathaniel] went to the pet shop. [Louie] saw a puppy that she loved, so [Nathaniel] bought it for her. [Malvina] took her grandson [Colin] to the park. [Colin]'s brother [Nathaniel] was already there. [Shelton] took his grandson [Artie] to the baseball game. [Shelton] took his sister [Blanche] out to lunch after learning that she got accepted into her first choice for university. [Shelton] took his grandson [Shelton] and [Shelton]'s brother [Nellie] to the amusement park Saturday and they had a good time. [Jeremiah] and his mother, [Louie], went to a pet store. [Jeremiah] wanted a parrot, but his mom got him a smaller bird instead. [Karl] enjoys picking flowers with his son's daughter. Her name is [Louie]. [Nathaniel] went to his brother [Artie]'s Birthday party [Karl] would n't let his son [Colin] go to the park by himself. [Colin]'s brother [Colin] offered to go with him. [Olin] took his grandson [Colin] to a movie at the local theater. [Serena] went to her son [Colin]'s House [Malvina] was excited because she got to go to the zoo with her grandson [Artie]. [Blanche]'s grandfather, [Olin], baked her a beautiful cake for her 9th birthday. [Serena] just had a baby and presented the baby proudly to the new maternal grandmother, [Allie]. [Nellie]'s grandmother, [Allie], was eager to spend a weekend with all of her grandchildren. [Linnie] asked her aunt [Serena] for 5 dollars for her field trip. [Linnie] made a cake for her grandfather, [Hollie]. [Serena] and her mother [Olin] made breakfast together. [Helen] had picked her daughter [Serena] out the cutest new dress to wear on her birthday. [Blanche] spent a great day shopping with her daughter, [Walter]. [Nellie] dropped his niece [Walter] off at school. [Elizabeth] had a daughter named [Blanche]. [Blanche] and her brother [Shelby] went to see a movie. [Helen] and her husband [Olin] went on a cruise. They had a wonderful time. [Karl] and [Serena] were married twenty years ago today, becoming husband and wife on a glorious spring day. [Helen] picked up her husband, [Olin] from the pool. "
I checked the hyper-parameters in generate.sh, but the following settings
MAX_PATH_LEN=5
...
TEST_DESCRIPTOR_LENGTHS=\'3,4\'
seem not to restrict the number of sentences in the generated story. I'm not sure how to regenerate a dataset similar to the original CLUTRR dataset. If you have already generated a dataset with the cleaned templates, could you share it with us so we can evaluate our own models on it? If not, could you give some guidelines for generating such cleaned data?
Thanks a lot!
Hi @zhunyoung, apologies for the delayed response - the notifications for this thread seem to have missed my inbox for some reason. I believe the reason is that the noise setting is set to true, which is one of the test conditions of the CLUTRR dataset (we added spurious noise, such as dangling, irrelevant or disconnected paths - please check the paper for more details). If you set it to false (NOISE=false), you should get a shorter story.
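Concretely, something like this in generate.sh should give short stories (only the variables you quoted are shown; everything else stays as is):

```bash
# In generate.sh - only the settings mentioned in this thread are shown
NOISE=false                       # no spurious/dangling/disconnected noise facts
MAX_PATH_LEN=5
TEST_DESCRIPTOR_LENGTHS=\'3,4\'
```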
Thanks for the help! My goal is to use the new code to
To achieve the goal above, I still need to generate some data instances with NOISE=true. Can the above goal be achieved using the current code, or is this part still under development? Thanks!
@azreasoners just use the NOISE_POLICY flag appropriately along with NOISE=true.
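Something along these lines in generate.sh (the policy value below is a placeholder - check the develop branch for the supported options):

```bash
# Sketch only - replace <policy> with one of the noise policies supported on develop
NOISE=true
NOISE_POLICY=<policy>
```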
Dear authors, @koustuvsinha @pminervini @shagunsodhani
Thanks for the great work!
Regarding data_06b8f2a1/1.3_test.csv in the dataset: it seems to me that a big portion of the data may not be correct. Since other users already submitted issues reporting errors in the dataset a year ago, is there any update to the dataset (e.g., a cleaner version with fewer mistakes)? Thanks a lot!