Heidelberg-NLP / COINS

The corresponding code from our paper "COINS: Dynamically Generating COntextualized Inference Rules for Narrative Story Completion" (ACL 2021). Do not hesitate to open an issue if you run into any trouble!

Questions about data formatting #3

Open id4thomas opened 2 years ago

id4thomas commented 2 years ago

Hi, I am currently trying to reproduce your work (specifically COINS GR) and have a few questions about the training data.

From your paper, it seems the training data for the Knowledge Model would be

and

for the Story Model would be

But looking at the part where the data is loaded (https://github.com/Heidelberg-NLP/COINS/blob/main/model/src/data/conceptnet.py), it is confusing which corresponds to which. Also, the data downloaded with the given script doesn't match the format used in the rest of the code.

It would be nice if you could provide a data sample for each of the Knowledge and Story Models, or the model weights if possible.

Thank you

debjitpaul commented 2 years ago

Hi Song,

Sorry for the delayed reply. You are looking at the file for the Story Model. Line 93 reads the input, which builds one tuple of sequence lengths per example:

```python
self.masks[split]["total"] = [(len(i[0]), len(i[1]), len(i[2]), len(i[3]), len(i[4]), len(i[5]), len(i[6]), len(i[7]), len(i[8]), len(i[9]), len(i[10])) for i in sequences[split]]
```
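For what it's worth, here is a runnable sketch of what line 93 appears to intend, with toy stand-in data. Note that the `len([10])` in the repository's line 93 is the length of the literal list `[10]`, i.e. always 1; `len(i[10])` was presumably meant.

```python
# Toy stand-in for sequences[split]: each example i is a tuple of 11
# tokenized sequences (here, lists of length 1..11).
sequences = {"train": [tuple([0] * (j + 1) for j in range(11))]}

split = "train"
masks = {split: {}}
# One length per sequence; the repository's `len([10])` would instead
# always evaluate to 1 for the last slot.
masks[split]["total"] = [tuple(len(i[j]) for j in range(11))
                         for i in sequences[split]]
print(masks[split]["total"][0])  # (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
```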

During training, i is the following (tab-separated fields, shown one per line here):

```
Incomplete Story (i.e., S1 S2 [SEP] S5) #Effect# S2
Output_Effect_S2
Incomplete Story (i.e., S1 S2 [SEP] S5) #Cause# S5
Output_Cause_S5
Incomplete Story (i.e., S1 S2 [SEP] S5)
Incomplete Story (i.e., S1 S2 [SEP] S5) [SEP] Output_Effect_S2 [SEP] Output_Cause_S5
Output_S3
Incomplete Story (i.e., S1 S2 S3 [SEP] S5) #Effect# S3
Output_Effect_S3
Incomplete Story (i.e., S1 S2 S3 [SEP] S5) #Cause# S5
Output_Cause_S5
Incomplete Story (i.e., S1 S2 S3 [SEP] S5)
Incomplete Story (i.e., S1 S2 S3 [SEP] S5) [SEP] Output_Effect_S3 [SEP] Output_Cause_S5
Output_S4
S2 + '\t' + S1 + ' ' + S2 + '\t' + S5 + '\n'
```
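To make the layout concrete, here is a sketch that assembles one such training line from a toy five-sentence story. The sentences and the "effect"/"cause" strings are made-up placeholders, not real knowledge-model outputs:

```python
# Assemble one Story Model training line per the format described above.
s1, s2, s3, s4, s5 = ("Tom was hungry.", "He went to the kitchen.",
                      "He made a sandwich.", "He ate it quickly.",
                      "Tom felt much better.")
inc1 = f"{s1} {s2} [SEP] {s5}"       # incomplete story before S3 is generated
inc2 = f"{s1} {s2} {s3} [SEP] {s5}"  # incomplete story before S4 is generated
eff2, cau5, eff3 = "effect of S2", "cause of S5", "effect of S3"  # placeholders

line = "\t".join([
    f"{inc1} #Effect# {s2}", eff2,
    f"{inc1} #Cause# {s5}", cau5,
    inc1,
    f"{inc1} [SEP] {eff2} [SEP] {cau5}", s3,
    f"{inc2} #Effect# {s3}", eff3,
    f"{inc2} #Cause# {s5}", cau5,
    inc2,
    f"{inc2} [SEP] {eff3} [SEP] {cau5}", s4,
    s2, f"{s1} {s2}", s5,
]) + "\n"
print(line.count("\t"))  # 16 tabs, i.e. 17 tab-separated fields in total
```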

I hope this answers your question. Feel free to ask if anything is still unclear.
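The two rounds encoded in that line mirror the iterative generate-rules-then-sentence loop from the paper; roughly, it could be consumed like this (a hedged sketch with placeholder model callables, not the repository's code):

```python
def complete_story(knowledge_model, story_model, s1, s2, s5, n_missing=2):
    """Sketch of the alternating loop: in each round, generate
    contextualized inference rules for the current context, then
    condition the story model on them to produce the next sentence."""
    context = [s1, s2]
    generated = []
    for _ in range(n_missing):  # round 1 yields S3, round 2 yields S4
        inc = " ".join(context) + " [SEP] " + s5
        effect = knowledge_model(inc + " #Effect# " + context[-1])
        cause = knowledge_model(inc + " #Cause# " + s5)
        nxt = story_model(inc + " [SEP] " + effect + " [SEP] " + cause)
        generated.append(nxt)
        context.append(nxt)
    return generated
```

With dummy callables (e.g. `lambda prompt: "..."`) this returns the two generated middle sentences in order.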

id4thomas commented 2 years ago

Thank you for the feedback!

However, I still find the given example hard to understand.

Considering both files below

https://github.com/Heidelberg-NLP/COINS/blob/main/model/src/data/conceptnet.py

https://github.com/Heidelberg-NLP/COINS/blob/main/model/src/train/batch.py

in the for loop of the batch_conceptnet_generate function (line 92):

when i==0

when i==1

So does it mean that i1, o1, i3, o3 correspond to

Incomplete Story, Output_Effect_S2/Output_Cause_S5, Incomplete Story, Output_Effect_S3/Output_Cause_S5

and i2,o2, i4, o4 to

Incomplete Story, Output_S3, Incomplete Story, Output_S4?

Also, when splitting the example given at line 94 of conceptnet.py (make_tensors) on '\t', the resulting list would be:

  1. Incomplete Story (i.e., S1 S2 [SEP] S5) #Effect# S2
  2. Output_Effect_S2
  3. Incomplete Story (i.e., S1 S2 [SEP] S5) #Cause# S5
  4. Output_Cause_S5
  5. Incomplete Story (i.e., S1 S2 [SEP] S5)
  6. Incomplete Story (i.e., S1 S2 [SEP] S5) [SEP] Output_Effect_S2 [SEP] Output_Cause_S5
  7. Output_S3
  8. Incomplete Story (i.e., S1 S2 S3 [SEP] S5) #Effect# S3
  9. Output_Effect_S3
  10. Incomplete Story (i.e., S1 S2 S3 [SEP] S5) #Cause# S5
  11. Output_Cause_S5
  12. Incomplete Story (i.e., S1 S2 S3 [SEP] S5)
  13. Incomplete Story (i.e., S1 S2 S3 [SEP] S5) [SEP] Output_Effect_S3 [SEP] Output_Cause_S5
  14. Output_S4
  15. S2 + '\t' + S1 + ' ' + S2 + '\t' + S5 + '\n'

These 15 items don't seem to match the 11 sequences the code expects.
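A quick way to see the mismatch (the field list mirrors the enumeration above; purely illustrative):

```python
# Build a line with the 14 named fields plus the trailing
# "S2 \t S1 S2 \t S5" block, then split on tabs the way a loader would.
example = "\t".join("field%d" % k for k in range(1, 15)) + "\tS2\tS1 S2\tS5\n"
fields = example.strip().split("\t")
print(len(fields))  # 17 tab-separated fields, not the 11 the masks index
```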