AbrahamSanders / SIMIE

SIMIE - A SImulated MInd's Eye is an experiment in narrative extrapolation from dialog using GPT-2.
MIT License

[Dev] Corpus builder - support narrative segments nested within dialog paragraphs #2

Closed AbrahamSanders closed 3 years ago

AbrahamSanders commented 3 years ago

A narrative segment often appears in the same paragraph as a dialog turn. We should extract these segments just as we do narrative segments that occupy their own paragraphs:

Source excerpt from Sister Carrie by Theodore Dreiser:

She began to pull the basket over, and now, in spite of all protest,
she had swung over and was going down.

“Carrie,” she called, “Carrie, come back”; but Carrie was far down now
and the shadow had swallowed her completely.

She moved her arm.

What is currently extracted:

233.txt:  [N]: She began to pull the basket over, and now, in spite of all protest, she had swung over and was going down. 
233.txt:  [D]: Carrie, Carrie, come back
233.txt:  [N]: She moved her arm. 

What should be extracted:

233.txt:  [N]: She began to pull the basket over, and now, in spite of all protest, she had swung over and was going down. 
233.txt:  [D]: Carrie, Carrie, come back
233.txt:  [N]: but Carrie was far down now and the shadow had swallowed her completely. <<< ADDED <<<<<<<<<
233.txt:  [N]: She moved her arm. 
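
To make the desired behavior concrete, here is a minimal sketch of splitting one paragraph into dialog and narrative segments in source order. It is not the corpus builder's actual code: the function name `split_paragraph`, the `QUOTE_RE` pattern, and the assumption that dialog is always delimited by curly double quotes are all illustrative.

```python
import re

# Assumption: dialog is delimited by curly double quotes, as in the excerpt above.
QUOTE_RE = re.compile(r"“([^”]*)”")


def split_paragraph(paragraph):
    """Return (tag, text) pairs in source order.

    "D" marks a quoted span, "N" marks everything between or around quotes.
    (Hypothetical helper, for illustration only.)
    """
    text = " ".join(paragraph.split())  # undo hard line wraps
    segments = []
    pos = 0
    for match in QUOTE_RE.finditer(text):
        narrative = text[pos:match.start()].strip()
        if narrative:
            segments.append(("N", narrative))
        dialog = match.group(1).strip()
        if dialog:
            segments.append(("D", dialog))
        pos = match.end()
    tail = text[pos:].strip()  # narrative after the last closing quote
    if tail:
        segments.append(("N", tail))
    return segments


paragraph = """“Carrie,” she called, “Carrie, come back”; but Carrie was far down now
and the shadow had swallowed her completely."""

for tag, text in split_paragraph(paragraph):
    print(f"[{tag}]: {text}")
```

On this paragraph the raw split still yields the attribution "she called," as its own narrative segment and leaves a leading "; " on the trailing narrative, so some filtering is needed before the result matches the desired extraction above.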

The order in which the dialog and narrative are positioned in the paragraph should dictate the order in which they appear in the corpus. For example, the narrative can come before the dialog turn:

Source excerpt, also from Sister Carrie by Theodore Dreiser:

“All right,” he said, “but you’ll hear me out, won’t you? After all you
have said about loving me, you might hear me. I don’t want to do you
any harm. I’ll give you the money to go back with when you go. I merely
want to tell you, Carrie. You can’t stop me from loving you, whatever
you may think.”

He looked at her tenderly, but received no reply. “You think I have
deceived you badly, but I haven’t. I didn’t do it willingly. I’m
through with my wife. She hasn’t any claims on me. I’ll never see her
any more. That’s why I’m here to-night. That’s why I came and got you.”

“You said Charlie was hurt,” said Carrie, savagely. “You deceived me.
You’ve been deceiving me all the time, and now you want to force me to
run away with you.”

What is currently extracted:

233.txt:  [D]: All right, but you’ll hear me out, won’t you? After all you have said about loving me, you might hear me. I don’t want to do you any harm. I’ll give you the money to go back with when you go. I merely want to tell you, Carrie. You can’t stop me from loving you, whatever you may think.
233.txt:  [D]: You think I have deceived you badly, but I haven’t. I didn’t do it willingly. I’m through with my wife. She hasn’t any claims on me. I’ll never see her any more. That’s why I’m here to-night. That’s why I came and got you.
233.txt:  [D]: You said Charlie was hurt, You deceived me. You’ve been deceiving me all the time, and now you want to force me to run away with you.

What should be extracted:

233.txt:  [D]: All right, but you’ll hear me out, won’t you? After all you have said about loving me, you might hear me. I don’t want to do you any harm. I’ll give you the money to go back with when you go. I merely want to tell you, Carrie. You can’t stop me from loving you, whatever you may think.
233.txt:  [N]: He looked at her tenderly, but received no reply. <<< ADDED <<<<<<<<<
233.txt:  [D]: You think I have deceived you badly, but I haven’t. I didn’t do it willingly. I’m through with my wife. She hasn’t any claims on me. I’ll never see her any more. That’s why I’m here to-night. That’s why I came and got you.
233.txt:  [D]: You said Charlie was hurt, You deceived me. You’ve been deceiving me all the time, and now you want to force me to run away with you.

AbrahamSanders commented 3 years ago

Implemented basic functionality in this commit, co-authored with @Erikellerx.

Left to do:

  1. Implement an option for a minimum narrative length to filter out extraneous narrative passages such as "she said,". When these are filtered out, the surrounding dialog turns should be concatenated as usual.
  2. Implement a rule to strip preceding punctuation from intermediate narrative passages.
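
As a rough sketch of how these two rules could combine, assuming each paragraph has already been split into ordered `(tag, text)` pairs: the name `clean_segments`, the choice to measure length in words, and the default threshold of 3 are all assumptions for illustration, not the project's implementation.

```python
import string

# Hypothetical default: the issue does not specify a unit or a value.
MIN_NARRATIVE_WORDS = 3


def clean_segments(segments, min_narrative_words=MIN_NARRATIVE_WORDS):
    """Filter and merge the (tag, text) segments of a single paragraph.

    - Strips leading punctuation and spaces from narrative segments.
    - Drops narrative segments shorter than min_narrative_words.
    - Concatenates dialog segments that end up adjacent.
    """
    cleaned = []
    for tag, text in segments:
        if tag == "N":
            text = text.lstrip(string.punctuation + " ")
            if len(text.split()) < min_narrative_words:
                continue  # e.g. "she called," is dropped
        if tag == "D" and cleaned and cleaned[-1][0] == "D":
            # The narrative between two dialog spans was dropped: join them.
            cleaned[-1] = ("D", cleaned[-1][1] + " " + text)
        else:
            cleaned.append((tag, text))
    return cleaned


segments = [
    ("D", "Carrie,"),
    ("N", "she called,"),
    ("D", "Carrie, come back"),
    ("N", "; but Carrie was far down now and the shadow had swallowed her completely."),
]

for tag, text in clean_segments(segments):
    print(f"[{tag}]: {text}")
# [D]: Carrie, Carrie, come back
# [N]: but Carrie was far down now and the shadow had swallowed her completely.
```

Counting words rather than characters is only one possible reading of "minimum narrative length"; a character threshold would work the same way with `len(text)` in place of `len(text.split())`.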

AbrahamSanders commented 3 years ago

Implemented the to-do items from the previous comment, tested for backward compatibility, and merged to master in this PR.