colinpollock / seinfeld-scripts

Scripts for parsing Seinfeld scripts
http://colinpollock.net/seinfeld-script-data

Don't split sentences #3

Closed colinpollock closed 8 years ago

colinpollock commented 8 years ago

If a single line spoken by a character (called an utterance in the DB) contains multiple sentences, each sentence is stored as its own row in the sentence table. I can't remember why I did this, and I don't see any benefit to it.

I also split sentences in a really brittle way. Right now the line "LOIS: Oh, Mr. Meyers this is my friend, Jerry." is incorrectly split, so there are two sentences in the DB: "Oh, Mr." and "Meyers this is my friend, Jerry".
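To illustrate the failure mode, here is a minimal sketch of a naive splitter. It is not the code from this repo: `naive_split` and its regex are assumptions, chosen only because splitting after sentence-ending punctuation is a common brittle approach that breaks on abbreviations like "Mr." in the same way.

```python
import re


def naive_split(utterance):
    """Hypothetical splitter (not the repo's actual code).

    Splits on whitespace that follows '.', '!' or '?', which treats
    the period in an abbreviation as the end of a sentence.
    """
    return re.split(r"(?<=[.!?])\s+", utterance.strip())


line = "Oh, Mr. Meyers this is my friend, Jerry."
print(naive_split(line))
# prints ['Oh, Mr.', 'Meyers this is my friend, Jerry.']
```

The abbreviation is cut off from the name it belongs to, which is the kind of bad row described above.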

So, the task here is to just completely remove sentence splitting.
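A rough sketch of what that could look like, assuming script lines have the form "SPEAKER: text". The `parse_line` helper is hypothetical and is not taken from the repo; the point is only that the utterance text is kept whole, with no per-sentence rows.

```python
def parse_line(line):
    """Hypothetical parser: return (speaker, utterance) for one script line.

    The utterance is kept as a single string, so punctuation inside it
    (such as the period in "Mr.") never triggers a split.
    """
    # Split only on the first colon, which separates speaker from text.
    speaker, _, text = line.partition(":")
    return speaker.strip(), text.strip()


speaker, utterance = parse_line("LOIS: Oh, Mr. Meyers this is my friend, Jerry.")
print(speaker)    # prints LOIS
print(utterance)  # prints Oh, Mr. Meyers this is my friend, Jerry.
```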