Open RichardLitt opened 9 years ago
`cat`ing all files will result in strange behavior at the edges of the files. It would be best to insert a newline between files when concatenating them, and then parse them, maybe. Probably not a big deal.
What kind of strange behavior? Just because it's a newline?
Nah, I was worried that any ngrams that go over the divide wouldn't be useful as they would be from different speakers. Just checked again, and the corpus as a whole doesn't differentiate between speakers, though, so this is really moot. Catting it all is fine.
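For anyone landing here later, a minimal sketch of the boundary concern, assuming whitespace-tokenized text and bigrams. The `ngrams` helper and the two token lists are hypothetical, not taken from this repo:

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Two hypothetical files, already tokenized (made-up data).
file_a = ["see", "you", "later"]
file_b = ["good", "morning"]

# Concatenate first, then extract: one bigram spans the file boundary.
joined = ngrams(file_a + file_b, 2)

# Extract per file, then combine: no bigram crosses the boundary.
per_file = ngrams(file_a, 2) + ngrams(file_b, 2)

# The n-grams that exist only because of concatenation.
crossing = [g for g in joined if g not in per_file]
print(crossing)
```

With n-grams of size n, concatenation adds at most n - 1 extra n-grams per file boundary, so the effect is small unless the files are very short.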
To do: