clulab / twitter4food

Repository for the health informatics analytics on twitter project
Apache License 2.0

Better tokenization #17

Closed: herongrove closed this 7 years ago

herongrove commented 7 years ago

Closes #4

MihaiSurdeanu commented 7 years ago

This is good, but it's better to use split("\s+") rather than split(" +").

herongrove commented 7 years ago

Ah, yes. Because we were splitting already-tokenized text, I defaulted to a space, but \s would be safer. I'll make that change.
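The difference the reviewer points out can be shown with a minimal sketch. It is written in Java because Scala's `String.split` is `java.lang.String.split`, so the regex behaves the same way. The class `TokenSplit`, its two methods, and the sample text are hypothetical illustrations, not code from the twitter4food repository. The pattern ` +` only matches runs of the space character, while `\s+` also matches tabs, newlines, carriage returns, form feeds, and vertical tabs. In Java or Scala source the backslash has to be escaped, so the pattern is written `"\\s+"`.

```java
import java.util.Arrays;

/** Hypothetical helper comparing the two split patterns discussed in this PR. */
public class TokenSplit {

    // Splits only on runs of the ASCII space character (U+0020).
    static String[] splitOnSpaces(String text) {
        return text.split(" +");
    }

    // Splits on runs of any whitespace character: space, \t, \n, \u000B, \f, \r.
    // The Java string literal "\\s+" is the regex \s+.
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical already-tokenized line that happens to contain a tab and a newline.
        String text = "fresh  salsa\tand\nchips";

        // The tab and newline survive inside the second "token".
        System.out.println(Arrays.toString(splitOnSpaces(text)));

        // Every whitespace run acts as a separator, giving four tokens.
        System.out.println(Arrays.toString(splitOnWhitespace(text)));
    }
}
```

With either pattern, leading whitespace still produces an empty first element (for example, `" a".split("\\s+")` returns `["", "a"]`), so trimming the input before splitting is a common habit.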