charlieg / A-Smattering-of-NLP-in-Python

A very brief introduction to Natural Language Processing programming in Python
http://www.meetup.com/stats-prog-dc/events/177772322/
Apache License 2.0

Wrong pip install cmd for readability-lxml #1

Closed: srhrshr closed this issue 9 years ago

srhrshr commented 9 years ago

Please correct the line `pip install readability- lxml` to `pip install readability-lxml`.

charlieg commented 9 years ago

Thanks, @SreeHarshaRamesh

srhrshr commented 9 years ago

No problem at all. Thanks, @charlieg, for a great resource. As a beginner, I could follow along quite comfortably.

> Technically, we didn't need to use sent_tokenize(), but if we only used word_tokenize() alone, we'd see a bunch of extraneous sentence-final punctuation in our output.

Could you give an example of this extraneous punctuation? I compared the output with and without sent_tokenize() and found no difference whatsoever.
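
One possible explanation, offered as a sketch and not as the author's answer: the difference shows up when the Treebank word tokenizer is run directly on multi-sentence text, because it only splits off a period at the very end of its input. As far as I can tell, recent NLTK releases have word_tokenize() split the text into sentences internally first, which would explain seeing identical output. The example below uses TreebankWordTokenizer directly to reproduce the older behaviour. The sample text is made up, and the sentences are split by hand (an assumption standing in for sent_tokenize()) so that no Punkt model download is needed.

```python
from nltk.tokenize import TreebankWordTokenizer

# Made-up sample text with two sentences.
text = "I like cats. Dogs are fine too."

tokenizer = TreebankWordTokenizer()

# Tokenize the whole text in one pass, with no sentence splitting first.
# Only the final period is separated; the period after "cats" stays attached.
whole_text_tokens = tokenizer.tokenize(text)

# Hand-split sentences, standing in for the output of sent_tokenize(text).
sentences = ["I like cats.", "Dogs are fine too."]

# Tokenize sentence by sentence, so every sentence-final period is split off.
per_sentence_tokens = [
    token
    for sentence in sentences
    for token in tokenizer.tokenize(sentence)
]

print(whole_text_tokens)    # contains the token "cats." with the period attached
print(per_sentence_tokens)  # contains "cats" and "." as separate tokens
```

If both lists look the same when you call word_tokenize() instead, your NLTK version is most likely doing the sentence split for you.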