clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License

Unexpected tokenization #315

Open devikasondhi opened 3 years ago

devikasondhi commented 3 years ago

Hello,

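This is a peculiar enumeration case where the tokenizer splits sentences in the wrong place. en.tokenize('See Section 3.) Or Section 2.)') returns ['See Section 3 .', ') Or Section 2 .', ')'], whereas the expected result is ['See Section 3 . )', 'Or Section 2 . )']. The closing parenthesis that follows each period ends up at the start of the next sentence instead of staying with the sentence it closes.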
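Until the tokenizer itself handles this case, one possible workaround is to post-process the sentence list and move any leading closing brackets back onto the previous sentence. The sketch below is only an illustration: reattach_closing_brackets is a hypothetical helper, not part of pattern, and it assumes the tokens in each sentence are separated by single spaces, as in the output reported above. It takes that reported output as a literal list, so it runs without pattern installed.

```python
# Hypothetical post-processing helper; not part of the pattern library.
# Assumes each sentence is a string of space-separated tokens.
CLOSERS = {")", "]", "}"}


def reattach_closing_brackets(sentences):
    """Move closing brackets that start a sentence to the end of the previous one."""
    fixed = []
    for sentence in sentences:
        tokens = sentence.split()
        # Peel off any closing brackets at the start of this sentence.
        leading = []
        while tokens and tokens[0] in CLOSERS:
            leading.append(tokens.pop(0))
        if leading and fixed:
            # Attach them to the sentence they actually close.
            fixed[-1] = fixed[-1] + " " + " ".join(leading)
        elif leading:
            # No previous sentence exists, so leave the tokens in place.
            tokens = leading + tokens
        if tokens:
            fixed.append(" ".join(tokens))
    return fixed


# Output reported in this issue, used as a literal input.
reported = ['See Section 3 .', ') Or Section 2 .', ')']
print(reattach_closing_brackets(reported))
```

With pattern installed, the same helper could be applied to the result of en.tokenize(...) directly. It only handles brackets that were split off after sentence-final punctuation and leaves all other sentence boundaries untouched.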