clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License

Plural forms with parsetree and search in Dutch #289

Open hekl opened 4 years ago

hekl commented 4 years ago

I am using parsetree(text, lemmata=True) from pattern.nl together with search(searchterm, text) on Dutch text. For example: searching for "baan" in "Het aantal banen in Noord-Holland neemt toe." This has the advantage that a lemma is constructed for each word, so the search also matches plural forms of the search term. But it does not work in all cases. The search term "stad" also matches "steden", but "baan" does not match "banen" and "arbeider" does not match "arbeiders". Is there a logic to this? The pluralize function gives the correct plural forms.
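
A possible workaround, sketched under assumptions: since pluralize returns the correct plural, the search term could be expanded to its plural form before matching, instead of relying on the lemmata. The sketch below does not use pattern's search at all. The names find_term, toy_pluralize and TOY_PLURALS are hypothetical, and the small lookup table only stands in for a real pluralizer so that the example runs without pattern installed.

```python
import re

def find_term(term, text, pluralize):
    """Return the tokens in `text` that equal `term` or its plural.

    `pluralize` is any callable mapping a singular noun to its plural.
    """
    # Accept both the singular and the plural surface form.
    forms = {term.lower(), pluralize(term).lower()}
    # Naive tokenization on word characters (an assumption; a real
    # tokenizer would treat hyphenated words differently).
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if t.lower() in forms]

# Hypothetical stand-in for a real Dutch pluralizer.
TOY_PLURALS = {"baan": "banen", "arbeider": "arbeiders", "stad": "steden"}

def toy_pluralize(word):
    # Falls back to appending "en", which is wrong for many Dutch nouns.
    return TOY_PLURALS.get(word, word + "en")

matches = find_term(
    "baan",
    "Het aantal banen in Noord-Holland neemt toe.",
    toy_pluralize,
)
print(matches)
```

With pattern installed, the pluralize function from pattern.nl could presumably be passed in place of toy_pluralize. This only sidesteps the question of why the lemma-based search behaves differently for "stad" and "baan"; it does not answer it.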