dishmint / LexicalCases

Extract substrings matching a lexical pattern
https://www.paclets.com/FaizonZaman/LexicalCases
MIT License
linguistics pattern-matching text text-analaysis text-mining text-search wolfram-language wolfram-mathematica

LexicalCases [EXPERIMENTAL]

Extract substrings matching a lexical pattern.

Install

Install the paclet from the Wolfram Paclet Repository, then load it:

PacletInstall[ResourceObject["FaizonZaman/LexicalCases"]]
Needs["LexicalCases`"]

Supports Wolfram Language 14.0 and later.

Usage

Search strings, files, or Wikipedia articles for a lexical pattern.

oosp = ExampleData[{"Text", "OriginOfSpecies"}];
oospPattern = BoundToken[WordToken[2], BoundToken["specie"|"species"]];

oospResults = LexicalCases[oosp, oospPattern]

All Text Content Types can be used. However, some take unreasonably long to expand, especially types that represent a large body of text, such as a topic type. The basic part-of-speech types are good ones to start with:

alice = ExampleData[{"Text", "AliceInWonderland"}];
alicePattern = "Alice" ~~ TypeToken["Verb"] ~~ TypeToken["Adverb"];

aliceResults = LexicalCases[alice, alicePattern]

Use lexical patterns in StringCases, StringPosition, and StringMatchQ by wrapping the pattern with LexicalPattern.

Here's an example that creates an operator form of StringCases:

aliceOp = StringCases[LexicalPattern["Alice" ~~ TypeToken["Verb"] ~~ TypeToken["Adverb"]]];
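The operator can then be applied to any string. The following is a minimal sketch rather than an excerpt from the paclet documentation: it assumes the paclet is installed and loaded, that a LexicalPattern expression can be stored in a variable and reused, and that it behaves like an ordinary string pattern inside StringCases and StringPosition, as described above. The names `lexPattern`, `matches`, and `positions` are illustrative only.

```wolfram
Needs["LexicalCases`"]

alice = ExampleData[{"Text", "AliceInWonderland"}];

(* Assumption: the wrapped pattern can be assigned once and reused *)
lexPattern = LexicalPattern["Alice" ~~ TypeToken["Verb"] ~~ TypeToken["Adverb"]];

(* Operator form of StringCases, applied to the text *)
aliceOp = StringCases[lexPattern];
matches = aliceOp[alice];

(* Count each distinct match, most frequent first *)
ReverseSort[Counts[matches]]

(* Start and end character positions of each match *)
positions = StringPosition[alice, lexPattern];
```

Because the operator is built once, the same `aliceOp` can be mapped over a list of strings without restating the pattern.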

For additional examples, see the paclet documentation or visit LexicalCases on the Wolfram Paclet Repository.