gabriel-weaver / xutools

eXtended UNIX text-processing tools
GNU General Public License v3.0

Structured Text Processing #6

Open gabriel-weaver opened 12 years ago

gabriel-weaver commented 12 years ago

TXR: a Pattern Matching Language (Not Just) for Convenient Text Extraction

Suggested by Kaz Kylheku on Slashdot.

gabriel-weaver commented 12 years ago

Lightweight Structure In Text.

Pattern matching is heavily used for searching, filtering, and transforming text, but existing pattern languages offer few opportunities for reuse. Lightweight structure is a new approach that solves the reuse problem. Lightweight structure has three parts: a model of text structure as contiguous segments of text, or regions; an extensible library of structure abstractions (e.g., HTML elements, Java expressions, or English sentences) that can be implemented by any kind of pattern or parser; and a region algebra for composing and reusing structure abstractions. Lightweight structure does for text pattern matching what procedure abstraction does for programming, enabling construction of a reusable library.

Lightweight structure has been implemented in LAPIS, a web browser/text editor that demonstrates several novel techniques.

http://www.cs.cmu.edu/~rcm/papers/thesis/
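
To make the three parts from the abstract concrete, here is a minimal Python sketch of regions, structure abstractions, and a tiny region algebra. This is not LAPIS's actual API or pattern syntax: `Region`, `regex`, `inside`, `containing`, and `extract` are hypothetical names invented for illustration, and the regex-based "link" abstraction is a deliberately crude stand-in for a real HTML parser.

```python
import re
from typing import Callable, List, NamedTuple


class Region(NamedTuple):
    """A contiguous segment of text: [start, end) character offsets."""
    start: int
    end: int


# A structure abstraction maps a text to the regions it recognizes.
# Any pattern or parser could sit behind this signature (assumption
# made for this sketch, not LAPIS's real interface).
Abstraction = Callable[[str], List[Region]]


def regex(pattern: str) -> Abstraction:
    """Build an abstraction from a regular expression."""
    compiled = re.compile(pattern)

    def find(text: str) -> List[Region]:
        return [Region(m.start(), m.end()) for m in compiled.finditer(text)]

    return find


def inside(inner: Abstraction, outer: Abstraction) -> Abstraction:
    """Regions of `inner` that lie within some region of `outer`."""
    def find(text: str) -> List[Region]:
        outers = outer(text)
        return [r for r in inner(text)
                if any(o.start <= r.start and r.end <= o.end for o in outers)]

    return find


def containing(outer: Abstraction, inner: Abstraction) -> Abstraction:
    """Regions of `outer` that contain at least one region of `inner`."""
    def find(text: str) -> List[Region]:
        inners = inner(text)
        return [o for o in outer(text)
                if any(o.start <= r.start and r.end <= o.end for r in inners)]

    return find


def extract(text: str, regions: List[Region]) -> List[str]:
    """Return the substring covered by each region."""
    return [text[r.start:r.end] for r in regions]


# Two reusable abstractions, then compositions built only from their names.
link = regex(r"<a\b[^>]*>.*?</a>")   # crude stand-in for an HTML element parser
number = regex(r"\d+")

number_in_link = inside(number, link)
link_with_number = containing(link, number)

text = "see <a href=x>foo 42</a> and 7 then <a>bar</a>"
numbers_found = extract(text, number_in_link(text))
links_found = extract(text, link_with_number(text))
```

The composed abstractions never mention how `link` or `number` are recognized, so either could be swapped for a real parser without touching `number_in_link` or `link_with_number`. That is the reuse the abstract is describing.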

gabriel-weaver commented 12 years ago

Coccinelle

Coccinelle: A program matching and transformation tool for systems code, 2011. Retrieved November 11, 2011 from http://coccinelle.lip6.fr/.