CityOfNewYork / CROL-Overview

City Record Online parsing libraries and supporting files

Parsing/Research #40

Closed cds-amal closed 9 years ago

cds-amal commented 9 years ago

Evaluate these three tools to determine how we can apply them to our parsing approach:

  1. pyparsing: use this module to model a BNF parser similar to the regulation parser.
  2. NLTK: library for chunking/chinking.
  3. parserator: probabilistic parsing.
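
To make the chunking/chinking idea in item 2 concrete, here is a minimal sketch in plain Python. It is a stand-in for the concept, not NLTK's API (NLTK expresses the same thing through chunk and chink rules in an `nltk.RegexpParser` grammar). The function name `chunk_with_chinks`, the hand-tagged sentence, and the choice of chink tags are all assumptions made for illustration.

```python
def chunk_with_chinks(tagged_tokens, chink_tags):
    """Treat every token as part of a chunk, then split at chink tags.

    tagged_tokens: list of (word, part_of_speech_tag) pairs, already tagged.
    chink_tags: set of tags to carve out of the chunks (the "chinks").
    Returns a list of chunks, each a list of words.
    """
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in chink_tags:
            # A chink ends the chunk in progress and is itself left out.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(word)
    if current:
        chunks.append(current)
    return chunks


# Hand-tagged example sentence (hypothetical input, Penn Treebank style tags).
sentence = [
    ("the", "DT"), ("little", "JJ"), ("cat", "NN"),
    ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"),
]

# Chink out past-tense verbs and prepositions, leaving noun-phrase-like chunks.
noun_chunks = chunk_with_chinks(sentence, chink_tags={"VBD", "IN"})
```

Here `noun_chunks` holds two groups, "the little cat" and "the mat"; the verb and preposition between them are the chinks.
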

cds-amal commented 9 years ago

A combination of regex, NLTK, and a probabilistic approach is recommended. Things to consider:

  1. Looking for a needle in a haystack? Use a combination of regex, NLTK, and parserator-powered solutions such as the address parser.
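
As a rough sketch of the regex stage of that needle-in-a-haystack approach, the snippet below pulls address-like spans out of free text using only the standard library. The pattern, the function name `find_address_candidates`, and the sample notice text are illustrative assumptions, not taken from the project; in practice each candidate would then be handed to a probabilistic parser (for example, a parserator-trained address model) to label its parts.

```python
import re

# Assumed pattern: a house number, one to four capitalized words,
# then a street suffix. Real notices will need a broader pattern.
ADDRESS_CANDIDATE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,4}"
    r"(?:Street|Avenue|Road|Boulevard|Place)\b"
)


def find_address_candidates(text):
    """Return every substring of text that looks like a street address."""
    return [match.group(0) for match in ADDRESS_CANDIDATE.finditer(text)]


# Invented sample text, written only to exercise the pattern.
notice = (
    "A public hearing will be held at 22 Reade Street, Spector Hall, "
    "on the proposal affecting 100 Gold Street in Manhattan."
)

candidates = find_address_candidates(notice)
```

The regex is deliberately loose: its job is only to narrow the haystack, leaving the finer distinctions to the NLTK and probabilistic stages.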