Convert the NLP chapter to the 3rd edition

aimacode / aima-python

Python implementation of algorithms from Russell And Norvig's "Artificial Intelligence - A Modern Approach"

MIT License

7.98k stars 3.78k forks source link

Convert the NLP chapter to the 3rd edition #227

Closed darius closed 7 years ago

darius commented 8 years ago

nlp.py is the last code file still at the 2nd edition. Though there's not too much code, it needs a big overhaul because the text switched from chart parsing to the CYK algorithm, and added probabilities to the grammar productions. CYK imposes Chomsky Normal Form on the grammar.

The example grammar is close to being in Chomsky form, and automatically converting to it from general grammars is an exercise, so I'm leaning towards manually converting it. We'll have to decide how the probabilities get into the grammar, too.

It's probably worth keeping the chart parser, since it handles general grammars, but it'd need to take grammars in the same format as CYK.

I'm thinking of tackling this.

norvig commented 8 years ago

That's great -- you are right that this is overdue. I'm not sure yet what changes there will be for 4th edition, but it will probably be less change than from 2nd to 3rd.

I agree we should keep the chart parser.

One question I have is whether grammar rules should be parsed from strings, or whether they should use the Expr class:

 rules('S -> NP VP | S Conj S', 
           ...)

 rules(S >> NP + VP | S + Conj + S,
          ...)

the advantage of the later is that it is easier to add arguments:

 S(head) >> NP(Sbj, pn, h) + VP(pn, head)

darius commented 8 years ago

Using Expr sounds interesting. Some minuses:

You need to declare S, NP, etc., as symbols first.
I think >> has tighter precedence than |. (I just checked the Python grammar.)
It's further from the standard notation.

Some ideas for the second problem:

Just wrap the right hand sides in ().
Write the grammar as a dict, like {S: NP + VP | S + Conj + S, ...} or rules(S = NP + VP | ...).
Use different operators: S >> NP * VP + S * Conj * S
Deal with the first annoyance at the same time with code like

S = Rule(lambda: NP + VP | S + Conj + S) NP = Rule(lambda: ...)

(lambdas needed so forward refs can work.) I think that's too fancy for aima-python, but worth mentioning in case it suggests some other brainwave.

I'd stick with strings on the first pass at this.

norvig commented 8 years ago

I agree that the lambdas are too fancy (and too ugly), and I agree that sticking to strings is the way to go. But consider that the way to parse the strings might end up being not just a bunch of str.splits, but instead some substitutions followed by an eval. In utils.expr, I handle the forward reference problem by doing an eval in a dict that interns new Symbols.

darius commented 8 years ago

Nice idea -- I'll hopefully get to this on Sunday and I'll keep that in mind.