Enhancement: Statistical language model

GoogleCodeExporter commented 9 years ago

One very useful enhancement to simplenlg would be to add a statistical language 
model (which is present in some other realisers, such as OpenCCG).  Many of 
the problems reported with simplenlg reflect the difficulty of making some 
syntactic choices without semantic and pragmatic knowledge.  Examples include:

Adjective ordering: e.g., "happy old lady" vs "old happy lady"

Use of bare infinitive: e.g., "I see John eat an apple" vs "I see John thinks he 
is smart"

Use of mass nouns as count nouns: e.g., "There is a lot of sand on the beach" vs 
"Many sands contain iron impurities"

I think a good statistical language model could address many of these issues; 
my suggestion would be to encode the possible choices in the lexicon (not the 
grammar), then overgenerate candidate realisations and select the best one 
using an n-gram model.
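
A rough sketch of the overgenerate-and-select idea, for illustration only. It is not simplenlg code: the class BigramSelector, its methods, and the three-sentence training corpus are all made up here. It assumes a bigram model with add-one smoothing, which is far cruder than what a real system would train on a large corpus.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the simplenlg API.
class BigramSelector {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    // Lower-case, split on whitespace, and add sentence boundary markers.
    private static String[] tokens(String sentence) {
        return ("<s> " + sentence.trim().toLowerCase() + " </s>").split("\\s+");
    }

    // Count unigrams and bigrams over a (toy) corpus.
    void train(List<String> corpus) {
        for (String sentence : corpus) {
            String[] t = tokens(sentence);
            for (int i = 0; i < t.length; i++) {
                unigrams.merge(t[i], 1, Integer::sum);
                if (i > 0) {
                    bigrams.merge(t[i - 1] + " " + t[i], 1, Integer::sum);
                }
            }
        }
    }

    // Log probability of a sentence under add-one smoothed bigrams.
    double logProb(String sentence) {
        String[] t = tokens(sentence);
        int v = unigrams.size() + 1; // +1 leaves room for unseen words
        double lp = 0.0;
        for (int i = 1; i < t.length; i++) {
            int pair = bigrams.getOrDefault(t[i - 1] + " " + t[i], 0);
            int prev = unigrams.getOrDefault(t[i - 1], 0);
            lp += Math.log((pair + 1.0) / (prev + v));
        }
        return lp;
    }

    // Select: return the highest-scoring candidate (first one wins ties).
    String best(List<String> candidates) {
        String bestSoFar = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : candidates) {
            double score = logProb(c);
            if (bestSoFar == null || score > bestScore) {
                bestSoFar = c;
                bestScore = score;
            }
        }
        return bestSoFar;
    }

    // Overgenerate: every ordering of the adjectives between determiner and noun.
    static List<String> orderings(String determiner, List<String> adjectives, String noun) {
        List<String> out = new ArrayList<>();
        permute(new ArrayList<>(adjectives), 0, determiner, noun, out);
        return out;
    }

    private static void permute(List<String> adjs, int k, String det, String noun, List<String> out) {
        if (k == adjs.size()) {
            List<String> words = new ArrayList<>();
            words.add(det);
            words.addAll(adjs);
            words.add(noun);
            out.add(String.join(" ", words));
            return;
        }
        for (int i = k; i < adjs.size(); i++) {
            Collections.swap(adjs, k, i);
            permute(adjs, k + 1, det, noun, out);
            Collections.swap(adjs, k, i); // restore order before the next choice
        }
    }

    public static void main(String[] args) {
        BigramSelector lm = new BigramSelector();
        // Invented toy corpus; a real model needs far more data.
        lm.train(List.of("the happy old lady smiled",
                         "a happy old man waved",
                         "the old lady left"));
        List<String> candidates = orderings("the", List.of("old", "happy"), "lady");
        System.out.println(lm.best(candidates)); // prints: the happy old lady
    }
}
```

The same select step would apply to the other two cases (bare infinitive, mass vs count noun), provided the lexicon tells the realiser which alternatives to generate as candidates.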

I doubt I will have time to do this myself, but I would be happy to discuss my 
ideas with anyone who was interested in doing this.  Please email me at 
e.reiter@abdn.ac.uk

Original issue reported on code.google.com by ehud.rei...@gmail.com on 24 Mar 2011 at 7:54

GoogleCodeExporter commented 9 years ago

Heya -- I'm on the adjective ordering thing; I'll implement this after I'm done 
with the tutorial.  We'll have to talk about the best system to use, as I have a 
few in mind.

Original comment by ital...@gmail.com on 24 Mar 2011 at 8:23

GoogleCodeExporter commented 9 years ago

The NIH Specialist Lexicon has an adjective ordering position attribute.

Original comment by ChristopherCHowell@gmail.com on 12 Apr 2011 at 6:37

GoogleCodeExporter commented 9 years ago

Unfortunately the ordering information in the NIH lexicon is not that useful in 
practice (Meg Mitchell investigated this).

Original comment by ehud.rei...@gmail.com on 21 Apr 2011 at 11:43

brian-dawn / simplenlg

Enhancement: Statistical language model #5