charlieg / A-Smattering-of-NLP-in-Python

A very brief introduction to Natural Language Processing programming in Python
http://www.meetup.com/stats-prog-dc/events/177772322/
Apache License 2.0

Choosing the best scores> #2

Closed abhigenie92 closed 8 years ago

abhigenie92 commented 8 years ago

Hey! Thanks for the awesome answer! I don't understand this: sent_scores.sort(key=lambda sent: sent[0]). Why are you sorting in ascending order and then taking the first summary_length entries with for summary_sentence in sent_scores[:summary_length]: print summary_sentence? That picks the lowest-scoring sentences. Shouldn't it sort in descending order of score instead?

Correct code (maybe):

(Pdb) sent_scores.sort(key=lambda sent: sent[0], reverse=True)
(Pdb) for summary_sentence in sent_scores[:summary_length]: print summary_sentence

(0.017700155482677053, u"REVLON <REV> BUYS BEECHAM'S COMSMETICS UNIT\n Revlon Group Inc said it bought\n Germaine Monteil's cosmetics business in the U.S. from the\n Beecham Group PLC.")
(0.016986217398689815, u'Terms of the sale were not disclosed.')
(0.015131233236760829, u'Meanwhile in London a statement from Beecham said the\n business was sold to Revlon for 2.5 mln dlrs in cash and a\n royalty payment.\n \n\n')
(0.0087916381757522505, u'The sale includes the rights to Germaine Monteil in North\n and South America and in the Far East, as well as the worldwide\n rights to the Diane von Furstenberg cosmetics and fragrances\n lines and the U.S. distribution rights to Lancaster beauty\n products.')

charlieg commented 8 years ago

You're right, it should be reverse=True -- I'll fix this.

Thanks!
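For anyone reading along, the fix can be checked in isolation with a minimal sketch like the one below. It is written for Python 3 (the thread's snippets use the Python 2 print statement), and the scores, sentences, and the helper name top_sentences are made up for illustration; they are not taken from the tutorial's code.

```python
# Hypothetical (score, sentence) pairs standing in for the tutorial's sent_scores.
sent_scores = [
    (0.0088, "The sale includes the rights to Germaine Monteil."),
    (0.0177, "Revlon Group Inc said it bought Germaine Monteil's cosmetics business."),
    (0.0151, "The business was sold to Revlon for 2.5 mln dlrs in cash."),
    (0.0170, "Terms of the sale were not disclosed."),
]
summary_length = 2


def top_sentences(scored, n):
    """Return the n highest-scoring (score, sentence) pairs, best first."""
    # reverse=True sorts in descending order, so the slice keeps the top scores.
    return sorted(scored, key=lambda sent: sent[0], reverse=True)[:n]


summary = top_sentences(sent_scores, summary_length)
for score, sentence in summary:
    print(score, sentence)
```

Without reverse=True the same slice would return the two lowest-scoring sentences. For long documents, heapq.nlargest(n, scored, key=...) from the standard library gives the same result without sorting the whole list.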

abhigenie92 commented 8 years ago

Thanks for the reply and the awesome tutorial! This helped me greatly! If possible, can you provide examples of how variants of the TF and IDF weights affect the efficiency of summarization? Also, a short tutorial on multiple-document summarization, if possible, would be great! Cheers! Abhishek

charlieg commented 8 years ago

> If possible, can you provide examples of how variants of the TF and IDF weights affect the efficiency of summarization?

I'd recommend trying out some different variants for yourself to observe the effects. Also, check out BM25: https://en.wikipedia.org/wiki/Okapi_BM25
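As a possible starting point for that kind of experiment, here is a rough sketch that puts a few common weighting variants side by side: raw and log-scaled term frequency, the saturating BM25 term-frequency component, and plain versus BM25-style IDF. The toy documents, the function names, and the defaults k1=1.5 and b=0.75 are assumptions chosen for illustration; none of this comes from the tutorial's code.

```python
import math

# Toy tokenized "documents" (hypothetical data, loosely based on the example above).
docs = [
    "revlon buys cosmetics unit".split(),
    "terms of the sale were not disclosed".split(),
    "the sale includes the rights to cosmetics lines".split(),
]
avg_len = sum(len(d) for d in docs) / len(docs)


def tf_raw(count):
    """Raw term frequency: the plain occurrence count."""
    return float(count)


def tf_log(count):
    """Log-scaled term frequency: dampens the effect of repeated terms."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def tf_bm25(count, doc_len, avg_doc_len, k1=1.5, b=0.75):
    """BM25 term-frequency component: saturates toward k1 + 1 and
    normalizes for document length."""
    return count * (k1 + 1) / (count + k1 * (1 - b + b * doc_len / avg_doc_len))


def idf_plain(n_docs, df):
    """Plain inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)


def idf_bm25(n_docs, df):
    """BM25-style IDF with 0.5 smoothing; the +1 keeps it positive."""
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1)


def term_weights(term, doc):
    """Weight of `term` in `doc` under each TF/IDF combination."""
    count = doc.count(term)
    df = sum(1 for d in docs if term in d)
    if count == 0 or df == 0:
        return {"raw*idf": 0.0, "log*idf": 0.0, "bm25": 0.0}
    return {
        "raw*idf": tf_raw(count) * idf_plain(len(docs), df),
        "log*idf": tf_log(count) * idf_plain(len(docs), df),
        "bm25": tf_bm25(count, len(doc), avg_len) * idf_bm25(len(docs), df),
    }


for term in ("the", "revlon", "cosmetics"):
    for i, doc in enumerate(docs):
        print(term, i, term_weights(term, doc))
```

Swapping one of these weights into the sentence-scoring step and comparing the resulting summaries by eye is a quick way to see how much the choice matters on a given corpus.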

> Also, a short tutorial on multiple-document summarization, if possible, would be great!

Your best bet is to review the research literature. ACL has lots of great papers you can access for free: https://aclweb.org/anthology/