attardi / deepnl

Deep Learning for Natural Language Processing
GNU General Public License v3.0

How to train a new SRL model? #25

Open GraphGrailAi opened 8 years ago

GraphGrailAi commented 8 years ago

Hi, deepnl is cool, but I cannot find a good tutorial on how to train a custom Semantic Role Labeling model (for a language other than English). I have read the presentation and the article: http://docslide.us/documents/the-tsunami-of-deep-learning-over-nlp-giuseppe-attardi-dipartimento-di-informatica.html , http://www.aclweb.org/anthology/W15-1515.

What I need: I want to pass a bunch of .txt files with text data to deepnl and get a trained model as the result. Then, as I guess, I can pass this model to tagger = SRLTagger.load(open(filename)) and it is ready to add semantic roles to each word in a sentence.

Then I want to use semantic roles to identify facts and opinions about some objects, e.g.: "I don't like BankName because it doesn't supply customer service" - the output will be BankName - customer service. That means the problem with this bank is customer service. Am I on the right path?

attardi commented 8 years ago

On 12 mar 2016, at 13:19, GraphGrail notifications@github.com wrote:

> Hi, deepnl is cool, but I cannot find a good tutorial on how to train a custom Semantic Role Labeling model (for a language other than English). I have read the presentation and the article: http://docslide.us/documents/the-tsunami-of-deep-learning-over-nlp-giuseppe-attardi-dipartimento-di-informatica.html , http://www.aclweb.org/anthology/W15-1515.
>
> What I need: I want to pass a bunch of .txt files with text data to deepnl and get a trained model as the result. Then, as I guess, I can pass this model to tagger = SRLTagger.load(open(filename)) and it is ready to add semantic roles to each word in a sentence.

In principle yes, but DeepNL expects input in CoNLL format. You need to perform sentence splitting and tokenization first with some separate tool. And for training you need a corpus annotated with predicates, as in the CoNLL Shared Task 2008.

> Then I want to use semantic roles to identify facts and opinions about some objects, e.g.: "I don't like BankName because it doesn't supply customer service" - the output will be BankName - customer service. That means the problem with this bank is customer service. Am I on the right path?

Yes, but the path might be a long one ;-)
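The preprocessing step mentioned above (sentence splitting, tokenization, one token per line) could be sketched roughly as follows. This is only an illustration: `split_sentences`, `tokenize` and `to_conll` are hypothetical helper names, the regex-based splitting is deliberately naive, and the two-column `ID<TAB>FORM` layout is an assumption, not the confirmed set of columns that deepnl reads.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive tokenizer: words (keeping an inner apostrophe) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


def to_conll(text):
    """Emit one token per line as 'ID<TAB>FORM', with a blank line between sentences.

    ASSUMPTION: only ID and FORM columns are written here; a real CoNLL file
    carries more columns (lemma, POS, head, ...), which are not produced.
    """
    blocks = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        blocks.append("\n".join(f"{i}\t{tok}" for i, tok in enumerate(tokens, 1)))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_conll("I like it. It works!"), end="")
```

For a language other than English, the two regex helpers would normally be replaced by a tokenizer built for that language; only the one-token-per-line, blank-line-separated output shape matters here.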


GraphGrailAi commented 8 years ago

Thanks for the answer. Sentence splitting and tokenization are not a problem, I can do this myself in Python. But I don't understand what a "corpus annotated with predicates, as in the CoNLL Shared Task 2008" is. I googled https://catalog.ldc.upenn.edu/LDC2009T12 but no data sample is available to reproduce it.

Also, I have found http://conll.cemantix.org/2012/data.html but the instructions are hard to read and mostly unclear.

attardi commented 8 years ago

On 12 mar 2016, at 16:04, GraphGrail notifications@github.com wrote:

> Thanks for the answer. Sentence splitting and tokenization are not a problem, I can do this myself in Python. But I don't understand what a "corpus annotated with predicates, as in the CoNLL Shared Task 2008" is. I googled https://catalog.ldc.upenn.edu/LDC2009T12 but no data sample is available to reproduce it.

This is the right one.

> Also, I have found http://conll.cemantix.org/2012/data.html but the instructions are hard to read and mostly unclear.

This is a different task.
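To make "annotated with predicates" more concrete, here is a small sketch of the general idea: each token sits on its own line, one column marks which tokens are predicates, and each predicate gets its own column of argument labels. The layout below (ID, FORM, PRED, then one ARG column per predicate) is a simplified, hypothetical reduction of the CoNLL 2008 style; the real corpus has additional columns (lemma, POS, dependency head and so on). The role labels are hand-written for illustration and are not taken from any corpus, and `read_predicates` is a made-up helper name.

```python
# Hand-written, illustrative annotation (NOT from a real corpus).
# Columns: ID, FORM, PRED, ARG for 1st predicate, ARG for 2nd predicate.
ROWS = [
    ("1", "I", "_", "A0", "_"),
    ("2", "do", "_", "_", "_"),
    ("3", "n't", "_", "AM-NEG", "_"),
    ("4", "like", "like.01", "_", "_"),
    ("5", "BankName", "_", "A1", "_"),
    ("6", "because", "_", "AM-CAU", "_"),
    ("7", "it", "_", "_", "A0"),
    ("8", "does", "_", "_", "_"),
    ("9", "n't", "_", "_", "AM-NEG"),
    ("10", "supply", "supply.01", "_", "_"),
    ("11", "customer", "_", "_", "_"),
    ("12", "service", "_", "_", "A1"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)


def read_predicates(block):
    """Parse one sentence block into a list of predicate frames.

    ASSUMPTION: simplified layout with PRED in column 3 and the k-th
    predicate's argument labels in column 4 + k (counting from zero).
    """
    rows = [line.split("\t") for line in block.strip().splitlines()]
    # Predicates in order of appearance; the k-th one owns the k-th ARG column.
    predicates = [(row[1], row[2]) for row in rows if row[2] != "_"]
    frames = []
    for k, (form, sense) in enumerate(predicates):
        args = [(row[3 + k], row[1]) for row in rows if row[3 + k] != "_"]
        frames.append({"predicate": form, "sense": sense, "args": args})
    return frames


if __name__ == "__main__":
    for frame in read_predicates(SAMPLE):
        print(frame["sense"], frame["args"])
```

In this toy annotation, linking "it" back to BankName would still need a separate coreference step; the role columns alone only say that "it" is the A0 of "supply".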

GraphGrailAi commented 8 years ago

Your answers are so short :) Is there a tutorial on how to make a corpus annotated with predicates from raw text data? (Maybe off-topic, but http://nilc.icmc.usp.br/nlpnet/training.html describes steps for training SRL, and they don't work for me.)