allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

SRL: I don't want to use the POS feature and parse tree feature. How should I do this? #2642

Closed xiaodaoyoumin closed 5 years ago

xiaodaoyoumin commented 5 years ago

I am using a new dataset. I have converted it to the CoNLL2012 format, except for the POS information and parse tree information. I treat these as missing information and fill them with a placeholder token, as follows:

file 0 0 [percentage] VERB (TOP percentage - - - - (V) -
file 0 1 [usually] - - - - - - -
file 0 2 [only] - - - - - - -
file 0 3 [use] - - - - - - (Part -
file 0 4 [generic] - - - - - - -
file 0 5 [terms] - - - - - - -
file 0 6 [when] - - - - - - -
file 0 7 [searching] - - - - - - -
file 0 8 [for] - - - - - - -
file 0 9 [consumer] - - - - - - -
file 0 10 [loans] - ) - - - - - ) -

The model performance is very bad. I guess this is caused by the POS and parse tree information: the model seems to use these placeholders as features and gets confused. So I am asking for help: how should I train the model without syntactic features such as POS tags and parse trees?

vidurj commented 5 years ago

The SRL model in allennlp does not use POS tags or parse tree information. So either the model's performance on your domain is actually bad, or your conversion to the CoNLL2012 format has a bug. You could add tests to ensure that your data is being read correctly, like those in https://github.com/allenai/allennlp/blob/master/allennlp/tests/data/dataset_readers/srl_dataset_reader_test.py. Let us know if this does not work.
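As a first sanity check before writing reader tests, it may help to confirm that every row of a sentence has the same number of whitespace-separated columns, since a column-based format like CoNLL2012 is usually parsed by position. The sketch below is a standalone check in plain Python, not part of allennlp; the helper names `column_counts` and `is_consistent` are hypothetical, and the sample rows are copied from the snippet posted above, which may not match the real file exactly.

```python
from typing import Dict, List


def column_counts(rows: List[str]) -> Dict[int, List[int]]:
    """Map each distinct column count to the indices of the rows that have it."""
    counts: Dict[int, List[int]] = {}
    for index, row in enumerate(rows):
        fields = row.split()
        if not fields:  # blank lines separate sentences; skip them
            continue
        counts.setdefault(len(fields), []).append(index)
    return counts


def is_consistent(rows: List[str]) -> bool:
    """Return True if every non-blank row has the same number of columns."""
    return len(column_counts(rows)) <= 1


# Rows copied from the sample in this issue (assumed to reflect the real file).
sample = [
    "file 0 0 [percentage] VERB (TOP percentage - - - - (V) -",
    "file 0 1 [usually] - - - - - - -",
    "file 0 3 [use] - - - - - - (Part -",
]

if __name__ == "__main__":
    # Report which rows share which column count, smallest count first.
    for n_columns, row_indices in sorted(column_counts(sample).items()):
        print(f"{n_columns} columns: rows {row_indices}")
```

If the rows in the actual file differ in length the way the pasted ones appear to, the argument labels may be landing in the wrong column when the file is read, which would be worth ruling out before retraining.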

vidurj commented 5 years ago

Please feel free to reopen this issue if this did not solve it.