dlwh / epic

**Archived** Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.
http://scalanlp.org/
Apache License 2.0

Implementation of CRF parser in another language #67

Open yiakwy opened 7 years ago

yiakwy commented 7 years ago

Hi, I recently decided to reimplement the CRF parser in another language (e.g. C++/Flex or Python/Boost). I have read the paper "Less Grammar, More Features", which proposes additional features for the CRF model, and I am trying to figure out how the parser is implemented using a CRF model.

  1. Syntactic parser: implement transition functions (anchored rule productions in a context-free grammar) to build up the nodes of a syntactic tree. (part 1)
  2. CRF-based model training with the proposed features, to provide a baseline for the syntactic transition functions. (part 2)

I am reading the source code to understand how this works, and I hope the author can help me figure out how these two parts are implemented.

For the first part: since I am going to use a neural CRF, more details about data preprocessing would be appreciated.

Have a nice weekend!

dlwh commented 7 years ago

It would be good if you could list specific points of confusion; there's a lot going on. For the basic linear model,

score(A -> B C, i, j, k) = weights dot features(A->B C, i, j, k)

For the neural model, the score is that plus s(i, j, k) * W * r(A -> B C), where s is a feed-forward neural network over the span.
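The two scoring functions above can be sketched as follows. This is a minimal illustration, not Epic's actual API: the function names, the sparse-indicator feature representation, and the vector shapes are all assumptions made for the example.

```python
import numpy as np

def linear_score(weights, feature_indices):
    """Linear CRF score: dot product of the weight vector with the sparse
    indicator features of an anchored rule (A -> B C, i, j, k)."""
    return sum(weights[f] for f in feature_indices)

def neural_score(weights, feature_indices, s_ijk, W, r_rule):
    """Neural score: the linear score plus s(i, j, k)^T * W * r(A -> B C),
    where s_ijk is the output of a feed-forward network on the span and
    r_rule is an embedding of the rule."""
    return linear_score(weights, feature_indices) + s_ijk @ W @ r_rule
```

The linear part stays sparse (a handful of active indicator features per anchored rule), while the neural part is a dense bilinear term over the span representation and the rule embedding.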

The rules are extracted from the treebank by applying "head outward" binarization using a slightly modified variant of Collins' head rules (for English). All introduced intermediate states are collapsed into a single state.
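A hedged sketch of that binarization scheme, assuming a simplified procedure (attach remaining right siblings outward from the head, then left siblings) and writing the single collapsed intermediate state as "@" + parent. Epic's real head rules and symbol naming are more involved than this illustration.

```python
def binarize_head_outward(parent, children, head_index):
    """Binarize an n-ary rule (parent -> children, len >= 2) into a list of
    binary rules (label, left_child, right_child), building outward from the
    head child and collapsing every intermediate state into one symbol."""
    inter = "@" + parent  # all intermediate states share this single label
    rules = []
    left = right = head_index  # span of children covered so far
    cur = children[head_index]
    while left > 0 or right < len(children) - 1:
        if right < len(children) - 1:   # peel off the next right sibling
            right += 1
            pair = (cur, children[right])
        else:                           # then the remaining left siblings
            left -= 1
            pair = (children[left], cur)
        finished = left == 0 and right == len(children) - 1
        label = parent if finished else inter  # top rule keeps the real label
        rules.append((label, pair[0], pair[1]))
        cur = label
    return rules
```

For example, `binarize_head_outward("VP", ["V", "NP", "PP"], 0)` yields `("@VP", "V", "NP")` and then `("VP", "@VP", "PP")`; because every intermediate state is the same `@VP`, the grammar stays small.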

The feature templates are as described in the paper, though the lexicon is perhaps underspecified. Feature hashing (as described) is important; don't skip it.
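Feature hashing can be sketched like this: each feature string is mapped into a fixed-size weight array instead of a grown dictionary, which bounds memory for the huge template-generated feature space. The hash function and bucket count here are illustrative choices, not the ones Epic uses.

```python
import zlib

def hashed_index(feature, num_buckets):
    """Map an arbitrary feature string to a stable bucket in [0, num_buckets).
    zlib.crc32 is deterministic across runs, unlike Python's builtin hash()."""
    return zlib.crc32(feature.encode("utf-8")) % num_buckets

def score_features(weights, features):
    """Sum the weights of the hashed buckets for a set of indicator features."""
    n = len(weights)
    return sum(weights[hashed_index(f, n)] for f in features)
```

Collisions are tolerated: two features may share a bucket, and in practice this costs little accuracy while making the weight vector's size a fixed hyperparameter.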

HTH,

-- David
