box-key / cyzil

A tool for fast and in-depth analysis of sequence generation models in Cython
Apache License 2.0

Stop using split to count ngram order #7

Closed: box-key closed 4 years ago

box-key commented 4 years ago

There's gotta be a better way to check the n-gram order of a token than this: https://github.com/box-key/cyzil/blob/9907a51a815daea610e1dfb1d3fb2e4db7971304/src/bleu.pyx#L151

Perhaps I can use a struct for this task. Something like:

cdef struct Ngram:
    int order
    int count

So the unordered map would look like unordered_map[string, Ngram]
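To make the idea concrete, here is a minimal pure-Python sketch of it, not the actual cyzil code. The function name `count_ngrams`, the `max_n` parameter, and the space-joined key are all assumptions made for illustration. Plain dicts with `order` and `count` keys stand in for the proposed Ngram struct, and a regular dict stands in for the unordered map. The point is that the order is recorded when the n-gram is built, so nothing has to split the key later to recover it.

```python
def count_ngrams(tokens, max_n):
    """Map each n-gram key to {'order': n, 'count': c}.

    Hypothetical helper: the order is stored next to the count
    at insertion time instead of being recomputed by splitting the key.
    """
    table = {}
    for n in range(1, max_n + 1):
        # Slide a window of size n over the token list.
        for i in range(len(tokens) - n + 1):
            key = " ".join(tokens[i:i + n])
            entry = table.get(key)
            if entry is None:
                # First occurrence: record the order once.
                table[key] = {"order": n, "count": 1}
            else:
                entry["count"] += 1
    return table
```

With this layout, checking the order of an n-gram is a single lookup such as `table["the cat"]["order"]`, with no string splitting involved.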

Reference: https://share.cocalc.com/share/c4cb4a9830136f7bdc07b11c803665cc99b3d899/advanced-cython.html?viewer=share

box-key commented 4 years ago

A cdef struct is gonna be converted to a dict when it's passed back to Python. So its elements are accessed with a dict-style reference like ngram_count['order']