box-key / cyzil

A tool for fast and in-depth analysis of sequence generation model in Cython
Apache License 2.0

2d array in Cython #1

Closed by box-key 4 years ago

box-key commented 4 years ago

Both edit_distance and bleu have a method that computes the metric for each pair in the corpus and stores the individual scores, rather than calculating the average score. The return value is an 'N by # of elements' array, where N is the number of reference-candidate pairs in the corpus.

So far I use a numpy array without specifying a data type, and I think this slows the program down. What data structure do you think would be suitable for this method? I tried a vector of vectors, but cpdef didn't allow me to return such a type. The code is linked below.

https://github.com/box-key/Cyzil/blob/2f7a4feabd0a4613433f8fa7528c5720ba8575f7/src/edit_distance.pyx#L98

https://github.com/box-key/Cyzil/blob/2f7a4feabd0a4613433f8fa7528c5720ba8575f7/src/bleu.pyx#L127
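
For readers without the repository open, the per-pair output described above can be sketched in plain Python. This is only an illustration of the 'N rows, one per pair' shape, not the code in cyzil: the function names and the choice of columns (raw distance and distance divided by reference length) are assumptions made for the example.

```python
def edit_distance(reference, candidate):
    """Levenshtein distance between two token sequences (two-row DP)."""
    previous = list(range(len(candidate) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, cand_token in enumerate(candidate, start=1):
            # Cost is 0 when the tokens match, 1 for a substitution.
            substitution = previous[j - 1] + (ref_token != cand_token)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def edit_distance_per_pair(references, candidates):
    """Return one row per pair instead of a single corpus average.

    Hypothetical column layout: [distance, distance / reference length].
    """
    rows = []
    for reference, candidate in zip(references, candidates):
        distance = edit_distance(reference, candidate)
        normalized = distance / len(reference) if reference else 0.0
        rows.append([distance, normalized])
    return rows


references = [["the", "cat", "sat"], ["a", "b"]]
candidates = [["the", "cat", "sat"], ["a", "c"]]
print(edit_distance_per_pair(references, candidates))
```

The result has as many rows as there are reference-candidate pairs, which is the structure the question is about storing efficiently.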

kylebgorman commented 4 years ago

I'd use uint32 if it's a natural number, float32 if it's decimal.

And yes, I think it generates different C++ code if you specify the type on the left-hand side. You can always read the generated .cpp file and work out whether the difference matters to you. The associated Cython code is stored in comments.

K
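
A minimal sketch of the dtype suggestion above, on the numpy side only. The helper name, the shapes, and the values are made up for illustration; the point is just that np.zeros defaults to float64 unless a dtype is given, and that uint32 or float32 can be requested explicitly.

```python
import numpy as np


def allocate_score_table(n_pairs, n_columns, integer_valued):
    """Preallocate an N x k table with an explicit dtype.

    Hypothetical helper: uint32 for counts, float32 for fractional scores.
    """
    dtype = np.uint32 if integer_valued else np.float32
    return np.zeros((n_pairs, n_columns), dtype=dtype)


# Without a dtype argument, numpy falls back to float64.
default_table = np.zeros((3, 2))

# Integer-valued results such as raw edit distances (values are illustrative).
counts = allocate_score_table(3, 2, integer_valued=True)
counts[0] = (3, 7)

# Fractional results such as per-pair scores (value is illustrative).
scores = allocate_score_table(3, 1, integer_valued=False)
scores[0, 0] = 0.25

print(default_table.dtype, counts.dtype, scores.dtype)
```

Because a single array holds one dtype, integer and fractional columns are kept in separate tables here; storing them together would force everything to a float type.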

box-key commented 4 years ago

I decided to use vector[vector[string]] to solve this issue. This change also lets me remove cyzil's dependency on numpy. You can see the code at the link below:

https://github.com/box-key/cyzil/blob/6ad399f37edbc1eeabf250e4393474b98147ebee/src/bleu.pyx#L118

Kei