avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror
https://pypi.python.org/pypi/Unidecode
GNU General Public License v2.0

Add `unidecode_fast` function to speed-up mostly-ascii transliteration. #4

Closed. dukebody closed this 8 years ago

dukebody commented 8 years ago

See https://github.com/avian2/unidecode/issues/2

This one uses codecs, with unidecode as the fallback function for non-ASCII characters, which is faster than the approach in the previous PR.
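
For readers unfamiliar with the idea, here is a minimal sketch of how a codec error handler can act as a fallback for non-ASCII characters. It is not the code from this pull request: the handler name `translit_sketch` and the helpers `fallback_char` and `fast_transliterate` are hypothetical, and NFKD decomposition stands in for Unidecode's real transliteration tables.

```python
import codecs
import unicodedata


def fallback_char(char):
    """Hypothetical stand-in for the slow per-character transliteration."""
    # Decompose the character and keep only the ASCII parts (e.g. "ž" -> "z").
    decomposed = unicodedata.normalize("NFKD", char)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def _transliterate_errors(exc):
    """Codec error handler: called only for spans the ASCII codec rejects."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad_span = exc.object[exc.start:exc.end]
    replacement = "".join(fallback_char(c) for c in bad_span)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end


# Register the handler under a made-up name for this sketch.
codecs.register_error("translit_sketch", _transliterate_errors)


def fast_transliterate(text):
    """ASCII text passes straight through the codec; other spans hit the handler."""
    return text.encode("ascii", "translit_sketch").decode("ascii")
```

With this arrangement, pure-ASCII input never calls into Python-level transliteration code, which is where the speed-up for mostly-ASCII text would come from.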

avian2 commented 8 years ago

Thanks. I've made a branch based on your work here that has some cosmetic changes:

https://github.com/avian2/unidecode/tree/mostly-ascii

I've renamed unidecode_fast to unidecode_expect_ascii to make it clearer what it does. I've also added unidecode_expect_nonascii.

After some thought I also made unidecode an alias for unidecode_expect_ascii. As far as I know, most uses of Unidecode fit that use case, and the slowdown for non-ASCII strings is not that high. I still think that for most people the performance difference is irrelevant, which is also why I moved any mention of this to a separate README section.

Can you have a look? I'll merge that to master instead of this pull request.
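
As a rough illustration of the arrangement described above, the sketch below shows two variants and a default name bound to the ASCII-optimistic one. The names ending in `_sketch` and the tiny `_TABLE` mapping are hypothetical and only mimic the shape of the design; the code in the branch may differ.

```python
# Hypothetical miniature table; the real library ships far larger tables.
_TABLE = {"ž": "z", "é": "e", "ß": "ss"}


def expect_nonascii_sketch(text):
    """Look up every character, leaving ASCII characters as they are."""
    return "".join(c if ord(c) < 128 else _TABLE.get(c, "") for c in text)


def expect_ascii_sketch(text):
    """Try the cheap ASCII check first; do per-character work only if it fails."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return expect_nonascii_sketch(text)
    return text


# The default name is simply bound to the ASCII-optimistic variant.
transliterate_sketch = expect_ascii_sketch
```

Both variants return the same result for any input. They differ only in which kind of input they handle fastest, so the choice of default affects performance but not output.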

dukebody commented 8 years ago

Thanks Tomaž. I'll try to look into it this week.

dukebody commented 8 years ago

Hi Tomaž. I've looked at the code and it seems ok to me.

I really like that you modified the tests to cover all variants! However, note that since unidecode = unidecode_expect_ascii in the code, you are testing almost the same thing twice. But the tests are so fast that I guess the duplication doesn't matter at all.

So go ahead with the merge. :) Thanks a lot for dedicating some of your time to this feature! I believe it doesn't directly affect your use cases, so I really appreciate your efforts.

I'm closing this pull request in favour of your branch.