amol- / depot

Toolkit for storing files and attachments in web applications
MIT License

Remove unidecode #66

Closed jpmccu closed 4 years ago

jpmccu commented 4 years ago

Fixes #64 by using URL encoding when unidecode isn't installed. It also removes the dependency on unidecode, so that it is only installed when users want it and can use it.

coveralls commented 4 years ago


Coverage decreased (-16.6%) to 81.271% when pulling 655465d9f59cdb0338dee746f3ea3100f222a910 on jimmccusker:remove_unidecode into 875eb591f62454d8773af9dad9e3550ea88e2f66 on amol-:master.

jpmccu commented 4 years ago

Right, but I couldn't find a replacement for unidecode that does the job and isn't GPL. I'm using filedepot in a non-GPL web framework, and filedepot is itself under a less restrictive license than GPL. This is the best solution I could find.

On Fri, Jul 24, 2020 at 7:23 PM Alessandro Molina notifications@github.com wrote:

@amol- commented on this pull request.

In depot/utils.py https://github.com/amol-/depot/pull/66#discussion_r460327746:

 from ._compat import percent_encode

+try:
+    from unidecode import unidecode as fix_chars
+except ImportError:
+    from unicodedata import normalize
+
+    def fix_chars(string):
+        return percent_encode(normalize('NFKC', string), safe='!#$&+-.^`|~', encoding='utf-8')

Percent encoding doesn't seem to achieve the goal here, which was mostly to provide a plain-ASCII version of filenames on download, so that the file can be saved on any OS or filesystem.

Maybe evaluate replacing unidecode with anyascii ( https://github.com/hunterwb/any-ascii ), which should be under a compatible license ( https://github.com/hunterwb/any-ascii/blob/master/LICENSE ) and gives nearly equivalent results.

>>> from anyascii import anyascii
>>> anyascii("Крупный")
'Krupnyy'
>>> anyascii("àèìòù")
'aeiou'
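To make the objection concrete, here is a minimal stdlib-only sketch contrasting the two outputs. It assumes `percent_encode` behaves like `urllib.parse.quote` (which I have not checked against depot's `_compat` module), and the helper names `percent_encoded` and `ascii_fallback` are hypothetical. The NFKD-based fallback is only a rough stand-in for real transliteration, not what unidecode or anyascii do.

```python
import unicodedata
from urllib.parse import quote


def percent_encoded(name):
    # Approximates the fallback in the diff: NFKC-normalize, then
    # percent-encode the UTF-8 bytes. The result is ASCII but not readable.
    normalized = unicodedata.normalize('NFKC', name)
    return quote(normalized, safe='!#$&+-.^`|~', encoding='utf-8')


def ascii_fallback(name):
    # Crude stdlib approximation of a "plain ASCII" name: decompose accented
    # characters (NFKD) and drop every remaining non-ASCII code point.
    decomposed = unicodedata.normalize('NFKD', name)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(percent_encoded("àèìòù.txt"))  # escaped UTF-8 bytes
print(ascii_fallback("àèìòù.txt"))   # accents stripped
print(repr(ascii_fallback("Крупный")))  # Cyrillic is dropped entirely
```

The last line shows why a stripping approach is weaker than a transliteration library: scripts without an ASCII base letter disappear instead of being romanized.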


-- Jim McCusker

Director, Data Operations Tetherless World Constellation Rensselaer Polytechnic Institute mccusj2@rpi.edu mccusj@cs.rpi.edu http://tw.rpi.edu

amol- commented 4 years ago

Well, that's why I pointed out anyascii -> https://pypi.org/project/anyascii/ It seems to be maintained, gives comparable results, and its license is compatible with MIT -> https://github.com/hunterwb/any-ascii/blob/master/LICENSE

amol- commented 4 years ago

I made https://github.com/amol-/depot/pull/67, a proposal for using anyascii that retains the exact same behaviour on Python3. On Python2 a stripped-down behaviour is provided, but Python2 is slowly being phased out.
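For readers who have not opened #67, the general shape of "use anyascii when available, degrade otherwise" might look like the sketch below. This is not the code from that pull request: the alias `_transliterate` and the fallback strategy are assumptions made for illustration, and only the `anyascii` import and the `fix_chars` name come from this thread.

```python
import unicodedata

try:
    # Optional third-party dependency, as suggested in the review.
    from anyascii import anyascii as _transliterate
except ImportError:
    # Hypothetical degraded mode when anyascii is not installed.
    _transliterate = None


def fix_chars(text):
    """Return an ASCII-only version of ``text`` for use in filenames."""
    if _transliterate is not None:
        return _transliterate(text)
    # Stripped-down fallback: keep base letters of accented characters,
    # drop anything that has no ASCII decomposition.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(fix_chars("àèìòù"))
```

Either branch returns pure ASCII, so callers building a download filename do not need to know which one ran.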

jpmccu commented 4 years ago

Pull #67 is a better option than this, closing the PR in favor of it.