bottlepy / bottle

bottle.py is a fast and simple micro-framework for python web-applications.
http://bottlepy.org/
MIT License

FileUpload.filename cuts off multi-byte characters (only the extension is left) #855

Open touta opened 8 years ago

touta commented 8 years ago

I'm trying to update my product from bottle.py 0.11 to 0.12, and found that the objects in Request.files changed from cgi.FieldStorage to the new FileUpload class.

They are not compatible, and they differ in how filename behaves:

Those changes cause serious problems in cultures that use multi-byte characters. If a user uploads a file with a non-ASCII name, FileUpload.filename returns only the extension.

For example: あいうえお.txt -> (cut off non-ASCII) -> .txt -> (strip '.') -> txt
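The two steps in the example can be reproduced with a rough sketch like the one below. It only approximates the sanitizing described in this issue and is not a copy of bottle's code; the function name `ascii_only_filename` is hypothetical.

```python
import os
import re
from unicodedata import normalize


def ascii_only_filename(raw_filename):
    """Approximate an ASCII-only sanitizer (hypothetical helper, not bottle's API)."""
    # Decompose characters, then drop everything that cannot be encoded as ASCII.
    name = normalize("NFKD", raw_filename).encode("ascii", "ignore").decode("ascii")
    # Keep only the last path component.
    name = os.path.basename(name.replace("\\", os.path.sep))
    # Remove characters outside a conservative whitelist.
    name = re.sub(r"[^a-zA-Z0-9\-_.\s]", "", name).strip()
    # Collapse runs of whitespace and dashes, then strip leading/trailing dots and dashes.
    name = re.sub(r"[-\s]+", "-", name).strip(".-")
    return name or "empty"


print(ascii_only_filename("あいうえお.txt"))   # only the extension survives
print(ascii_only_filename("report 2015.txt"))  # ASCII names keep their stem
```

With an all-Japanese stem, nothing is left before the dot, so stripping the leading dot leaves just `txt`.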

We can use raw_filename instead, but

  1. having to care whether a filename is 'raw' or not, just to display the name of a file the user uploaded, makes no sense.
  2. the filename of a non-ASCII named file becomes meaningless and is less reliable for name uniqueness.

For problem 1: I think the best solution is to rename raw_filename/filename to filename/safe_filename. Most non-expert users call the save method, or would use the 'safe_'-prefixed attribute when saving, so the security impact of this change would be minimal.

FYI, in my product, files are managed by sequence number. The filenames are stored in a DB so that users can identify their files, so there is no need to make the filename safe.
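A minimal sketch of that storage scheme, assuming an SQLite table and a storage directory invented here for illustration (`uploads`, `store_upload` and `storage_dir` are hypothetical names). In a bottle 0.12 handler the two inputs would presumably come from the upload's `raw_filename` and `file` attributes.

```python
import io
import os
import shutil
import sqlite3
import tempfile


def store_upload(conn, storage_dir, raw_filename, fileobj):
    """Save fileobj under a numeric name and record the original name in the DB."""
    cur = conn.execute(
        "INSERT INTO uploads (original_name) VALUES (?)", (raw_filename,)
    )
    upload_id = cur.lastrowid
    # The on-disk name comes only from the sequence number, so the
    # user-supplied name never reaches the filesystem.
    path = os.path.join(storage_dir, str(upload_id))
    with open(path, "wb") as out:
        shutil.copyfileobj(fileobj, out)
    conn.commit()
    return upload_id, path


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads (id INTEGER PRIMARY KEY AUTOINCREMENT, original_name TEXT)"
)
storage_dir = tempfile.mkdtemp()

upload_id, path = store_upload(
    conn, storage_dir, "あいうえお.txt", io.BytesIO(b"hello")
)
# The original name is read back from the DB only for display.
display_name = conn.execute(
    "SELECT original_name FROM uploads WHERE id = ?", (upload_id,)
).fetchone()[0]
```

Because the stored name is only ever displayed, it still needs the usual output escaping for HTML, but not filesystem sanitizing.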

For problem 2: Applying percent-encoding or some other escaping instead of cutting characters off would be a second-best option that keeps the current approach. It may also solve part of problem 1, since an escaped string is enough of a hint to use raw_filename.
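As a sketch of what percent escaping could look like, the standard library's `urllib.parse.quote` is used below. This only illustrates the idea and is not something bottle provides; `escaped_filename` is a hypothetical name.

```python
from urllib.parse import quote, unquote


def escaped_filename(raw_filename):
    """Percent-encode a filename instead of dropping non-ASCII characters."""
    # safe="" also escapes "/" so the result stays a single path component.
    # Note: special names such as "." and ".." pass through unchanged and
    # would still need separate handling before saving.
    return quote(raw_filename, safe="")


escaped = escaped_filename("あいうえお.txt")
restored = unquote(escaped)
print(escaped)
print(restored)
```

The escaped form is ASCII-only, stays unique per original name, and can be decoded back to the original, none of which holds once the characters are cut off.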

Anyway, cutting characters off is a bad idea and I hope this gets fixed. Naming a file with only Japanese characters is perfectly normal for ordinary people in Japan. It is probably the same in China and other non-ASCII countries.

thanks

seems related: https://github.com/bottlepy/bottle/issues/582

defnull commented 8 years ago

Unfortunately there is no solution without breaking backwards compatibility, but the current behavior is bad enough to do it anyway.

We have to subclass and work around cgi.FieldStorage (see #852), and can introduce better Unicode file name handling while doing so.