ImperialCollegeLondon / django-drf-filepond

A Django app providing a server implementation for the Filepond file upload library
BSD 3-Clause "New" or "Revised" License

Support non-ASCII characters in filenames #99

Closed: jcohen02 closed this issue 6 months ago

jcohen02 commented 8 months ago

This issue relates to a PR opened by @roman-tiukh (#76). This is being resolved via a separate issue since the original PR now includes a number of other undocumented changes specific to the local fork in which the PR was originally created.

The original PR highlighted the need to support Cyrillic characters in filenames. Beyond that specific case, filenames should be able to contain non-ASCII characters in general.

jcohen02 commented 8 months ago

This issue can be addressed by using the filename* parameter in the Content-Disposition header, as detailed in RFC 6266 (see section 5).
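For illustration, a minimal standard-library sketch of generating such a header might look like the following. The function name build_content_disposition is hypothetical and is not part of django-drf-filepond; the sketch assumes UTF-8 as the charset and emits only the filename* form for names that are not printable ASCII, without an ASCII fallback parameter.

```python
from urllib.parse import quote


def build_content_disposition(filename, disposition="attachment"):
    """Build a Content-Disposition value (hypothetical helper, sketch only).

    Printable ASCII names use the plain quoted-string ``filename`` parameter.
    Anything else uses the extended ``filename*`` parameter with UTF-8
    percent-encoding, as described in RFC 6266 / RFC 5987.
    """
    if filename.isascii() and filename.isprintable():
        # Escape backslashes and double quotes inside the quoted-string.
        escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
        return f'{disposition}; filename="{escaped}"'

    # quote() encodes the string as UTF-8 and percent-encodes every byte
    # except letters, digits and "_.-~", all of which are valid attr-chars.
    encoded = quote(filename, safe="")
    return f"{disposition}; filename*=UTF-8''{encoded}"


if __name__ == "__main__":
    print(build_content_disposition("report.txt"))
    print(build_content_disposition("отчёт.txt"))
```

Routing non-printable ASCII names through the encoded form also keeps stray control characters such as CR or LF from reaching the header as raw bytes.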

Ideally there would be support within Python, or in a maintained BSD/MIT-licensed third-party library, for reliably generating these headers according to the spec and ensuring that, for example, invalid characters are not used.

As of Django 4.2, there is a function to create the header (django.utils.http.content_disposition_header), but there is no built-in support in earlier versions.

Ideally, we also need a way to reliably parse encoded headers so that unit tests can verify that header generation is working correctly. Python's email.message.Message class was previously being used for this, but it doesn't seem to correctly parse RFC 6266-encoded Content-Disposition headers. There are a couple of open Python bug tracker issues for this (23434, 33027).
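As a rough idea of what a test-side parser could look like, here is a small sketch that extracts and decodes only the filename* parameter. The name parse_ext_filename is hypothetical, and this is not a full RFC 6266 parser: it ignores the plain filename parameter and assumes the charset named in the header is one that Python's codecs recognise.

```python
import re
from urllib.parse import unquote

# Matches: filename*=<charset>'<optional language>'<percent-encoded value>
_EXT_FILENAME = re.compile(
    r"filename\*\s*=\s*([A-Za-z0-9!#$%&+\-^_`{}~]+)'([A-Za-z0-9\-]*)'([^;\s]+)",
    re.IGNORECASE,
)


def parse_ext_filename(header_value):
    """Return the decoded filename* value, or None if it is absent.

    Hypothetical helper for unit tests; sketch only.
    """
    match = _EXT_FILENAME.search(header_value)
    if match is None:
        return None
    charset, _language, encoded = match.groups()
    # Raises LookupError for an unknown charset and UnicodeDecodeError
    # for bytes that are invalid in that charset.
    return unquote(encoded, encoding=charset, errors="strict")


if __name__ == "__main__":
    print(parse_ext_filename("attachment; filename*=UTF-8''%C3%A9.txt"))
```

A unit test could then assert that the decoded value equals the original filename passed to the view.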

jcohen02 commented 6 months ago

This is fixed in #108 and will be included in release v0.6.0, which will be available shortly.