droppyjs / droppy

Self-hosted file storage
https://droppyjs.com
BSD 2-Clause "Simplified" License

Files uploaded from MacOS with umlauts in names cannot be accessed #72

Open Tronic opened 2 years ago

Tronic commented 2 years ago

MacOS encodes filenames using Unicode normalization form NFD, where ä is encoded as two code points (Latin a followed by a combining diaeresis), whereas Windows and Linux use normalization form NFC (a single code point). Apparently DroppyJS converts URLs or something to NFC, making it impossible to access or even delete such files, even from the Mac. The only way to access the file is to ssh into the server and rename/delete it from there. In particular, if one already has broken filenames, these can be repaired by running within the storage folder:
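To make the difference concrete, here is a small Node.js sketch comparing the two encodings of ä. It only uses the standard `String.prototype.normalize` and `Buffer`; the variable names are illustrative.

```javascript
// "ä" as a single precomposed code point (NFC): U+00E4
const nfc = "\u00e4";
// "ä" as base letter plus combining diaeresis (NFD): U+0061 U+0308
const nfd = "a\u0308";

// The two strings render the same but are not equal
console.log(nfc === nfd);            // false
console.log(nfc.length, nfd.length); // 1 2

// Their UTF-8 bytes differ, so a byte-wise filename lookup will miss
console.log(Buffer.from(nfc).toString("hex")); // c3a4
console.log(Buffer.from(nfd).toString("hex")); // 61cc88

// Normalizing converts one form into the other
console.log(nfd.normalize("NFC") === nfc); // true
console.log(nfc.normalize("NFD") === nfd); // true
```

This is why a name stored on disk in NFD can no longer be found once the request path has been converted to NFC.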

convmv -f UTF-8 -t UTF-8 --nfc --notest -r .

I suggest doing filename.normalize("NFKC") on all uploaded files' names to avoid this problem during uploads, and to prevent incorrectly encoded filenames from being stored on the (Linux server) filesystem. Possibly the opposite, filename.normalize("NFKD"), should be done for downloads on Mac clients, but I am not aware of whether this is needed or if Mac does it locally anyway. Also, ideally DroppyJS should be able to access such files if already stored on the server, but this is less critical than handling of uploads.
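A minimal sketch of that suggestion, assuming the upload handler has the client-supplied name available as a string. The helper name normalizeUploadName is hypothetical and not part of DroppyJS. It defaults to NFC; NFKC additionally folds compatibility characters such as ligatures, which changes more of the name than is needed for the Mac problem.

```javascript
// Hypothetical helper: normalize a client-supplied filename before storing it.
// "form" is any value accepted by String.prototype.normalize.
function normalizeUploadName(name, form = "NFC") {
  return name.normalize(form);
}

// A name as a Mac client would send it (NFD: "a" + combining diaeresis)
const fromMac = "a\u0308.txt";
console.log(normalizeUploadName(fromMac) === "\u00e4.txt"); // true

// NFC leaves the "fi" ligature (U+FB01) alone, NFKC rewrites it to "fi"
const ligature = "\ufb01le.txt";
console.log(normalizeUploadName(ligature, "NFC") === ligature);    // true
console.log(normalizeUploadName(ligature, "NFKC") === "file.txt"); // true
```

Names that are already in NFC pass through unchanged, so applying this to every upload should not affect non-Apple clients.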

markhughes commented 1 year ago

Sorry for the delay in getting to tickets, I'm dealing with some pretty savage health issues. It's on my mind.

markhughes commented 1 year ago

I think either:

  1. we add support to configure this value
  2. we add a better support layer for file system types

Tronic commented 1 year ago

I don't think it should be configurable, but rather always convert to NFC (or NFKC) any filenames uploaded/created, which should solve this issue with Mac clients and Windows or Linux server. This doesn't need any OS detection, just always do it.

Possibly it could also convert to NFD (or NFKD) any filenames stored on Mac filesystem (i.e. on a Mac server) and/or convert any filenames sent to Mac clients to one of those forms. I have not tested whether browsers already handle that automatically for Mac clients. If you'd like, I can do the experiments to find you the correct values to use.

Either way, none of these have any use for config options, it just needs to be done in all cases for proper Mac interop (even if there are only Macs involved, because some part of the current implementation uses NFC/NFKC, making files using the D type encoding inaccessible). For the purposes of this discussion, iPhone/iOS are the same as MacOS, using the D encoding for all Unicode.

Tronic commented 1 year ago

Quick testing results (with NodeJS Express, Formidable and Chrome all on Mac):

TLDR: filename = uploaded_filename.normalize('NFC') will fix this for you and it doesn't affect any non-Apple clients. If you use a middleware that directly stores the file on disk using an incorrect name (encoded as NFD), you may need to rename it or patch the middleware instead.
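For the middleware case, a rough sketch of the rename step could look like the following. The function renameToNfc is hypothetical, and the sketch assumes a Linux server filesystem where NFD and NFC names are distinct directory entries (on a Mac filesystem the two names may resolve to the same file, so the existence check would not behave this way).

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: rename a stored file so its basename is in NFC.
// Returns the resulting path (unchanged if the name was already NFC).
function renameToNfc(filePath) {
  const dir = path.dirname(filePath);
  const base = path.basename(filePath);
  const fixed = base.normalize("NFC");
  if (fixed === base) return filePath; // nothing to do

  const target = path.join(dir, fixed);
  // Refuse to overwrite an existing file that already has the NFC name
  if (fs.existsSync(target)) {
    throw new Error(`Refusing to overwrite existing file: ${target}`);
  }
  fs.renameSync(filePath, target);
  return target;
}
```

It would be called with the path the middleware reports after it has written the file, and the returned path used from then on.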

markhughes commented 1 year ago

Thanks for that!