Closed (infeo closed this issue 5 years ago)
For the record: 4 chars file extension is just reserved for future use, so we don't need to migrate filenames if we decide to add an extension (see #54).
For the record: 248 BASE32 chars give us a maximum ciphertext length of 155 bytes, which consists of a 16-byte IV and 139 bytes of payload.
The maximum cleartext filename before shortening happens (⌊248 × ⅝⌋ − 16 = 139 bytes) is this long:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855e3b0c44.txt
The previous maximum cleartext filename length was ⌊129 × ⅝⌋ − 16 = 64 bytes, i.e. not even half as long:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852.txt
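The two limits above follow from the same arithmetic: BASE32 carries 5 bits per char, so n encoded chars hold ⌊n × ⅝⌋ bytes, of which 16 bytes are the IV. A minimal sketch of that calculation (the class and method names are made up for illustration and are not taken from cryptofs):

```java
final class FilenameLimits {
    // 16 bytes of each ciphertext are the IV, as stated in this thread.
    static final int IV_BYTES = 16;

    /** Number of whole bytes that fit into the given number of BASE32 chars. */
    static int base32Bytes(int encodedChars) {
        // 5 bits per char, 8 bits per byte; integer division floors the result.
        return encodedChars * 5 / 8;
    }

    /** Longest cleartext name (in bytes) whose encoded ciphertext fits the limit. */
    static int maxCleartextBytes(int encodedChars) {
        return base32Bytes(encodedChars) - IV_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(maxCleartextBytes(248)); // 139 (new limit)
        System.out.println(maxCleartextBytes(129)); // 64 (previous limit)
    }
}
```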
When looking at filename length distribution functions (such as on page 26 in this paper), we can clearly see that short filenames are far more common than long ones. For example, in the examined "PDL-Home" data set only 0.99 of all filenames were shorter than 63 chars, while 0.9999 of all filenames were shorter than 135 chars.
While I have no clue what kind of files are stored on PDL-Home, this was the "worst case scenario" in the study, and the general idea that the number of files declines drastically with the length of their names seems reasonable.
What does this mean for us? By doubling the threshold we have reduced the likelihood of name shortening happening by 2-3 orders of magnitude.
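As a rough check of that estimate, using only the two PDL-Home figures quoted above: per 10,000 files, about 100 names reach the 63-char mark but only about 1 reaches the 135-char mark. These marks sit slightly below the actual 64- and 139-byte limits, so this is only an approximation. The sample size and all names in the sketch are illustrative assumptions:

```java
final class ShorteningLikelihood {
    /** Orders of magnitude between two counts of affected files. */
    static double ordersOfMagnitude(long affectedBefore, long affectedAfter) {
        return Math.log10((double) affectedBefore / affectedAfter);
    }

    public static void main(String[] args) {
        long sample = 10_000;                 // hypothetical sample size
        long affectedBefore = sample - 9_900; // 0.99 of names were shorter than 63 chars
        long affectedAfter = sample - 9_999;  // 0.9999 of names were shorter than 135 chars
        System.out.println(ordersOfMagnitude(affectedBefore, affectedAfter)); // 2.0
    }
}
```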
Filenames must be encrypted deterministically, otherwise we break directory listings. By changing the shortening threshold we break this rule.
Therefore this change requires a new vault format version, otherwise we'd break compatibility with other applications (such as our mobile apps or @iterate-ch's Cyberduck and Mountain Duck).
Therefore we defer this issue to a different minor version.
The maximum filename length on many file systems and/or clouds is 255. But: We need to leave some space for a cloud service to add a suffix such as " (conflict 2018-08-21 00-22-09)". Let's say we want to reserve 35 chars.
This leaves us with 220 usable chars for encoded ciphertext (216 chars) and extension (4 chars). 216 chars in base32 encoding is equivalent to 135 bytes of ciphertext. Subtracting the IV this gives us 119 cleartext bytes.
If we have to migrate every filename anyway (see #64), we might want to switch to base64. With the same 220 usable chars our 216 encoded ciphertext chars are now equivalent to 162 ciphertext bytes or 146 cleartext bytes.
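The budget arithmetic for both encodings can be sketched as follows. The 216 ciphertext chars and the 4-char extension alone already add up to the 220 usable chars. The constants mirror the numbers in this thread, while the class and method names are assumptions for illustration only:

```java
final class NameBudget {
    static final int MAX_NAME_CHARS = 255;       // common file system / cloud limit
    static final int RESERVED_SUFFIX_CHARS = 35; // room for e.g. a sync-conflict suffix
    static final int EXTENSION_CHARS = 4;        // reserved file extension
    static final int IV_BYTES = 16;

    /** Chars left for the encoded ciphertext: 255 - 35 - 4 = 216. */
    static int encodedCiphertextChars() {
        return MAX_NAME_CHARS - RESERVED_SUFFIX_CHARS - EXTENSION_CHARS;
    }

    /** Cleartext bytes that fit, given the bits carried by each encoded char. */
    static int cleartextBytes(int encodedChars, int bitsPerChar) {
        return encodedChars * bitsPerChar / 8 - IV_BYTES;
    }

    public static void main(String[] args) {
        int chars = encodedCiphertextChars();
        System.out.println(cleartextBytes(chars, 5)); // base32: 119
        System.out.println(cleartextBytes(chars, 6)); // base64: 146
    }
}
```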
On the aforementioned PDL-Home this reduces the likelihood of name shortening even further: only 0.001% of all files would have been affected by shortening.
C:\Users\Sebastian\OneDrive\format7\d\IX\LPPQAPWUNCQTWROJY734VIT7YTQEGM\xr3BnomfWSZLqb13EBi1zwXu34xDwQQtPTkqmSYvtBW6Qg9ae4FIDHP7ByJFdKSJfqwFfWiUojKIlHxCwD8a5U6yojKfAPftXWiAYIo9dQthCC16M3uxkIzaPrDET6-2yuHCX8gECd0LdbMC-qDi1LOxj4koqQbfsAGnhoQ6_SOgKdn3dQWaInAo1AUx7aMP5soJ0Xai1OKpykak3vgB4QK7.c9r
Version: 1.8.6
Short Description
Increase performance and compatibility of cryptofs by increasing the threshold before ciphertext filenames are shortened.
Description
To keep compatibility with certain operating systems (e.g. Windows), cryptofs shortens the name of ciphertext files if their base64 encoding exceeds a certain threshold. Currently the threshold is defined at https://github.com/cryptomator/cryptofs/blob/851f44090db4b068c7ac7fe27adcecd4c32767e5/src/main/java/org/cryptomator/cryptofs/Constants.java#L18 .
If the threshold is increased, file name shortening occurs less often, which makes cryptofs more robust to race conditions, increases performance due to direct file access, and makes it more compatible with certain cloud syncing software (Google BackUp & Sync).
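To illustrate why a higher threshold means more direct file access, here is a minimal sketch of a threshold check. It is not the cryptofs implementation: the class, the method, the ".short" suffix and the use of a SHA-256 digest are all assumptions chosen for illustration. The point is only that a name above the threshold is replaced by a deterministic short stand-in, so the original long name has to be stored and looked up elsewhere.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class ShorteningSketch {
    /** Returns the name unchanged if it fits, else a deterministic short stand-in. */
    static String storedName(String encodedName, int threshold) {
        if (encodedName.length() <= threshold) {
            return encodedName; // direct access: no extra lookup needed
        }
        try {
            // Hypothetical scheme: hash the long name so the result is deterministic.
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(encodedName.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest) + ".short";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        String longName = new String(new char[300]).replace('\0', 'x');
        System.out.println(storedName("abc.c9r", 220));         // unchanged
        System.out.println(storedName(longName, 220).length()); // 49
    }
}
```

With such a scheme, every shortened name costs an additional read to recover the original name, which is the overhead a higher threshold avoids for most files.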
The suggestion is to set the threshold to 254. This value is computed as follows: