Closed tleedjarv closed 9 months ago
I have called for testing on unison-users@ and will merge in 7 days if no adverse comments.
I've tested unison-7f147080.ocaml-4.14.0+mingw64c.windows-2022\unison.exe on my Windows 11 22H2 (both when chcp 932
and when chcp 65001
). It looks like working without any problem after creating/deleting a あああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああ.txt
.
Thank you for testing. Syncing functionality is not affected by this patch. Only the truncated display of root names as "left" and "right" headings is affected. Realistically, the potential encoding issue could only manifest in the text UI because the same UTF-8 conversions have always been done in the GUIs; this patch shouldn't change anything there.
Ah, sorry. I tried again, then found it looked good:
> .\unison C:\t\ああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああ C:\t\いいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいいい
Unison 2.53.3 (ocaml 4.14.0): Contacting server...
Looking for changes
Reconciling changes
The root of one of the replicas has been completely emptied.
Unison may delete everything in the other replica. (Set the
'confirmbigdel' preference to false to disable this check.)
Do you really want to proceed? []
Unrecognized command ' ': try again [type '?' or F1 for help]
Do you really want to proceed? [] y
あああああああああ... いいいいいいいいい...
<---- deleted 新規 テキスト ドキュメント.txt [f] ,
Thank you for testing! Is this with UTF-8 or some other encoding?
Sorry again! I tried when chcp 932
, and confirmed the result is same with my last comment (UTF-8: chcp 65001).
Sorry again! I tried when
chcp 932
, and confirmed the result is same with my last comment (UTF-8: chcp 65001).
I suspect that this will always work properly in Windows, with NTFS at least. It may break (meaning display will be garbled) with FAT when the dirnames are knowingly created in codepage other than UTF-8. But I don't know how that works in Windows.
I think the only real risk case is some Unix/Linux when dirnames on disk are encoding-less blobs (as seen by the OS and the filesystem) and those blobs are neither valid UTF-8 nor Latin 1 (ASCII is of course a subset of both). (As a side note: When it comes to multi-byte encodings, the display would have always been garbled, no regression there.)
Thank you for testing! At least we know nothing's immediately broken, and even then, any eventual breakage is going to be limited and cosmetic only.
The names of roots can be very long and are truncated in UI in the list of updates. Truncating by number of bytes does not play well with UTF-8 input. This patch makes the truncation safe on UTF-8 encoded strings while keeping the length of truncated strings roughly as before (and the result should also look correct for Western languages, maybe also for other scripts).
There is one tiny caveat. Since this patch applies to both TUI and GUI (also macOS native GUI), this may have a side-effect of scrambling any non-ISO-8859-1 (or Windows-1252 in GUI) and non-UTF-8 encodings, if such would ever be used. There is no change for the GTK GUI, as it has always expected either ISO-8859-1/Windows-1252 or UTF-8. I don't know if this will actually be a problem because I don't know if Unison has ever worked properly with these encodings. This caveat only applies to the root names, not to file/dir names. This is a display-only issue, it will not break any syncs!
This is a partial fix for #959.