Open Safihre opened 1 year ago
@animetosho any chance we can get you interested in fixing this one? Then we could use par2cmdline-turbo also on Windows and don't need Multipar anymore.
If someone submits a pull request to par2cmdline, I can consider merging it into par2cmdline-turbo.
Otherwise, I have enough on my plate for now.
I intend to submit a PR for this; I've got UTF8 working including in the console it just needs tidying up.
I haven't really looked at the long paths issue yet, but from a quick glance I think there are numerous issues;
par2 file must not have a wildcard in it.
is because ?
is not allowed - but it's needed for \\?\
_stat
_MAX_PATH
has been defined but only used for renames, GetCanonicalPathname
uses MAX_PATH (260) but I haven't looked into the implications of that.@mnightingale Any updates on this work?
Sorry I have been busy with other things, the Unicode problem is much involved than expected, needs a lot of MultiByteToWideChar and convert the other way for outputting to the code page in use the console.
I have some changes that work, but it is very messy and would break non-Windows builds.
The long path problem appears a simpler fix at least for verify, create needs more work on how it splits a path into components and the MAX_PATH it used for buffers.
Is there anything we can learn from nzbget handling of this? https://github.com/nzbgetcom/nzbget/blob/develop/daemon/util/FileSystem.cpp
Looks like their might be interest from their to fix it (mentioned issue), they certainly have more experienced C++ programmers than me 😄
But yes nzbget handles some of what needs implementing here, see helpers UtfPathToWidePath
and WidePathToUtfPath
along with handling long paths as required it looks like there are several string helpers to make the manipulation easier.
How MultiPar handles it may also be useful to someone else looking https://github.com/Yutaka-Sawada/MultiPar/blob/d733ada21ae81405de468dd2cd458bcbf09ab9ea/source/par2j/common2.c#L66-L204 specifically the parts for converting to the code page used by the console and handling the possible errors (try again with dwFlags = 0).
edit: just to mention the other part which nzbget doesn't have to worry about is every filename including from par packets, needs preparing for the console when printing log/debug lines, so you don't get "?" or broken characters.
@Safihre I've cleaned up what I'd started working on https://github.com/mnightingale/par2cmdline/tree/unicode_filenames it won't compile outside of Windows currently and it only attempts to addresses the problems with UTF8 filenames.
I haven't made a draft PR for now as I don't want to put someone else having a go.
edit: main thing is figuring out how to remove TCHAR, at the time just wanted to get it to compile.
I think it's about there now, I've added support for \\?\
paths, an improvement could be made to always use such paths even if the path passed in isn't but I wasn't going to bother with that.
utf8::compatible
tries to deal with normalisation of filenames in packets on Windows.
I'm not sure how to do similar for other platforms, may need a separate library or just not bother so both are broken.
Changes can be applied applied with minimal changes to par2cmdline-turbo
@mnightingale We only have to care for Windows, on the other platforms there are no problems. The SABnzbd unit-tests already show that :) So all things can be wrapped in macros to only build on Windows platforms!
utf8::compatible
was specifically to deal with test_win_unicode
https://github.com/sabnzbd/sabnzbd/issues/1633 which as far as I can tell fails on all platforms.
Test file has:
00000070 84 00 00 00 00 00 00 00 66 72 e8 6e 63 68 5f 67 |........fr.nch_g|
00000080 65 72 6d 61 6e 5f 64 65 6d f6 2e 72 61 72 00 00 |erman_dem..rar..|
But generating new recovery volumns with par2cmdline uses unicode, against the spec but I don't think it's the only implementation to do as such.
000000d0 00 00 00 00 66 72 c3 a8 6e 63 68 5f 67 65 72 6d |....fr..nch_germ|
000000e0 61 6e 5f 64 65 6d c3 b6 2e 72 61 72 50 41 52 32 |an_dem...rarPAR2|
I think I'll just remove the part trying to handle this, then it will behave the same broken way on all platforms.
As requested by @animetosho, reporting the original bug here 💯
On Windows, par2cmdline can basically only handle ASCII filenames. This was fine when par2cmdline was created long ago, but the world is UTF8 now. Simply try to open a file named
ä½ å¥½ä¸–ç•Œ.par2
, will fail.This was fixed long ago in
par2-tbb
, but that fork has been unmaintained for a long long time. It also displays any UTF8-charahters as question marks in the console output.Additionally, par2cmdline does not seem to really support long-path notation on Windows. Which is needed to open files where the path exceeds 255 characters. For example
\\\\?\\C:\\test_win_unicode\\test_win_unicode.par2
results also in:We use that by default so we can avoid any problems on longer paths and having to check path-lengths every time.
animetosho/par2cmdline-turbo/issues/13 animetosho/par2cmdline-turbo/issues/17 sabnzbd/sabnzbd/pull/2674