Closed MerlijnWajer closed 3 years ago
Yes, I can confirm this problem. The replacement should not be done for Linux, macOS, and other systems that use only / as the path separator.
Question: are there valid path names in unix that have these double back-slashes? How would such a path, as you show above, be interpreted by the file system?
Also, are there valid path names in unix with a single back-slash?
And finally, is that a valid path name in Windows?
Question: are there valid path names in unix that have these double back-slashes? How would such a path, as you show above, be interpreted by the file system?
Yes, those exist. Every backslash just needs to be escaped with another backslash in the shell, but the actual string (in bytes) contains only one backslash for each escaped pair.
merlijn@gentoo-x230 ~ $ touch /tmp/test\\\\test2\\\\\\\\
merlijn@gentoo-x230 ~ $ ls -lsh /tmp/test\\\\test2\\\\\\\\
0 -rw-r--r-- 1 merlijn merlijn 0 Dec 10 23:18 '/tmp/test\\test2\\\\'
Quick test in Python (every backslash is also escaped with a backslash):
>>> s = '/tmp/test\\\\test2\\\\\\\\'
>>> s.count('\\')
6
>>> open(s, 'rb').read()
b''
The s.count('\\') shows that six backslashes are in the path in total (the '\\' is a single backslash): two after test, four after test2. The open(s, 'rb') call shows that the open succeeds.
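To keep the escaping layers apart, here is a minimal sketch comparing the escaped literal from the session above with an equivalent raw string literal. The variable names are just for illustration; nothing here touches the file system.

```python
# Two spellings of the same path string.
# In a normal literal, each pair of backslashes is one stored backslash.
escaped = '/tmp/test\\\\test2\\\\\\\\'
# In a raw literal, every backslash typed is a stored backslash.
raw = r'/tmp/test\\test2\\\\'

print(escaped == raw)     # True: both literals denote the same string
print(raw.count('\\'))    # 6: two after "test", four after "test2"
print(len('\\'))          # 1: '\\' is a single character
```

The raw form matches what ls printed, which is the name as actually stored.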
Also, are there valid path names in unix with a single back-slash?
Yes. The example I showed actually had just a single backslash in the actual file name; it's just that I (and gdb) had to escape the backslashes for the string literals and in the shell itself.
EDIT: I think that \ has no special meaning in a filename on UNIX, and it's just treated as any other byte. You can even use newlines in UNIX filenames, as far as I know.
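A quick way to check this is to create such files in a throwaway directory and read the names back. This is only a sketch, assuming a POSIX system and a typical local file system; create_and_list is a made-up helper name, not something from Leptonica or Tesseract.

```python
import os
import tempfile

def create_and_list(names):
    """Create empty files with the given names in a fresh temp dir
    and return the names the directory reports, sorted."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            # Open for writing and close immediately, like `touch`.
            with open(os.path.join(tmp, name), "wb"):
                pass
        return sorted(os.listdir(tmp))

if os.name == "posix":
    names = ["back\\slash", "new\nline"]
    # The directory listing should hand back the exact same names.
    print(create_and_list(names) == sorted(names))
```

The os.name guard is there because on Windows the backslash would be taken as a separator instead.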
And finally, is that a valid path name in Windows?
My Windows-fu isn't that strong, but I know that neither / nor \ is allowed in file names, while both can be part of a path in Windows. Whether a forward slash actually works outside of Windows APIs is something I do not know.
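For what it's worth, Python's pathlib models both conventions without touching the file system, so the difference can be shown on any platform. This only illustrates how pathlib splits the string; it says nothing about what the Windows APIs themselves accept. The sample path is made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

# One string containing both a forward slash and a backslash.
raw = "dir/sub\\file.txt"

# POSIX rules: only / separates components; \ is part of the name.
posix_parts = PurePosixPath(raw).parts
# Windows rules: both / and \ separate components.
windows_parts = PureWindowsPath(raw).parts

print(posix_parts)    # ('dir', 'sub\\file.txt')
print(windows_parts)  # ('dir', 'sub', 'file.txt')
```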
Thank you! From what you have said, on unix systems all backslashes must be preserved (which is what you said in your initial posting). I will fix this.
Thank you!
I can confirm that this solves the problem for me, by the way. I applied the patch on top of Ubuntu 20.04's liblept5 and the problem in Tesseract is gone. So I guess the issue can be closed?
Thank you for confirming we're ok.
Noticed as a problem in Tesseract initially: https://github.com/tesseract-ocr/tesseract/issues/3178
It looks like genPathname does not like backslashes in filenames on UNIX, even though this is valid.
It looks like convertStepCharsInPath is causing this problem: it's converting '\\' to / even on UNIX, which is not what it should be doing, as far as I can tell.
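To make the expected behaviour concrete, here is a rough Python sketch of the idea: only rewrite backslashes when running on Windows, and leave the path untouched everywhere else. This is not Leptonica's code; normalize_separators and its is_windows parameter are hypothetical names used purely for illustration.

```python
import os

def normalize_separators(path: str, is_windows: bool = (os.name == "nt")) -> str:
    """Hypothetical helper: convert \\ to / only on Windows.

    On UNIX a backslash is an ordinary file name character,
    so the path is returned unchanged.
    """
    if is_windows:
        return path.replace("\\", "/")
    return path

# UNIX: backslashes are preserved as part of the file name.
print(normalize_separators("/tmp/test\\test2", is_windows=False))
# Windows: backslashes are separators and may be rewritten.
print(normalize_separators("C:\\tmp\\test2", is_windows=True))
```

The first call returns the path as given; the second returns C:/tmp/test2.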