Closed GoogleCodeExporter closed 9 years ago
Patch for v2.0 branch (branches/v2.0/Duplicati/CommandLine/Program.cs rev 1039)
Original comment by nicolas.hatier
on 6 Jan 2013 at 2:46
Original comment by rst...@gmail.com
on 6 Jan 2013 at 11:48
I like this patch, but I usually require stuff to be in UTF-8 (or have a BOM).
I find that for US people this works fine, because UTF-8 is identical to ASCII
for ASCII text, and for western languages UTF-8 is generally close to optimal.
For non-western languages UTF-8 works as well, although with a size overhead.
I would prefer that we just go for UTF-8 and throw some sane error if it does
not work. Any reasons I am wrong here?
I would also remove the check for existing file, as the attempt to open the
file will catch this too.
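A minimal sketch of that suggestion, assuming a hypothetical helper named
ParametersFile.ReadAll (it is not part of the patch): skip the existence check
and let the open attempt report a missing file. A separate check would also
leave a window in which the file can disappear between the check and the open.

```csharp
using System;
using System.IO;

static class ParametersFile
{
    // Hypothetical helper, not from the patch: reads the whole file and
    // turns a missing file or directory into one readable error.
    public static string ReadAll(string path)
    {
        try
        {
            // No File.Exists check: the read itself fails if the file is missing.
            return File.ReadAllText(path);
        }
        catch (FileNotFoundException ex)
        {
            throw new IOException("Parameters file not found: " + path, ex);
        }
        catch (DirectoryNotFoundException ex)
        {
            throw new IOException("Parameters file not found: " + path, ex);
        }
    }
}
```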
Btw, would you like commit access, so you can commit things into a sandbox and
merge them into trunk instead of sending patch files? If yes, send me a private
mail, and I will hook you up.
Original comment by kenneth@hexad.dk
on 22 Jan 2013 at 12:07
Well, I didn't overthink this. I just thought my code would support whatever
the Stream object supports; then I discovered it was defaulting to UTF-8 when
there was no BOM, so I wrote code to detect the file type.
We can go for UTF-8 only, but if someone throws in an 8-bit "ASCII" file
containing accents or other non-ASCII characters, he will end up with a
"directory not found" error somewhere, with a funny character in the name.
IMHO the effort to detect errors is the same as fully supporting such files in
the machine's default encoding, so why not support them? Supporting full
Unicode files is even simpler.
Original comment by nicolas.hatier
on 22 Jan 2013 at 2:55
From what I can see, the same will happen if the file is ASCII: the UTF-8
detection will not throw errors, because the lower part of UTF-8 is ASCII
compatible, and thus the file will be read as UTF-8 in both cases. Sadly, the
lack of a BOM means that no default encoding choice always works; the file
could be in the "system default" encoding or come from an encoder that did not
output a BOM.
If it makes sense to assume "machine encoding" if there is no BOM, then the
stream reader supports that:
http://msdn.microsoft.com/en-us/library/ms143457.aspx
Basically:
    using (var sr = new System.IO.StreamReader(filename, Encoding.Default, true)) { ... }
That will use the BOM if found, and default to system (ASCII, ANSI, ...) if
there is no marker.
Should we do that?
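To make the trade-off concrete, here is a small sketch of that StreamReader
behaviour. The class and method names are assumptions for illustration, and
the fallback encoding is passed in explicitly rather than taken from
Encoding.Default so the result does not depend on the machine.

```csharp
using System.IO;
using System.Text;

static class BomFallbackDemo
{
    // Illustrative wrapper: uses the BOM when the file has one, otherwise
    // decodes with the supplied fallback encoding.
    public static string Read(string path, Encoding fallback)
    {
        // Third argument: detectEncodingFromByteOrderMarks.
        using (var sr = new StreamReader(path, fallback, true))
        {
            return sr.ReadToEnd();
        }
    }
}
```

With ISO-8859-1 as the fallback, a file with a UTF-8 BOM and a BOM-less
ISO-8859-1 file both decode correctly, but a BOM-less UTF-8 file is decoded
with the fallback, so multi-byte characters come out garbled. Note also that
on .NET Core and later Encoding.Default is UTF-8, so the "machine encoding"
fallback is a .NET Framework behaviour.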
Original comment by kenneth@hexad.dk
on 22 Jan 2013 at 3:10
And what about UTF-8 files without BOM?
My code detects UTF-8 files with and without BOM, as well as ASCII ones, as you
can read a full ASCII file with Encoding.UTF8. The detection code will however
kick in on non-ASCII 8-bit encodings, such as the ISO-8859-1 that a large part
of the world (me included) uses. When it encounters invalid UTF-8 bytes it will
fall back to the machine default encoding, which, with any luck, should be the
encoding the user uses.
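The idea can be sketched roughly as follows. This is not the code from the
attached patch; EncodingSniffer.Decode is a hypothetical name, and the fallback
encoding is a parameter here, where the patch is described as using the machine
default.

```csharp
using System.Text;

static class EncodingSniffer
{
    // Hypothetical sketch: try strict UTF-8 first, and fall back to the
    // given encoding when the bytes are not valid UTF-8.
    public static string Decode(byte[] data, Encoding fallback)
    {
        // Second argument true = throwOnInvalidBytes.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            string text = strictUtf8.GetString(data);
            // GetString keeps a leading BOM as U+FEFF; drop it if present.
            if (text.Length > 0 && text[0] == '\uFEFF')
            {
                text = text.Substring(1);
            }
            return text;
        }
        catch (DecoderFallbackException)
        {
            // Invalid UTF-8, for example ISO-8859-1 text with accents.
            return fallback.GetString(data);
        }
    }
}
```

Pure ASCII input is valid UTF-8, so it never reaches the fallback branch; only
files with non-ASCII bytes that do not form valid UTF-8 sequences do.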
I think the only other way is to say the support is for utf-8 with BOM only,
and we manually check the first three bytes of the file.
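For reference, the manual check mentioned here is small. A sketch, assuming a
hypothetical helper named BomCheck.StartsWithUtf8Bom; the UTF-8 BOM is the byte
sequence EF BB BF.

```csharp
using System.IO;

static class BomCheck
{
    // Hypothetical helper: true when the stream begins with EF BB BF.
    // Advances the stream by up to three bytes.
    public static bool StartsWithUtf8Bom(Stream stream)
    {
        var head = new byte[3];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < head.Length)
        {
            int n = stream.Read(head, read, head.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }
        return read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
    }
}
```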
Original comment by nicolas.hatier
on 22 Jan 2013 at 3:22
Ok, we agree that files with no BOM are hard to read.
I am not sure that your method is better than the simple method, but it is too
small a point to spend more time on.
I have moved the method to read with your encoding detection into the Utility
module so it can be reused elsewhere.
The reason that the --source and --target stuff are not parameters is a legacy
of the duplicity CLI, which I attempted to be compatible with in the beginning.
Original comment by kenneth@hexad.dk
on 22 Jan 2013 at 4:03
This issue was updated by revision r1511.
Refactored the patch from issue #766
Moved reading into a function in the utility module.
Moved parsing of file data into a function.
Added parameters to the command line client, such that the CLI may have options.
Implemented these options in the help output, so the --parameters-file can be
viewed from the help.
Original comment by kenneth@hexad.dk
on 22 Jan 2013 at 4:12
Completed and released with 1.3.4
Original comment by kenneth@hexad.dk
on 2 Feb 2013 at 8:16
Original issue reported on code.google.com by
nicolas.hatier
on 6 Jan 2013 at 1:41