Closed: eribul closed this issue 6 years ago.
Hi @eribul, sorry it's taken me a while to respond; you caught us Ontarians during our March break!
I like the idea of allowing a person to specify the encoding. I'm not sure how to do it either; it might require adjusting many of the readers to respect the encoding. We also have to consider what to do if a person specifies an incorrect encoding.
Changing the global option might be cleaner, and it should work with all of the readers (if they respect the global option). That then leads to the question of whether ProjectTemplate should be changing global options, or whether that is left to the user.
Could you do a quick experiment for me, please? Set the global option to "UTF-8" with `options(encoding = "UTF-8")` and see whether ProjectTemplate works with that method.
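As background for this experiment: in base R, the `encoding` argument of both `source()` and `file()` defaults to `getOption("encoding")`, so code that calls them without an explicit encoding should pick up the global option. The sketch below checks those defaults and wraps the option change in a small helper that restores the previous value afterwards. The helper name `with_encoding_option` is hypothetical, and whether ProjectTemplate's readers actually rely on these defaults is an assumption, not something confirmed in this thread.

```r
# In base R, source() and file() both default to the global encoding option.
stopifnot(identical(formals(source)$encoding, quote(getOption("encoding"))))
stopifnot(identical(formals(file)$encoding, quote(getOption("encoding"))))

# Hypothetical helper: evaluate `code` with options(encoding = enc) in effect,
# then restore the previous value, even if `code` raises an error.
with_encoding_option <- function(enc, code) {
  old <- options(encoding = enc)
  on.exit(options(old), add = TRUE)
  force(code)  # the promise is evaluated only now, with the new option set
}

before <- getOption("encoding")
inside <- with_encoding_option("UTF-8", getOption("encoding"))
after <- getOption("encoding")
```

For the experiment itself, the call would look something like `with_encoding_option("UTF-8", ProjectTemplate::load.project())`.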
I agree with @KentonWhite that setting the encoding globally through `options` is the cleanest approach if you want it for all files. The only case I can think of where we should implement an encoding option is on a per-file basis, when mixing files with different encodings. But then I think the option should be set in a `.file` file instead of `global.dcf`.
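A per-file setting could be kept in a small DCF file, which base R can parse with `read.dcf()`. The sketch below assumes a hypothetical layout with one record per script (`file` and `encoding` fields) and falls back to the global option for scripts that have no record. The field names, script names and the `file_encoding` helper are all illustrative; they are not an existing ProjectTemplate format.

```r
# Hypothetical per-file settings: one DCF record per script that needs a
# non-default encoding. Field and file names are illustrative only.
cfg_path <- tempfile(fileext = ".dcf")
writeLines(c(
  "file: 01-clean.R",
  "encoding: UTF-8",
  "",
  "file: 02-legacy.R",
  "encoding: latin1"
), cfg_path)

# Look up the encoding for one script, falling back to the global option
# when the script has no record of its own.
file_encoding <- function(filename, settings) {
  hit <- settings[settings[, "file"] == filename, "encoding"]
  if (length(hit) == 1L) hit else getOption("encoding")
}

# read.dcf() returns a character matrix with one row per record.
settings <- read.dcf(cfg_path, fields = c("file", "encoding"))

file_encoding("02-legacy.R", settings)  # "latin1"
file_encoding("03-new.R", settings)     # no record, so the global option
```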
Sorry again for the late response :) I have now tried changing the global option to UTF-8, but unfortunately that does not seem to work.
That's too bad. It would have been the cleanest solution.
Does anyone have ideas for how we could better specify the encoding in ProjectTemplate?
I can see two solutions:
1. Use `options('encoding')` in the readers. That way at least the global option is respected.
2. Add an argument to the signature of the readers, and use it to pass custom options from a `.file` file to a reader. This would allow users to override specific options for the given reader. It would perhaps require giving the `.file` extension the highest priority, to ensure the given options are always respected. This could also be included in #187, as it makes an even stronger case for building a complete list of all files before loading them into memory.

Perhaps those two options can even be combined, using something along the lines of `get0('encoding', ifnotfound = getOption('encoding'))`, such that the global option is used unless a local option is specified.
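The combination suggested above can be sketched with base R's `get0()`: a reader looks for a local `encoding` binding (here built from per-file options passed through `...`) and otherwise falls back to the global option. The reader `read_script` and the way per-file options reach it are hypothetical. Note that `getOption("encoding")` is used rather than `options("encoding")`, because the latter returns a list rather than a string.

```r
# Hypothetical reader: per-file options (for example parsed from a config
# file) arrive as named arguments in `...`; an `encoding` entry overrides
# the global option.
read_script <- function(path, ...) {
  local_opts <- list2env(list(...))  # empty environment if no options given
  enc <- get0("encoding", envir = local_opts, mode = "character",
              inherits = FALSE, ifnotfound = getOption("encoding"))
  con <- file(path, open = "r", encoding = enc)
  on.exit(close(con))
  list(encoding = enc, lines = readLines(con, warn = FALSE))
}

script <- tempfile(fileext = ".R")
writeLines("x <- 1", script)

default_read <- read_script(script)                      # global option
latin1_read <- read_script(script, encoding = "latin1")  # local override
```

Using `inherits = FALSE` keeps the lookup from accidentally picking up an unrelated `encoding` object from an enclosing environment.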
Looks like this issue was addressed in #187. Closing; please re-open if this wasn't fixed!
Problem
I sometimes work with data sets containing variable names with non-ASCII letters (åäöÅÄÖ). To handle this, and to keep scripts reusable across platforms, I prefer to save my R files with UTF-8 encoding. However, ProjectTemplate seems to assume the system default encoding (I am currently working on Windows), i.e. ISO-8859-1. My non-ASCII letters are then not recognised, and `ProjectTemplate::load.project()` fails if I use them in a munge file.
Current work-around
I currently have to manually re-save the R file in the system default encoding (ISO-8859-1). (Maybe I could also add `options("encoding" = "UTF-8")` to a script in `lib`; I didn't actually try this.)
Suggestion
It might be nice to allow the encoding used within the project to be specified in the `global.dcf` file. I am not sure exactly how to do this, but maybe one of:
1. Pass an `encoding` argument (based on a config variable) to the `source` function when sourcing files.
2. Change `options("encoding")` from `native.enc` to whatever is specified.
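The first suggestion could be wired up roughly as follows: read an `encoding` value from the project's DCF configuration with `read.dcf()` and hand it to `source()` through its `encoding` argument, falling back to the global option when the field is absent. This is a sketch under assumptions: the `encoding` field and both helper functions are hypothetical, not part of ProjectTemplate as described in this thread, and a temporary file stands in for `global.dcf`.

```r
# Hypothetical: read an `encoding` field from a project's DCF config.
# The field name is illustrative only.
project_encoding <- function(config_path) {
  cfg <- read.dcf(config_path)
  if ("encoding" %in% colnames(cfg) && !is.na(cfg[1, "encoding"])) {
    unname(cfg[1, "encoding"])
  } else {
    getOption("encoding")  # no field: keep R's global setting
  }
}

# Source a munge script into its own environment using the configured encoding.
source_with_project_encoding <- function(script, config_path, envir = new.env()) {
  source(script, local = envir, encoding = project_encoding(config_path))
  envir
}

config <- tempfile(fileext = ".dcf")
writeLines("encoding: UTF-8", config)
script <- tempfile(fileext = ".R")
writeLines("total <- sum(1:3)", script)

env <- source_with_project_encoding(script, config)
```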