KentonWhite / ProjectTemplate

A template utility for R projects that provides a skeletal project.
http://projecttemplate.net
GNU General Public License v3.0

allow to specify encoding in global.dcf #189

Closed eribul closed 6 years ago

eribul commented 7 years ago

Problem

I sometimes work with data sets containing variable names with non-ASCII letters (åäöÅÄÖ). To handle this and to keep scripts reusable across platforms, I prefer to save my R files with UTF-8 encoding. It seems, however, that ProjectTemplate assumes the system default encoding (I am currently working on Windows), i.e. ISO-8859-1. My non-ASCII letters are then not recognised, and ProjectTemplate::load.project() fails if I use them in a munge file.

Current work-around

I currently have to re-save the R file manually in the system default encoding (ISO-8859-1). (Maybe I could also add options("encoding" = "UTF-8") to a script in "lib"; I didn't actually try this.)
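The manual re-save described above could be scripted roughly as follows. This is a minimal sketch, not part of ProjectTemplate: the file paths are hypothetical temp files standing in for a munge script, and "latin1" is R's name for ISO-8859-1.

```r
# Hypothetical stand-ins for a munge script and its re-encoded copy.
utf8_path <- tempfile(fileext = ".R")
latin1_path <- tempfile(fileext = ".R")

# Write a one-line script as raw UTF-8 bytes
# (useBytes = TRUE prevents any re-encoding on write).
script <- enc2utf8("\u00e5lder <- c(31, 47)")
writeLines(script, utf8_path, useBytes = TRUE)

# Read the lines back, declaring them as UTF-8, then convert to latin1.
utf8_lines <- readLines(utf8_path, encoding = "UTF-8", warn = FALSE)
latin1_lines <- iconv(utf8_lines, from = "UTF-8", to = "latin1")

# Write the converted bytes unchanged.
writeLines(latin1_lines, latin1_path, useBytes = TRUE)

# Inspect the first byte of each file: the letter "å" is encoded
# differently in the two files.
first_byte_utf8 <- readBin(utf8_path, what = "raw", n = 1L)
first_byte_latin1 <- readBin(latin1_path, what = "raw", n = 1L)
```

The same conversion with from and to swapped would turn a file saved in the system default back into UTF-8.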

Suggestion

It might be nice to allow specifying an encoding in the global.dcf file for use within the project. I am not sure exactly how to do this, but maybe one of:

KentonWhite commented 7 years ago

Hi @eribul, sorry it's taken me a while to respond -- you caught us Ontarians during our March break!

I like the idea of allowing a person to specify the encoding. I'm not sure how to do it either -- it might require adjusting many of the readers to respect the encoding. We also have to consider what to do if a person specifies an incorrect encoding.

Changing the global environment might be cleaner -- and it should work with all of the readers (if they respect the global option). That then leads to the question of whether ProjectTemplate should be changing global options or whether that is left to the user.

Can you do a quick experiment for me, please? Can you change the global option to "UTF-8" using options("encoding" = "UTF-8") and see whether ProjectTemplate works with that method?
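A minimal sketch of that experiment, assuming it is run from the root of a ProjectTemplate project. Checking for config/global.dcf is an assumption used here only to skip the load step elsewhere. The previous option value is saved and restored so the change does not leak into the rest of the session.

```r
# Set the global option and remember the previous value.
old_options <- options(encoding = "UTF-8")
encoding_during <- getOption("encoding")

# Assumption: the working directory is a ProjectTemplate project root,
# detected here by its config file. Otherwise the load is skipped.
if (requireNamespace("ProjectTemplate", quietly = TRUE) &&
    file.exists(file.path("config", "global.dcf"))) {
  # try() keeps going on failure so the option is still restored below.
  try(ProjectTemplate::load.project())
}

# Restore the previous setting.
options(old_options)
```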

Hugovdberg commented 7 years ago

I agree with @KentonWhite that setting the encoding globally through options is the cleanest approach if you want it for all files. The only case I can think of where we should implement an encoding option is on a per-file basis, when mixing files with different encodings. But then I think the option should be set in a .file file instead of global.dcf.

eribul commented 7 years ago

Sorry again for the late response :) I have now tried changing the global option to UTF-8, but unfortunately that does not seem to work.

KentonWhite commented 7 years ago

That's too bad. It would have been the cleanest solution.

Anyone have ideas of how we could better specify the encoding in ProjectTemplate?

Hugovdberg commented 7 years ago

I can see two solutions:

  1. Change the readers for text-mode data files to explicitly use the options('encoding'). That way at least the global option is respected.
  2. Add a ... argument to the signature of the readers, and use it to pass custom options from a .file file to a reader. This would allow users to override specific options for the given reader. This might require giving the .file extension the highest priority to ensure the given options are always respected. This could also be included in #187, as this makes an even stronger case for building a complete list of all files before loading them into memory.

Perhaps those two options can even be combined, using something along the lines of get0('encoding', ifnotfound = options('encoding')), such that the global option is used unless a local option is specified.
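The combined idea could be sketched as follows. Both functions are hypothetical illustrations, not ProjectTemplate's actual readers. The sketch uses getOption("encoding") rather than options("encoding"), because options() returns a one-element list while getOption() returns the encoding string itself.

```r
# Hypothetical helper: a per-file encoding wins; otherwise fall back
# to the global option.
resolve_encoding <- function(local_encoding = NULL) {
  if (!is.null(local_encoding)) local_encoding else getOption("encoding")
}

# Hypothetical reader, not ProjectTemplate's actual csv reader.
# Extra arguments in ... are passed through to read.csv unchanged.
read_csv_sketch <- function(path, encoding = NULL, ...) {
  utils::read.csv(path, fileEncoding = resolve_encoding(encoding), ...)
}

# Without a local value the global option is used.
global_default <- resolve_encoding()
# With a local value (e.g. taken from a .file file) the local value is used.
per_file <- resolve_encoding("UTF-8")

# Small ASCII-only demonstration file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,value", "alpha,1"), csv_path)
demo <- read_csv_sketch(csv_path, encoding = "UTF-8")
```

Keeping the fallback in one helper means every text-mode reader would resolve the encoding the same way.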

KentonWhite commented 6 years ago

Looks like this issue was addressed in #187. Closing — please re-open if this wasn't fixed!