KentonWhite / ProjectTemplate

A template utility for R projects that provides a skeletal project.
http://projecttemplate.net
GNU General Public License v3.0
622 stars 159 forks source link

Consider using readxl package for xls and xlsx files to remove external dependency #156

Closed jeromyanglim closed 6 years ago

jeromyanglim commented 7 years ago

One issue that I've encountered when training others to use ProjectTemplate is that new users often want to import Excel files (xls and xlsx). Users on Windows almost never have a Perl installation (required by gdata). Of course, they can get one, but I wonder whether it would be worth considering using the readxl package that does not have external dependencies.

I adapted a function for my own purposes, but I'm not an expert in these reader functions and I can appreciate that you wouldn't want to break previous code (if they were to import data in slightly different ways. Anyway, I share my function is case it's relevant:

xls.reader <- function(data.file, filename, workbook.name)
{
     ProjectTemplate:::.require.package('readxl')

     sheets <- readxl::excel_sheets(filename)

     for (sheet.name in sheets) {
         variable.name <- paste(workbook.name, ProjectTemplate:::clean.variable.name(sheet.name), 
                                sep = ".")
         tryCatch(assign(variable.name,
                         readxl::read_excel(filename,
                                         sheet = sheet.name),
                         envir = ProjectTemplate:::.TargetEnv),
                  error = function(e)
                  {
                      warning(paste("The worksheet", sheet.name, "didn't load correctly."))
                  })
     }
}
connectedblue commented 7 years ago

Hi @jeromyanglim

I use windows and had no idea that perl was being installed. I import from spreadsheets all the time (although thinking about it, maybe they are CSV files not xls).

Looking at your code, it seems a small change that would not impact other reader types. I'd encourage you to fork the latest master, make the package change and submit a pull request. I'd be up for testing the change (I'm sure others will dive in too). I have some large dataset analysis projects which would be good test cases.

My main concern would be performance of this package compared to the gdata one, but otherwise it seems like a big improvement.

I'm new to the ProjectTemplate community myself. Best way to get started is to dive in and raise a PR. You'll get the kudos then of contributing a change which you can work into your training.

KentonWhite commented 7 years ago

Hi @jeromyanglim,

Updating the excel readers is definitely on our wish list. See issue #94!

Since you have done most of the work already the final step is opening a pull request and merging the changes into the project. If you have never contributed to open source before I'm very happy to help you with this and walk you through the steps. Github has a great tutorial here:

https://help.github.com/articles/creating-a-pull-request/

Thanks so much for you contribution :)

jeromyanglim commented 7 years ago

@KentonWhite @connectedblue Okay. I'm pretty new to pull requests, but I'll be able to work it out. I guess the main issue from my perspective would be whether readxl would break any previous code. But in general, I think that readxl is an excellent option because removing dependencies on java or Perl would make ProjecTemplate more accessible.

I'll give the pull request thing a go over the next few weeks once I've had a chance to more thoroughly check the robustness. Thanks for the welcome.

KentonWhite commented 7 years ago

@jeromyanglim We use the testthat test suite to ensure that new functionality doesn't break the package for other users. Since readxl should be a drop-in replacement in terms of functionality, all of the tests should pass. If the tests don't pass then testthat should give some feedback about what is broken.

You can run testthat locally on your machine or have TravisCI on GitHub test it for you! When you make a PR, the test suite will be automatically run and an email sent if the changes have broken the package.

jeromyanglim commented 7 years ago

@KentonWhite Okay. So I gave it a go.

https://github.com/johnmyleswhite/ProjectTemplate/pull/159

I mention a few of the issues in the pull request, mostly around backwards compatibility.

Hugovdberg commented 6 years ago

Closing this since #159 was merged