KentonWhite / ProjectTemplate

A template utility for R projects that provides a skeletal project.
http://projecttemplate.net
GNU General Public License v3.0
623 stars 159 forks source link

Add Tibble configuration #215

Closed pomalley08 closed 6 years ago

pomalley08 commented 6 years ago

Is it possible to add an option to the configuration file to read in data frames as tibbles? I'm thinking this would be similar to the existing option to read in as a data.table. Or would this be best handled in a munging script?

KentonWhite commented 6 years ago

I'm not familiar tibbles, but it sounds like it should be possible. Would you be able to try adding the feature as a pull request?

pomalley08 commented 6 years ago

Sure, I'll take a look at it.

Hugovdberg commented 6 years ago

A tibble is essentially a data.frame with a nice print function associated with it. The print function makes sure only as much data is shown as actually fits within the console, and prints helpful info about what's left out. I think this is a great addition! Should we perhaps not even make it optional, just apply tbl_df on all variables? A data.table really behaves differently from a normal data.frame, a tibble only looks different, and I don't think it negatively affects performance.

rsangole commented 6 years ago

I think this is a great idea. Tibbles do have other advantages vs conventional dataframes. I would refer readers to http://r4ds.had.co.nz/tibbles.html

pomalley08 commented 6 years ago

I looked into adding this feature, but it's a bit beyond my ability at this point. Someone else can try if they are interested.

rsangole commented 6 years ago

@Hugovdberg Yes. I think making tibble the default (and perhaps the only option) is not a bad idea, given where R is headed in terms of using the tidyverse. Having said that, we can keep existing code under an option return_data_frame to keep backward compatibility for existing code.

Hugovdberg commented 6 years ago

@rsangole I don't think we really need to keep an option if it doesn't conflict with data_tables (although we could simply don't apply as_tibble if data_tables is TRUE). I'd rather add it to version 0.9 and list the change in migrate_project. @pomalley08 I could add the functionality, but if you'd like I can give you a few pointers first if you would like to learn how to get involved?

rsangole commented 6 years ago

Ok makes sense. @Hugovdberg between you or @pomalley08 , whoever makes the changes, could you put a few notes on where you made the change here? I'd like to learn a bit more on the code structure too.

pomalley08 commented 6 years ago

@Hugovdberg I appreciate the offer, but unfortunately I don't have the time to commit to this right now.

Hugovdberg commented 6 years ago

@rsangole I just remembered that I would describe the changes I made to implement this. There are several links to the lines in load.project.R to see which point in the code I'm referring to.

Some time ago I refactored the load.project function a little so it is more self-explanatory. As you can see it's mainly a sequence of calls to load the configuration, libraries, logger, helpers, data, and finally munge the data.

This feature request was meant to convert the loaded data as tibbles. My rationale was that it makes most sense to see this not as a munging step, because that loops over the munge scripts, but to see it as a part of the loading process.

It's easiest to implement it at that moment as well because then you are handling each dataset individually. After loading the data from file and before the data is cached there already was a check if the data should be converted to a data.table. If not I added an else branch to convert to tibble instead, which is essentially a copy of the .convert.to.data.table function.

Besides that a few tests had to be fixed because they assumed plain data.frames were used instead of tibbles.

A detailed insight in the exact changes can also be found in the commit itself of course :)

Hugovdberg commented 6 years ago

Since no further requests for change were recieved I assume we can close this issue.