Closed pomalley08 closed 6 years ago
I'm not familiar tibbles, but it sounds like it should be possible. Would you be able to try adding the feature as a pull request?
Sure, I'll take a look at it.
A tibble is essentially a data.frame
with a nice print function associated with it. The print function makes sure only as much data is shown as actually fits within the console, and prints helpful info about what's left out.
I think this is a great addition! Should we perhaps not even make it optional, just apply tbl_df
on all variables? A data.table
really behaves differently from a normal data.frame
, a tibble
only looks different, and I don't think it negatively affects performance.
I think this is a great idea. Tibbles do have other advantages vs conventional dataframes. I would refer readers to http://r4ds.had.co.nz/tibbles.html
I looked into adding this feature, but it's a bit beyond my ability at this point. Someone else can try if they are interested.
@Hugovdberg Yes. I think making tibble
the default (and perhaps the only option) is not a bad idea, given where R is headed in terms of using the tidyverse
. Having said that, we can keep existing code under an option return_data_frame
to keep backward compatibility for existing code.
@rsangole I don't think we really need to keep an option if it doesn't conflict with data_tables
(although we could simply don't apply as_tibble
if data_tables
is TRUE
). I'd rather add it to version 0.9 and list the change in migrate_project
.
@pomalley08 I could add the functionality, but if you'd like I can give you a few pointers first if you would like to learn how to get involved?
Ok makes sense. @Hugovdberg between you or @pomalley08 , whoever makes the changes, could you put a few notes on where you made the change here? I'd like to learn a bit more on the code structure too.
@Hugovdberg I appreciate the offer, but unfortunately I don't have the time to commit to this right now.
@rsangole I just remembered that I would describe the changes I made to implement this. There are several links to the lines in load.project.R
to see which point in the code I'm referring to.
Some time ago I refactored the load.project
function a little so it is more self-explanatory. As you can see it's mainly a sequence of calls to load the configuration, libraries, logger, helpers, data, and finally munge the data.
This feature request was meant to convert the loaded data as tibble
s. My rationale was that it makes most sense to see this not as a munging step, because that loops over the munge scripts, but to see it as a part of the loading process.
It's easiest to implement it at that moment as well because then you are handling each dataset individually. After loading the data from file and before the data is cached there already was a check if the data should be converted to a data.table
. If not I added an else
branch to convert to tibble
instead, which is essentially a copy of the .convert.to.data.table
function.
Besides that a few tests had to be fixed because they assumed plain data.frame
s were used instead of tibble
s.
A detailed insight in the exact changes can also be found in the commit itself of course :)
Since no further requests for change were recieved I assume we can close this issue.
Is it possible to add an option to the configuration file to read in data frames as tibbles? I'm thinking this would be similar to the existing option to read in as a data.table. Or would this be best handled in a munging script?