Closed: Robinlovelace closed this issue 5 years ago
Any input on this approach especially welcome, heads-up @mpadge and @layik
Disclaimer: I don't yet have a sufficient idea of what this package is doing to offer any kind of informed opinion here.
Nevertheless, my one thought would be to be careful doing this. It is surprisingly easy to coerce complex lists (of lists of lists) into simple(-looking) data frame columns, but this comes at the single and very important expense of computational efficiency. These columns only look simple because of the default print method, which compacts those list items (and/or the equivalent for tibble objects). However, any operation on those columns has to unpack the (potentially nested) lists, and that is still a very inefficient operation. This is what makes lots of sf operations quite slow, and is the reason why sf needed, and now has, a wealth of carefully hand-crafted geometric operations. These are all done in C++, in which this kind of unpacking, disassembling and reassembling is reasonably efficient, but in R it remains strikingly inefficient.
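To make the point about list columns concrete, here is a minimal sketch in base R. The events and attendee values are made up for illustration and have nothing to do with what this package actually returns; the aim is only to contrast a nested list column with a flat "long" layout.

```r
# Three hypothetical events, each with a variable number of attendees.
events <- data.frame(
  summary = c("standup", "review", "retro"),
  stringsAsFactors = FALSE
)
events$attendees <- list(c("a", "b"), "c", c("d", "e", "f"))  # list column

# Printing hides the nesting, but any computation has to visit each element.
n_nested <- vapply(events$attendees, length, integer(1))

# A "long" layout keeps every column atomic: one row per event-attendee pair.
long <- data.frame(
  summary = rep(events$summary, times = n_nested),
  attendee = unlist(events$attendees),
  stringsAsFactors = FALSE
)

# Counting attendees per event is now a single call on an atomic column.
n_flat <- table(factor(long$summary, levels = events$summary))
```

Both routes give the same counts here; the difference is that the second one never has to unpack a list once the data are in long form.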
Jim Hester gave a great talk about his glue package at useR, half of which was about efficiency and the ease of sticking glue operations in any pipeline. We should definitely have the same mindset here: giving users the impression that columns are neat and simple when in reality they are nested lists generally leads to highly inefficient code. Probably the appropriate summary here is standard tidyverse evangelism: each column is one variable; each row is one observation. It is very easy to achieve optimal computational efficiency in that case.
Computational efficiency is not really a concern because ical files tend to be tiny. If we later find some large ical files, e.g. huge ones spewed by GTFS feeds, and want to optimise, then we should do that after the functionality is there, according to this quote from Donald Knuth:
“premature optimization is the root of all evil (or at least most of it) in programming” (Knuth 1974).
But any suggestions on how to get the functionality working v. welcome. Thinking one line per event to coincide with the default output of ic_list().
But definitely in favour of making our data 'tidy', defined as data frames in which:
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.
Source: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Starting with events as rows it would be hard not to follow this definition - eminently sensible to consider though and any other thoughts v. welcome.
I think going from list -> data frame makes sense but cannot be sure.
This answer outlines how to coerce that into a data frame (I suggest we do this after filtering out the properties we don't need, such as VEVENT: we no longer need that because we know they're events already. We could have written ic_list() to omit those lines, but it is better to omit them explicitly in the next function, e.g. called ic_dataframe() (benefit: more explicit than ic_df())): https://stackoverflow.com/questions/15201305/how-to-convert-a-list-consisting-of-vector-of-different-lengths-to-a-usable-data