Open arunsrinivasan opened 9 years ago
@zeomal Hopefully I will be able to upload the first draft soon, so you can have a look at it. In my draft, I provide a simple example of a "normal" join on a single variable, time, where there are non-matching rows. I use `nomatch = NA` (and maybe also a quick example with `nomatch = NULL`).
My idea was that this simple join could provide a context and a feeling for the problem, which I then treat more thoroughly in the following sections on rolling and non-equi joins et al.
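As a rough illustration of the kind of simple join described above, here is a minimal sketch. The tables `readings` and `lookup` and their values are made up for this example and are not taken from the draft vignette; `nomatch = NULL` assumes a reasonably recent data.table release.

```r
library(data.table)

# Hypothetical data: measurements at some time points
readings <- data.table(time = c(1L, 2L, 4L), value = c(10, 20, 40))
# Hypothetical time points we want to look up; time 3 has no reading
lookup <- data.table(time = c(2L, 3L, 4L))

# Default nomatch = NA: every row of lookup is kept,
# non-matching rows get NA in the columns coming from readings
with_na <- readings[lookup, on = "time"]

# nomatch = NULL: rows of lookup without a match are dropped
only_matched <- readings[lookup, on = "time", nomatch = NULL]

print(with_na)
print(only_matched)
```

Here `with_na` has three rows (one with a missing `value`), while `only_matched` has two.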
Thanks a lot for your willingness to contribute!
I have a question on joining by reference, while preparing the vignettes. `X[Y, new_col := old_col]` performs something similar to a traditional left join on `X`. However, if there are multiple matches to `Y`'s keys in `X`, only the last (or first?) matching value of the key is retained. Is this explicitly documented somewhere? I had tried searching for this back when I encountered it, but had to resort to my understanding of updating by reference for the reason. For a reproducible example,
> X = data.table(a = c(1, 2, 3), m = c("a", "b", "c"))
> Y = data.table(b = c(1, 1, 4), n = c("x", "y", "z"))
> X[Y, new_col := i.n, on = "a == b"]
> X
a m new_col
1: 1 a y
2: 2 b <NA>
3: 3 c <NA>
# an ideal left join - expected behaviour per a new user, given below
# not possible because updating row by reference isn't implemented
a m new_col
1: 1 a x
2: 1 a y
3: 2 b <NA>
4: 3 c <NA>
This is expected behaviour, but isn't exactly straightforward for a new user. `mult` does not impact the output either. Any suggestions on how I document this? Add `merge` as a workaround for a proper left join?
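For reference, a minimal sketch of the two behaviours discussed above, reusing the `X` and `Y` from the example. The names `X_upd`, `left_join` and `left_merge` are introduced here for illustration only, and the named-vector form of `on` is used instead of the `"a == b"` string. This is one possible way to show the difference, not necessarily how the vignette should present it.

```r
library(data.table)

X <- data.table(a = c(1, 2, 3), m = c("a", "b", "c"))
Y <- data.table(b = c(1, 1, 4), n = c("x", "y", "z"))

# Update join: adds a column to a copy of X by reference.
# The number of rows cannot change, so when two rows of Y match a == 1
# the later assignment ("y") overwrites the earlier one ("x").
X_upd <- copy(X)
X_upd[Y, new_col := i.n, on = c(a = "b")]

# Left join keeping every match: look up each row of X in Y.
# Rows of X without a match get NA in n (nomatch = NA is the default).
left_join <- Y[X, on = c(b = "a")]

# The same idea via merge(); all.x = TRUE keeps unmatched rows of X.
left_merge <- merge(X, Y, by.x = "a", by.y = "b", all.x = TRUE)

print(X_upd)
print(left_join)
print(left_merge)
```

`X_upd` still has three rows, while `left_join` and `left_merge` each have four, because the two matches for `a == 1` are both kept.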
@zeomal please post your future questions about the join vignette in issue #2181 instead. It seems to be the better place. It is documented in `set`.
@zeomal If you wish to check how brief my treatment on normal (equi) joins is, I just want to let you know that I posted a PR on a timeseries vignette.
Minor typo in vignettes/datatable-reshape.Rmd, lines 113 and 129: `DT.m` should be replaced with `DT.m1`.
@kjytay could you please file a PR fixing that? You should be able to do so in the GitHub UI, so it should be pretty quick.
Ok done!
HTML vignette series:
Planned for v1.9.8
- `i.col` usage as filed in #1038. d) Also cover performance/advantages from #1232.
- [ ] Cover `get()` and `mget()` (covered in #4304). E.g., http://stackoverflow.com/q/33785747/559784
Future releases
- (`fread` + `rbindlist`), ordering, ranking and set operations
- `data.table()` and `data.frame()` somewhere - relevant issues: #968, #877. Perhaps slightly more in detail in the FAQ.
- `data.table` usage:
- fread+fwrite vignette, include also Convenience features of fread wiki, also https://github.com/Rdatatable/data.table/issues/2855
Finished:
- `i`, select / do in `j` and aggregations using `by`.
- `i` and `by` in the same way as before)
- `by=.EACHI` until the vignette is done.
Minor:
- `integer64`, and promoting it for large integers.
Notes (to update current vignettes based on feedback): Please let me know if I missed anything.
Introduction to data.table:
- `order` in `i`.
- `j` while selecting/computing.
- `.SDcols` and cols in `with=FALSE` being able to select columns as `colA:colB`.
Reference semantics:
- `set*` functions here (`setnames`, `setcolorder`, etc.)
- `set`.
- "1b) the := operator" is just defining ways to use it - the example there doesn't work as it just shows two different ways of using it -- Following this comment.
Keys and fast binary search based subsets:
FAQ (most appropriate here, I think).
- `readRDS()`. Update this SO post.
- `alloc.col()`, and when to use it (when you need to create multiple columns), and why. Update this SO post.