Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

Vignettes #944

Open arunsrinivasan opened 9 years ago

arunsrinivasan commented 9 years ago

HTML vignette series:

Planned for v1.9.8


Future releases


Finished:


Minor:


Notes (to update current vignettes based on feedback): Please let me know if I missed anything.

Introduction to data.table:

Henrik-P commented 4 years ago

@zeomal Hopefully I will be able to upload the first draft soon, so you can have a look at it. In my draft, I provide a simple example of a "normal" join on a single variable, time, where there are non-matching rows. I use nomatch = NA (and maybe also a quick example with nomatch = NULL).

My idea was that this simple join could provide a context and a feeling for the problem, which I then treat more thoroughly in the following sections on rolling and non-equi joins et al.
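To illustrate the kind of simple join described above, here is a minimal sketch. The tables `readings` and `events` and their columns are hypothetical and are not taken from the draft vignette; only the `nomatch` argument comes from the discussion.

```r
library(data.table)

# Hypothetical data: readings at times 1..4, events at times 2, 4 and 6.
readings = data.table(time = 1:4, value = c(10, 20, 30, 40))
events   = data.table(time = c(2L, 4L, 6L), event = c("a", "b", "c"))

# readings[events] looks up each row of `events` in `readings`.
# Time 6 has no match; nomatch = NA keeps that row and fills `value` with NA.
keep_unmatched = readings[events, on = "time", nomatch = NA]

# nomatch = NULL drops the rows of `events` that have no match (an inner join).
drop_unmatched = readings[events, on = "time", nomatch = NULL]

print(keep_unmatched)
print(drop_unmatched)
```

The same non-matching row (time 6) is the one that a rolling join could fill from a neighbouring time, which is why it may serve as a lead-in to the later sections.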

Thanks a lot for your willingness to contribute!

zeomal commented 4 years ago

I have a question on joining by reference that came up while preparing the vignettes. X[Y, new_col := old_col] performs something similar to a traditional left join on X. However, if several rows of Y match the same row of X, only the last (or first?) matching value is retained. Is this explicitly documented somewhere? I tried searching for this back when I first encountered it, but had to fall back on my understanding of updating by reference to explain it. For a reproducible example,

> X = data.table(a = c(1, 2, 3), m = c("a", "b", "c"))
> Y = data.table(b = c(1, 1, 4), n = c("x", "y", "z"))
> X[Y, new_col := i.n, on = "a == b"]
   a m new_col
1: 1 a       y
2: 2 b    <NA>
3: 3 c    <NA>

# an ideal left join - the behaviour a new user would expect, given below
# not possible because := cannot add rows to X by reference
   a m new_col
1: 1 a       x
2: 1 a       y
3: 2 b    <NA>
4: 3 c    <NA>

This is expected behaviour, but isn't exactly straightforward for a new user. mult does not impact the output either. Any suggestions on how I document this? Add merge as a workaround for a proper left join?
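As a sketch of the workaround asked about, the example above can be run next to two joins that return a new table instead of updating X. The names `X_upd`, `left_merge` and `left_bracket` are made up for illustration, and `copy()` is used only so that X itself stays unchanged.

```r
library(data.table)

X = data.table(a = c(1, 2, 3), m = c("a", "b", "c"))
Y = data.table(b = c(1, 1, 4), n = c("x", "y", "z"))

# Update join on a copy: the result still has 3 rows, so only one of the
# two values matching a == 1 can be stored ("y" in the output shown above).
X_upd = copy(X)[Y, new_col := i.n, on = c(a = "b")]

# Left join returning a new table with one row per match.
# by.x / by.y are needed because the join columns have different names.
left_merge = merge(X, Y, by.x = "a", by.y = "b", all.x = TRUE)

# The same left join in bracket syntax: Y[X] keeps every row of X.
left_bracket = Y[X, on = c(b = "a")]

print(X_upd)
print(left_merge)
print(left_bracket)
```

Both `left_merge` and `left_bracket` allocate a new table rather than modifying X in place, which is the trade-off against the update join.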

jangorecki commented 4 years ago

@zeomal please post your future questions about the join vignette in issue #2181 instead. It seems to be a better place. It is documented in set.

Henrik-P commented 4 years ago

@zeomal If you wish to check how brief my treatment on normal (equi) joins is, I just want to let you know that I posted a PR on a timeseries vignette.

kjytay commented 3 years ago

Minor typo in vignettes/datatable-reshape.Rmd lines 113 and 129: DT.m should be replaced with DT.m1.

MichaelChirico commented 3 years ago

@kjytay could you please file a PR fixing that? You should be able to do so in the GitHub UI, so it should be pretty quick.

kjytay commented 3 years ago

Ok done!