Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

vignette() function is not interpreted #6612

Open ChristianWia opened 5 days ago

ChristianWia commented 5 days ago

Looking at https://rdatatable.gitlab.io/data.table/articles/datatable-intro.html

The reference at the bottom of the page is not rendered as a link:

"We will see how to add/update/delete columns by reference and how to combine them with i and by in the next vignette (vignette("datatable-reference-semantics", package="data.table"))."

I think a link is expected there; instead, the vignette() call appears as plain, uninterpreted text. (?)

Anirban166 commented 4 days ago

All of the links to other vignettes in that document, and in the other vignettes (changes come from here), use the same format, apart from the parentheses around the last one. In my opinion it feels a bit odd to have vignette("vignette-name", package="data.table") instead of just vignette-name in general.

Anirban166 commented 4 days ago

Is there some reason why relative links of the form [text pointing to the vignette named vignette-name](vignette-name.html) do not work? I tried editing the DOM for one such link in a vignette hosted on rdatatable.gitlab.io/data.table/articles/ (via inspecting elements) and was able to navigate to the vignette I linked to. It also works locally: when I build my vignettes, the relative path resolves in the standard inst/doc directory, and clicking the link in the rendered .html file opens the other vignette.

tdhock commented 3 days ago

I think the non-interpreted vignette() call is a common pattern:

grep  -nH --null 'vignette(' *
datatable-intro.Rmd:482:We'll learn more about `keys` in the `vignette("datatable-keys-fast-subset", package="data.table")`; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.
datatable-intro.Rmd:662:We can do much more in `i` by keying a `data.table`, which allows for blazing fast subsets and joins. We will see this in the `vignette("datatable-keys-fast-subset", package="data.table")` and the `vignette("datatable-joins", package="data.table")`.
datatable-intro.Rmd:696:We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the next vignette (`vignette("datatable-reference-semantics", package="data.table")`).
datatable-joins.Rmd:29:- `vignette("datatable-intro", package="data.table")`
datatable-joins.Rmd:30:- `vignette("datatable-reference-semantics", package="data.table")`
datatable-joins.Rmd:31:- `vignette("datatable-keys-fast-subset", package="data.table")`
datatable-keys-fast-subset.Rmd:27:This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the `vignette("datatable-intro", package="data.table")` and the `vignette("datatable-reference-semantics", package="data.table")` first.
datatable-keys-fast-subset.Rmd:33:We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
datatable-keys-fast-subset.Rmd:61:In the `vignette("datatable-intro", package="data.table")`, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*.
datatable-keys-fast-subset.Rmd:146:* Note that we did not have to assign the result back to a variable. This is because like the `:=` function we saw in the `vignette("datatable-reference-semantics", package="data.table")`, `setkey()` and `setkeyv()` modify the input *data.table* *by reference*. They return the result invisibly.
datatable-keys-fast-subset.Rmd:265:* Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in `vignette("datatable-intro", package="data.table")`.
datatable-keys-fast-subset.Rmd:293:We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")`. Let's take a look at all the `hours` available in the `flights` *data.table*:
datatable-keys-fast-subset.Rmd:501:Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not be always desirable to set key and physically reorder the *data.table*. In the next `vignette("datatable-secondary-indices-and-auto-indexing", package="data.table")`, we will address this using a *new* feature -- *secondary indexes*.
datatable-reference-semantics.Rmd:26:This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the `vignette("datatable-intro", package="data.table")` first.
datatable-reference-semantics.Rmd:32:We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
datatable-reference-semantics.Rmd:172:* We can use `i` along with `:=` in `j` the very same way as we have already seen in the `vignette("datatable-intro", package="data.table")`.
datatable-reference-semantics.Rmd:237:* We could have also provided `by` with a *character vector* as we saw in the `vignette("datatable-intro", package="data.table")`, e.g., `by = c("origin", "dest")`.
datatable-reference-semantics.Rmd:256:* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the `vignette("datatable-intro", package="data.table")`. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group.
datatable-reference-semantics.Rmd:372:* We have also seen how to use `:=` along with `i` and `by` the same way as we have seen in the `vignette("datatable-intro", package="data.table")`. We can in the same way use `keyby`, chain operations together, and pass expressions to `by` as well all in the same way. The syntax is *consistent*.
datatable-reference-semantics.Rmd:382:So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the next vignette `vignette("datatable-keys-fast-subset", package="data.table")` to perform *blazing fast subsets* by *keying data.tables*.
datatable-sd-usage.Rmd:127:1. The `:=` is an assignment operator to update the `data.table` in place without making a copy. See `vignette("datatable-reference-semantics", package="data.table")` for more.
datatable-secondary-indices-and-auto-indexing.Rmd:33:We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.
datatable-secondary-indices-and-auto-indexing.Rmd:196:All the operations we will discuss below are no different to the ones we already saw in the `vignette("datatable-keys-fast-subset", package="data.table")`. Except we'll be using the `on` argument instead of setting keys.
datatable-secondary-indices-and-auto-indexing.Rmd:222:We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")` and the `vignette("datatable-keys-fast-subset", package="data.table")`. Let's take a look at all the `hours` available in the `flights` *data.table*:
datatable-secondary-indices-and-auto-indexing.Rmd:256:The other arguments including `mult` work exactly the same way as we saw in the `vignette("datatable-keys-fast-subset", package="data.table")`. The default value for `mult` is "all". We can choose, instead only the "first" or "last" matching rows should be returned.
datatable-secondary-indices-and-auto-indexing.Rmd:330:We will discuss fast *subsets* using keys and secondary indices to *joins* in the next vignette, `vignette("datatable-joins", package="data.table")`.
grep: fr: Is a directory
grep: plots: Is a directory

Grep exited abnormally with code 2 at Thu Nov 14 12:44:48
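
The `Is a directory` messages and the exit code 2 above most likely come from the `*` glob also matching the `fr` and `plots` directories. A minimal sketch of the same search limited to `.Rmd` files is shown here; the temporary directory and the two sample files are hypothetical stand-ins for the real vignettes folder, not the actual sources.

```shell
# Hypothetical sandbox standing in for the vignettes directory.
workdir=$(mktemp -d)
mkdir "$workdir/fr" "$workdir/plots"
printf '%s\n' 'See `vignette("datatable-intro", package="data.table")`.' > "$workdir/a.Rmd"
printf '%s\n' 'No cross-reference here.' > "$workdir/b.Rmd"

cd "$workdir" || exit 1

# Globbing on *.Rmd keeps the directories out of grep's argument list,
# so grep reports a plain "match found" status instead of an error.
# -F treats the pattern as a fixed string rather than a regex.
matches=$(grep -nH -F 'vignette(' *.Rmd)
status=$?

printf '%s\n' "$matches"
echo "exit status: $status"
```

Only `a.Rmd` contains the pattern, so it is the only file listed, and the subdirectories no longer produce errors.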

I guess it would be nice to convert these to hyperlinks that the user could click on, instead of having to type the vignette command in R. If you want to implement hyperlinks, I would consider a PR.
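
One possible way to do such a conversion, sketched under assumptions: wrap each back-ticked `vignette("name", package="data.table")` call in a relative Markdown link to `name.html`, keeping the R command visible as the link text. The link target format follows the relative-link suggestion earlier in this thread; whether it resolves on every build target is not confirmed here, and the `sed` expression is only an illustration, not the change that was actually made.

```shell
# One line copied from the grep output in this thread.
input='We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`.'

# Wrap each back-ticked vignette() call in a Markdown link to <name>.html.
# In the replacement, & is the whole match and \1 is the captured vignette name.
linked=$(printf '%s\n' "$input" |
  sed -E 's/`vignette\("([^"]+)", package="data\.table"\)`/[&](\1.html)/g')

printf '%s\n' "$linked"
```

Keeping the `vignette()` call as the link text means the command is still readable for users who want to open the vignette from an R session.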

Anirban166 commented 2 days ago

Thanks for the grep, but one still has to check all vignettes manually, since some references to other vignettes use only the name (italicized) or the title, without vignette() and without a clickable link. I think it would be good to fix that inconsistency as well, so I'll send a PR today.

Anirban166 commented 2 days ago

https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/vignettes/datatable-intro.Rmd#L317

Which vignette is this (data.table design) referring to?

ChristianWia commented 2 days ago

... it is not in the set of vignettes (*.Rmd); it is probably not written yet, or the topic is covered in a section of an existing vignette.
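
A check of this kind can be partly automated for references that use the `vignette()` syntax. The sketch below extracts every referenced name and reports those with no matching `.Rmd` file. The sandbox files and the name `datatable-design` are hypothetical, chosen only to produce one dangling reference; references by title alone would still need a manual check.

```shell
# Hypothetical sandbox: one valid reference and one dangling reference.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'See `vignette("datatable-intro", package="data.table")`.' > datatable-joins.Rmd
printf '%s\n' 'See `vignette("datatable-design", package="data.table")`.' > datatable-intro.Rmd

# Collect the unique vignette names referenced in any .Rmd file,
# then keep those for which no <name>.Rmd exists.
missing=""
for name in $(grep -oh 'vignette("[^"]*"' *.Rmd |
              sed -E 's/^vignette\("//; s/"$//' | sort -u); do
  [ -f "$name.Rmd" ] || missing="$missing $name"
done

echo "missing:$missing"
```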

AngelFelizR commented 1 day ago

It seems odd to me, as the vignette() function is directing me to the correct site.

I believe it's important to maintain the vignette() syntax for links so that people can access those articles even without an internet connection.

If you prefer to use links, I think it would be better to change from vignettes to articles that only display on the web page to reduce the installation time for {data.table}.

I just checked the join vignette and the links are working.

[Attached screenshot: Screenshot_2024-11-16-08-37-00-785_com.android.chrome.jpg]