Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

Optionally ignore indices in all.equal #6134

Open mb706 opened 6 months ago

mb706 commented 6 months ago

Tables that contain the same data and behave essentially the same are not all.equal() when they have different indices. Indices can be created automatically depending on which operations are performed on a table, so I think it is often reasonable to ignore differences in indices.

x <- data.table(a = 1)
y <- data.table(a = 1)
all.equal(x, y)
#> [1] TRUE
x[a == 1]
#>        a
#>    <num>
#> 1:     1
all.equal(x, y)  # would be nice to have a way to get 'TRUE' here
#> [1] "Datasets have different indices. 'target': [a]. 'current': has no index."

While one can set check.attributes = FALSE, this also ignores other, more important things, such as column names:

all.equal(x, y, check.attributes = FALSE)
#> [1] TRUE
all.equal(x, data.table(b = 1), check.attributes = FALSE)  # this is too lenient
#> [1] TRUE

The user can set / unset indices in tables before checking with all.equal(), but that gets unnecessarily complicated when the tables are buried inside larger objects.
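For reference, the manual approach can be sketched as below. This is only a minimal sketch: all_equal_ignore_indices() is a hypothetical helper name, not something data.table provides. It relies on copy() and on setindex(DT, NULL) removing all indices, and it only handles top-level tables, not tables nested inside other objects.

```r
library(data.table)

# Hypothetical helper (not part of data.table): compare two data.tables
# while ignoring their indices.
all_equal_ignore_indices <- function(target, current, ...) {
  # setindex() modifies by reference, so work on copies to leave the
  # caller's tables (and their indices) untouched.
  target <- copy(target)
  current <- copy(current)
  setindex(target, NULL)   # drop all indices from the copy
  setindex(current, NULL)
  all.equal(target, current, ...)
}

x <- data.table(a = 1)
y <- data.table(a = 1)
setindex(x, a)                  # give x an explicit index on column a

all.equal(x, y)                 # reports the index difference
all_equal_ignore_indices(x, y)  # compares data and names, ignoring indices
```

Column names are still checked, so comparing x against data.table(b = 1) with this helper does not return TRUE.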

It would therefore be useful to have an option check.indices or ignore.indices that makes all.equal() ignore data.table indices.

My use case is that I use all.equal.data.table a lot in unit tests, where I verify that objects created in different ways still contain the same data. Tables may have different indices depending on how objects are created, but because these indices make no relevant difference in object behaviour I would like to ignore them.

tdhock commented 6 months ago

A workaround with the current code is to copy the data tables with data.table(your_dt_with_index), which I believe will remove the index. This functionality seems reasonable to me, so if you could draft a PR, it would be appreciated.