DorisAmoakohene / Researchwork_Rdata.table

0 stars 0 forks source link

Issue 5366 plotting is giving me a linear graph #2

Closed DorisAmoakohene closed 12 months ago

DorisAmoakohene commented 1 year ago

@tdhock i am trying to plot the atime version, to check The before Regression, Regression and Fixed of the above issues but my graph is giving me a linear which seems not to be accurate

This is the link to the codes I am reproducing https://github.com/Rdatatable/data.table/pull/5463

These are the codes I am running #5366

atime.list.2 <- atime::atime_versions(
pkg.path=tdir,
pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
      pkg_find_replace <- function(glob, FIND, REPLACE){
        atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
      }
      Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
      Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
      new.Package_ <- paste0(Package_, "_", sha)
      pkg_find_replace(
        "DESCRIPTION", 
        paste0("Package:\\s+", old.Package),
        paste("Package:", new.Package))
      pkg_find_replace(
        file.path("src","Makevars.*in"),
        Package_regex,
        new.Package_)
      pkg_find_replace(
        file.path("R", "onLoad.R"),
        Package_regex,
        new.Package_)
      pkg_find_replace(
        file.path("R", "onLoad.R"),
        sprintf('packageVersion\\("%s"\\)', old.Package),
        sprintf('packageVersion\\("%s"\\)', new.Package))
      pkg_find_replace(
        file.path("src", "init.c"),
        paste0("R_init_", Package_regex),
        paste0("R_init_", gsub("[.]", "_", new.Package_)))
      pkg_find_replace(
        "NAMESPACE",
        sprintf('useDynLib\\("?%s"?', Package_regex),
        paste0('useDynLib(', new.Package_))
    },
  N=10^seq(2,10),
  setup={ 
    set.seed(1L)
    dt <- data.table(
      id = seq_len(N),
      val = rnorm(N))
  },
  expr=data.table:::`[.data.table`(dt[, .(vs = (sum(val))), by = .(id)]),
  "Before"="be2f72e6f5c90622fe72e1c315ca05769a9dc854",
  "Regression"="e793f53466d99f86e70fc2611b708ae8c601a451", #Before and Regression:https://github.com/Rdatatable/data.table/pull/4491/commits from this commits id in github. on(news items tweak and move items up)
  "Fixed"="58409197426ced4714af842650b0cc3b9e2cb842") #fixed:#https://github.com/Rdatatable/data.table/pull/5463/commits; taken from the last commits in here. for fixed.
this is the code I used for my set up
library(data.table)
set.seed(123L)
n = 1e4L
dt = data.table(id = seq_len(n),
                val = rnorm(n))

system.time(
  dt[, .(vs = (sum(val))), by = .(id)]

Screenshot 2023-11-10 180505

tdhock commented 1 year ago

Hi! I see your issue is that you expected Regression to be different from Fixed/Before curves to be different, but it seems to be overlapping.

It is normal that the curves are overlapping, when your expr does not do significant computations that depend on the package version.

In atime expr you must change DT[something] to

data.table:::`[.data.table`(DT, something)

This is the function that is executed by R, when you do DT[something]. atime works by changing data.table::: to data.table.SHAVERSION:::

Your code is

data.table:::`[.data.table`(DT[SOMETHING])

which is an incorrect translation, because it uses the square brackets FIRST_RESULT = DT[SOMETHING] directly (using the data.table package, no version change), then it does

data.table:::`[.data.table`(FIRST_RESULT)

(which will be modified to use the data.table.SHAVERSION package, version that you specified, but it does not do anything because there are no arguments other than FIRST_RESULT). I feel like I have already explained this to you several times, including the modification I made last week, https://github.com/tdhock/atime/commit/a0d8b6d25a1db33fff7bab297e2755a57e25be4b#diff-d4f0871b2cf4bc99d1c9e453d1abab96d3863ad2dd27f1edbfdf5260d9d540e4R26-R29 Can you please suggest how to improve the documentation to make this more clear?

tdhock commented 1 year ago

To debug your code, and understand what atime is doing, I would recommend that you use atime_versions_exprs which will show you the modified code (with data.table.SHAVERSION::). For example try running the code below,

expr.list <- atime::atime_versions_exprs(
  pkg.path="~/R/data.table",
  pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
    pkg_find_replace <- function(glob, FIND, REPLACE){
      atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
    }
    Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
    Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
    new.Package_ <- paste0(Package_, "_", sha)
    pkg_find_replace(
      "DESCRIPTION", 
      paste0("Package:\\s+", old.Package),
      paste("Package:", new.Package))
    pkg_find_replace(
      file.path("src","Makevars.*in"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      sprintf('packageVersion\\("%s"\\)', old.Package),
      sprintf('packageVersion\\("%s"\\)', new.Package))
    pkg_find_replace(
      file.path("src", "init.c"),
      paste0("R_init_", Package_regex),
      paste0("R_init_", gsub("[.]", "_", new.Package_)))
    pkg_find_replace(
      "NAMESPACE",
      sprintf('useDynLib\\("?%s"?', Package_regex),
      paste0('useDynLib(', new.Package_))
  },
  expr=data.table:::`[.data.table`(dt[, .(vs = (sum(val))), by = .(id)]),
  "Before"="be2f72e6f5c90622fe72e1c315ca05769a9dc854",
  "Regression"="e793f53466d99f86e70fc2611b708ae8c601a451", #Before and Regression:https://github.com/Rdatatable/data.table/pull/4491/commits from this commits id in github. on(news items tweak and move items up)
  "Fixed"="58409197426ced4714af842650b0cc3b9e2cb842") #fixed:#https://github.com/Rdatatable/data.table/pull/5463/commits; taken from the last commits in here. for fixed.

The result I got is shown below:

> expr.list
$Before
data.table.be2f72e6f5c90622fe72e1c315ca05769a9dc854:::`[.data.table`(dt[, 
    .(vs = (sum(val))), by = .(id)])

$Regression
data.table.e793f53466d99f86e70fc2611b708ae8c601a451:::`[.data.table`(dt[, 
    .(vs = (sum(val))), by = .(id)])

$Fixed
data.table.58409197426ced4714af842650b0cc3b9e2cb842:::`[.data.table`(dt[, 
    .(vs = (sum(val))), by = .(id)])

In the result above, you can see that the version is used, but not in the right place. You need to remove the square brackets, as in the code below,

expr=data.table:::`[.data.table`(dt, , .(vs = (sum(val))), by = .(id)),

In the code above,

That gives the result shown below:

> expr.list
$Before
data.table.be2f72e6f5c90622fe72e1c315ca05769a9dc854:::`[.data.table`(dt, 
    , .(vs = (sum(val))), by = .(id))

$Regression
data.table.e793f53466d99f86e70fc2611b708ae8c601a451:::`[.data.table`(dt, 
    , .(vs = (sum(val))), by = .(id))

$Fixed
data.table.58409197426ced4714af842650b0cc3b9e2cb842:::`[.data.table`(dt, 
    , .(vs = (sum(val))), by = .(id))