Bioconductor / IRanges

Foundation of integer range manipulation in Bioconductor
https://bioconductor.org/packages/IRanges
22 stars 10 forks source link

Make the IRanges() constructor compatible with ALTREP #18

Open hpages opened 5 years ago

hpages commented 5 years ago

(Moving this from the devteam-bioc Slack to GitHub)

Jiefei wrote on Slack:

Hi Herve, I would like to make some changes to the IRanges package but I might miss something so I want to hear your opinion. For making the IRanges package compatible with ALTREP, the IRanges should be able to use the user's input as its start and width variable, so for the C level constructor:


SEXP solve_user_SEW0(SEXP start, SEXP end, SEXP width)
{
SEXP ans, ans_start, ans_width;
int ans_length, i;
ans_length = LENGTH(start);
PROTECT(ans_start = NEW_INTEGER(ans_length));
PROTECT(ans_width = NEW_INTEGER(ans_length));
for (i = 0; i < ans_length; i++) {
    if (solve_user_SEW0_row(INTEGER(start)[i],
                INTEGER(end)[i],
                INTEGER(width)[i],
                INTEGER(ans_start) + i,
                INTEGER(ans_width) + i) != 0)
    {
        UNPROTECT(2);
        error("solving row %d: %s", i + 1, errmsg_buf);
    }
}
PROTECT(ans = _new_IRanges("IRanges", ans_start, ans_width, R_NilValue));
UNPROTECT(3);
return ans;

}

> I plan to change it to

SEXP solve_user_SEW0(SEXP start, SEXP end, SEXP width) { SEXP ans; int duplicate_num = 0; int ans_length = LENGTH(start); for (int i = 0; i < ans_length; i++) { if (solve_user_SEW0_row(i, &start,&end, &width, &duplicate_num ) != 0) { error("solving row %d: %s", i + 1, errmsg_buf); } } PROTECT(ans = _new_IRanges("IRanges", start, width, R_NilValue)); UNPROTECT(1+ duplicate_num); return ans; }


> The function `solve_user_SEW0_row` will duplicate `start` and `width` if it need to change the value of these two variables. The similar pattern is also found in the function `solve_user_SEW`. I would like to know if there is any concern that prevents us from using the user's input as the IRanges object's slots. Do you think these changes could be a solution for the ALTREP problem?
hpages commented 5 years ago

Sure. Not duplicating the user input when it doesn't need to be modified sounds like a good general principle anyway, even without ALTREP in the picture. I will suggest a slightly different implementation though. I'm not a big fan of the duplicate_num thing. I found it cleaner and much easier to maintain in the long run to do the UNPROTECTs in the function body that did the PROTECTs. In other words, I've always tried to have each function clean-up after itself rather than delegating the cleaning to the caller. I have a meeting in 20s (at 3pm EST) but will propose something after that.

hpages commented 5 years ago

@Jiefei-Wang

Hi Jiefei,

This is done in IRanges 2.19.12: https://github.com/Bioconductor/IRanges/commit/6eb40e6a0d2dc9bccf8fdffc2b91f282bb023c94

Note the two-pass approach. The idea is to do a first pass in order to:

With this version of IRanges:

library(IRanges)
library(XVector)  # for XVector:::address()

start <- 101:105
width <- 7:3
ir1 <- IRanges(start, width=width)
stopifnot(XVector:::address(start(ir1)) == XVector:::address(start))
stopifnot(XVector:::address(width(ir1)) == XVector:::address(width))

width <- c(7:5, NA, NA)
ir2 <- IRanges(start, width=width)
# Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
#   In range 4: at least two out of 'start', 'end', and 'width', must
#   be supplied.

end <- c(NA, NA, NA, 999, 104)
ir3 <- IRanges(start, end, width)
stopifnot(XVector:::address(start(ir3)) == XVector:::address(start))
stopifnot(XVector:::address(width(ir3)) != XVector:::address(width))

start <- c(101:104, NA)
width <- 7:3
end <- c(NA, NA, NA, NA, 10)
ir4 <- IRanges(start, end, width)
stopifnot(XVector:::address(start(ir4)) != XVector:::address(start))
stopifnot(XVector:::address(width(ir4)) == XVector:::address(width))

Let me know how that works in the ALTREP context.

H.

Jiefei-Wang commented 5 years ago

Hi @hpages ,

From our discussion last week, we can make use of ALTREP if the code can operate on SEXPs instead of data pointers. Would you like me to apply these changes and make a pull request? I will preserve the code structure as possible as I can.

Best, Jiefei

hpages commented 5 years ago

Sounds good Jiefei. Thanks!