cole-trapnell-lab / Scribe

Regulatory networks with Direct Information

super_graph parsing error #14

Closed vperiyasamy closed 5 years ago

vperiyasamy commented 5 years ago

I've been trying to follow the vignette using my own dataset. When creating the super graph, I only want interactions between a known list of regulators and the geneset, so I restrict it as follows:

# Positions (1-based) of the known regulators within the gene set
all_genes <- list(genes, regs)
common <- Reduce(intersect, all_genes)
common_indices <- lapply(all_genes, function(x) which(x %in% common))
reg_idx <- common_indices[[1]]

Then, to create the super_graph, I do the following (mimicking the vignette):

tmp <- expand.grid(reg_idx, 1:ncol(data), stringsAsFactors = F)
super_graph <- tmp[tmp[, 1] != tmp[, 2], ] - 1
super_graph <- super_graph[, c(2, 1)]

However, when calling calculate_rdi_multiple_run(), I get the following error: "super_graph should only include integer less than the number of cells (or only include gene names from genes_data)"

I think this may be a bug, because the check should only be concerned with the number of genes, correct? I peeked at the R code and I'm wondering whether the calculation of n_genes is correct. I verified that the minimum index in my super_graph was 0 and the maximum was (# genes - 1). The CDS file I'm using as input is genes x cells, which is the same format as the vignette.
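The range check described above can be reproduced on toy data. This is only a minimal sketch: the gene names and regulator list are made up for illustration, and it does not call Scribe at all. It builds a regulator-to-gene super graph the same way as the snippet above and confirms that the zero-based indices span 0 to (number of genes - 1).

```r
# Hypothetical toy inputs, not taken from the issue
genes <- c("g1", "g2", "g3", "g4", "g5")
regs  <- c("g2", "g4", "g9")          # g9 is deliberately absent from genes
n_genes <- length(genes)

# 1-based positions of the regulators that are present in the gene set
reg_idx <- which(genes %in% intersect(genes, regs))

# All (regulator, gene) pairs without self-pairs, shifted to 0-based indices
tmp <- expand.grid(reg_idx, seq_len(n_genes), stringsAsFactors = FALSE)
super_graph <- tmp[tmp[, 1] != tmp[, 2], ] - 1

# Smallest and largest index that would be handed to the compiled code
idx_range <- range(as.matrix(super_graph))
print(idx_range)
```

If the reported range is 0 to n_genes - 1 and the error still appears, the bound being compared against is the likely culprit rather than the super graph itself.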

I additionally saw that it accepts a super_graph with gene names instead of indices, so I tried the following:

tmp <- expand.grid(common, genes, stringsAsFactors = F)
super_graph <- tmp[tmp[, 1] != tmp[, 2], ]
super_graph <- super_graph[, c(2, 1)]

But this gave a different error: "Error in calculate_rdi_multiple_run_cpp_wrap(as.matrix(genes_data), delays, : Not compatible with requested type: [type=character; target=integer]."
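Since the second error says a character value was supplied where an integer was expected, one possible workaround is to translate the gene names into zero-based integer indices before passing the super graph along. The sketch below assumes that indices refer to positions in the gene list; the toy names and the `named_graph` variable are hypothetical, and this is not a confirmed Scribe code path.

```r
# Hypothetical toy inputs
genes  <- c("g1", "g2", "g3", "g4")
common <- c("g2", "g4")               # regulators present in genes

# Name-based super graph, built as in the snippet above
tmp <- expand.grid(common, genes, stringsAsFactors = FALSE)
named_graph <- tmp[tmp[, 1] != tmp[, 2], ]
named_graph <- named_graph[, c(2, 1)]

# match() gives 1-based positions in genes; subtract 1L for 0-based indices
super_graph <- data.frame(
  V1 = match(named_graph[, 1], genes) - 1L,
  V2 = match(named_graph[, 2], genes) - 1L
)

# Any NA here would mean a name that is missing from genes
stopifnot(!anyNA(super_graph))
```

Keeping the columns as integers (note the `1L`) avoids an implicit conversion to double, which matters when the receiving code expects an integer matrix.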

concatenize commented 5 years ago

I had this exact same issue. Seems like an easy fix. I'll send a PR.

concatenize commented 5 years ago

@Xiaojieqiu, could you confirm that we are supposed to zero-index the genes here? If so, the check should be if(max(super_graph) > n_genes - 1 | min(super_graph) < 0), right?

Xiaojieqiu commented 5 years ago

Hi @concatenize, thanks for your pull request. Yes, the genes are zero-indexed in this particular case, because the indices get passed into the C++ code that does the heavy lifting (C++ is zero-indexed).
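For readers who want to validate a super graph before calling the package, the zero-indexed bound discussed in this thread can be written as a small standalone check. `check_super_graph` is a hypothetical helper written for illustration, not a function in Scribe, and it assumes `n_genes` is supplied by the caller.

```r
# Hypothetical helper: validates zero-based gene indices in a super graph
check_super_graph <- function(super_graph, n_genes) {
  idx <- as.matrix(super_graph)
  if (!is.numeric(idx) || anyNA(idx) || any(idx != floor(idx))) {
    stop("super_graph must contain whole-number indices")
  }
  # Valid zero-based indices lie in 0 .. n_genes - 1
  if (max(idx) > n_genes - 1 || min(idx) < 0) {
    stop("super_graph indices must lie in 0 .. n_genes - 1")
  }
  invisible(TRUE)
}

# Passes: indices 0..3 with 4 genes
check_super_graph(data.frame(a = c(0, 1), b = c(2, 3)), n_genes = 4)

# Fails: index 4 is out of range for 4 genes under zero-based indexing
result <- tryCatch(
  check_super_graph(data.frame(a = 1, b = 4), n_genes = 4),
  error = function(e) conditionMessage(e)
)
print(result)
```

The sketch uses `||` because both sides of the condition are single values inside `if`; the elementwise `|` from the comment above gives the same result here.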