Open galenseilis opened 2 years ago
@galenseilis Double check what that data type of the gene.data variable is for their multi color example vs what yours is. For trouble shooting I usually start from their example and try and make sure that all data structure types are the same. Sometimes dataframes and matrices , multi-D vectors will have different behaviours.
Also have you tried with a specific path ID and no sapply?
This is just a sloppy copy-paste from one of my Obsidian vaults. Below I installed the R
environment on a spare laptop, examined the types, tried casting to a matrix type, and fiddled with some of the inputs. I'm tempted to automate trying various combinations of options, write them to separate folders, and look to see what generates desired results. If none of the possibilities of arguments to parameters work, then there must be something wrong either with the software or with the data input. Indeed, the vignette cautions:
Special Note: some KEGG xml data files are incomplete, inconsistent with corresponding png image or inaccu- rate/incorrect on some parts. These issues may cause inaccuracy, incosistency, or error messages although pathview tries the best to accommodate them. For instance, we may see inconistence between KEGG view and Graphviz view. As in the latter case, the pathway layout is generated based on data from xml file.
So, the issue should remain open for now.
R_initialize.R
[Ctrl] + [Shift] + [Enter]
after selecting the code.R
$ R
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Update all/some/none? [a/s/n]:
. I entered a
, which represents "all".R_Initialize.R
, I get a large printout. Here is the first issue in the printout:* installing *source* package ‘nloptr’ ...
** package ‘nloptr’ successfully unpacked and MD5 sums checked
** using staged installation
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C++... yes
checking whether g++ -std=gnu++14 accepts -g... yes
checking for g++ -std=gnu++14 option to enable C++11 features... none needed
checking how to run the C++ preprocessor... g++ -std=gnu++14 -E
checking whether the compiler supports GNU C++... (cached) yes
checking whether g++ -std=gnu++14 accepts -g... (cached) yes
checking for g++ -std=gnu++14 option to enable C++11 features... (cached) none needed
checking for pkg-config... /usr/bin/pkg-config
checking if pkg-config knows NLopt... no
checking for cmake... no
------------------ CMAKE NOT FOUND --------------------
CMake was not found on the PATH. Please install CMake:
- sudo yum install cmake (Fedora/CentOS; inside a terminal)
- sudo apt install cmake (Debian/Ubuntu; inside a terminal).
- sudo pacman -S cmake (Arch Linux; inside a terminal).
- sudo brew install cmake (MacOS; inside a terminal with Homebrew)
- sudo port install cmake (MacOS; inside a terminal with MacPorts)
Alternatively install CMake from: <https://cmake.org/>
-------------------------------------------------------
The first problem is that cmake
is not installed. Next I ran sudo apt-get install cmake
in BASH, which appeared to successfully download and install CMake.
Attempted to rerun R_Initialize.R
.
Next issue appears to be that libcurl
is not installed.
* installing *source* package ‘curl’ ...
** package ‘curl’ successfully unpacked and MD5 sums checked
** using staged installation
Package libcurl was not found in the pkg-config search path.
Perhaps you should add the directory containing `libcurl.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libcurl' found
Package libcurl was not found in the pkg-config search path.
Perhaps you should add the directory containing `libcurl.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libcurl' found
Googling this I found the page:
sudo apt-get install libcurl4-openssl-dev
and
sudo apt-get install pkg-config
. Running these commands seems to install particular packages. Then I again tried to rerun the original R
script.Next is the xml2
package:
* installing *source* package ‘xml2’ ...
** package ‘xml2’ successfully unpacked and MD5 sums checked
** using staged installation
Package libxml-2.0 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libxml-2.0.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libxml-2.0' found
Package libxml-2.0 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libxml-2.0.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libxml-2.0' found
The page https://stackoverflow.com/questions/7765429/unable-to-install-r-package-in-ubuntu-11-04 suggests that I run sudo apt-get install libxml2-dev
, which I then did.
I see a warning in the output: Warning: dependency ‘lasso2’ is not available
. It seems that I can install lasso2
via sudo apt-get install r-cran-lasso2
. Again, I rerun the initialize script.
Next I see that some of [[Dylan Fossl]]'s code blends Python into the mix.
> #Setting up python instance
> #cmd- which python3 gets path
> {
+ use_python(pythonInstanceDir, required = T)
+ py_config()
+ source_python(pythonMo .... [TRUNCATED]
Error in py_run_file_impl(file, local, convert) :
ModuleNotFoundError: No module named 'sklearn'
Attempting to install sklearn
:
$ pip install sklearn
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: sklearn in ./.local/lib/python3.10/site-packages (0.0.post1)
This output confuses me. Why would it not be able to write to site package? I am going to reinstall my computer and try again. Rebooting and attempting with pip
did not work, but of course there is a aptitude
solution: sudo apt-get install python3-sklearn
.
Presumably other Python packages are required for this code.
With no errors running R_Initialize.R
, I ran pathview_diagrams.R
. This leads to a bunch of repeated warnings:
Warning: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.
Let us check the data types in pathview_diagrams.R
.
Check kegg_LFC
:
> str(kegg_LFC)
'data.frame': 4934 obs. of 4 variables:
$ 1h_1 : num 2.389 0 0 0.673 0 ...
$ 2h_2 : num 1.301 0.335 0 0.246 0 ...
$ 24h_3: num 4.133 0 -0.21 0.227 0 ...
$ 49h_4: num 1.337 0.3 0 0.215 0 ...
- attr(*, "na.action")= 'omit' Named int [1:167] 434 511 548 567 804 836 967 1092 1186 1324 ...
..- attr(*, "names")= chr [1:167] "CMD085C" "CMD190C" "CME048C" "CME069C" ...
> class(kegg_LFC)
[1] "data.frame"
> type(kegg_LFC)
[1] "double"
> typeof(kegg_LFC)
[1] "list"
And likewise, let us check pathways
:
> str(pathways)
chr [1:120] "cme00010" "cme00020" "cme00030" "cme00040" "cme00051" "cme00052" "cme00053" "cme00061" "cme00062" "cme00071" "cme00100" ...
> class(pathways)
[1] "character"
> type(pathways)
[1] "character"
> typeof(pathways)
[1] "character"
Let us check the data types in this example:
library(pathview)
sim.cpd.data=sim.mol.data(mol.type="cpd", nmol=3000) sim.cpd.data2 = matrix(sample(sim.cpd.data, 18000, replace = T), ncol = 6) i <- 3
pv.out <- pathview(gene.data = gse16873.d[, 1:6], pathway.id = "00640", species = "hsa", keys.align = "y", kegg.native = T, match.data = F, multi.state = T, same.layer = T )
Let's look at `gse16873.d[, 1:6]`:
```R
> str(gse16873.d[, 1:6])
num [1:11979, 1:6] -0.3076 0.4159 0.1985 -0.2316 -0.0449 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:11979] "10000" "10001" "10002" "10003" ...
..$ : chr [1:6] "DCIS_1" "DCIS_2" "DCIS_3" "DCIS_4" ...
> class(gse16873.d[, 1:6])
[1] "matrix" "array"
> type(gse16873.d[, 1:6])
Error in type(gse16873.d[, 1:6]) : could not find function "type"
> typeof(gse16873.d[, 1:6])
[1] "double"
It is odd that this matrix object doesn't have "type". Perhaps that is a property of R
's matrix
class, or perhaps type
is part of a particular package that I have not imported in this particular example of using pathview
. But before looking too far into that, perhaps it is noteworthy that gse16873.d[, 1:6]
has the "matrix" "array"
class(es), whereas kegg_LFC
is the "data.frame"
class. One option might be to convert kegg_LFC
to a matrix using something like data.matrix
. But before I look into that further, let us also go back to [[Dylan Fossl]]'s NewFullRNASeqPipeline.R
to see what types we have.
First, kegg_LFC
> str(kegg_LFC)
'data.frame': 1433 obs. of 1 variable:
$ LFC_LPvRM: num -0.19893 0.37761 -0.00212 0.62521 -0.04184 ...
> class(kegg_LFC)
[1] "data.frame"
> type(kegg_LFC)
[1] "double"
> typeof(kegg_LFC)
[1] "list"
And then pathways
> str(pathways)
chr [1:91] "cme04145" "cme03060" "cme00510" "cme00564" "cme00360" "cme00196" "cme00410" "cme00562" "cme00561" "cme01210" "cme00040" ...
> class(pathways)
[1] "character"
> type(pathways)
[1] "character"
> typeof(pathways)
[1] "character"
So it seems like all of these data structures are at least a little bit different. The pathview vignette has the original data from a data.frame
object, but the example I ran from the documentation uses a matrix
. If the issue is that my implementation and Dylan's implementations use data.frame
s rather than matrix
, then I am surprised that Dylan's implementation worked before but not now. Perhaps something changed with either pathview's implementation, or something more general to R
, or something with related packages/dependencies. But the distinct data types motivates me to try using a matrix
type. We can attempt to cast the type using data.matrix
.
data.frame
object to matrix
using data.matrix
, but still the boxes were not colored at all. Here are the types of the resulting kegg_LFC
:
> str(kegg_LFC)
num [1:4934, 1:4] 2.389 0 0 0.673 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:4934] "CMA001C" "CMA004C" "CMA005C" "CMA007C" ...
..$ : chr [1:4] "1h_1" "2h_2" "24h_3" "49h_4"
> class(kegg_LFC)
[1] "matrix" "array"
> type(kegg_LFC)
[1] "double"
> typeof(kegg_LFC)
[1] "double"
So I have apparently succeeded in converting the to matrix
, but the coloring is still blank. So I have not identified the problem yet. Here are a couple of directions to take this.
The first could be that there is something wrong with the names in pathways
. Maybe running a single example without using the sapply
function, as suggested [[Dylan Fossl]] suggested that I try.
Using the same instance where I had converted to matrix
class, I tried running the command on pathways[1]
.
> pathview(
+ gene.data = kegg_LFC,
+ pathway.id = pathways[1],
+ species = "cme",
+ gene.idtype = "KEGG",
+ limit = c(min(kegg_LFC), max(kegg_LFC)),
+ keys.align = "y",
+ kegg.native = T,
+ match.data = F,
+ multi.state = T,
+ same.layer = T,
+ plot.col.key=FALSE
+ )
Warning: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.
Info: Working in directory /home/galen/Dropbox/UNBC/Rader/RaderLabCode/src
Info: Writing image file cme00010.pathview.png
Warning message:
In colnames(plot.data)[c(1, 3, 9:ncs)] <- c("kegg.names", "all.mapped", :
number of items to replace is not a multiple of replacement length
Specifically, that warning is new:
Warning message:
In colnames(plot.data)[c(1, 3, 9:ncs)] <- c("kegg.names", "all.mapped", :
number of items to replace is not a multiple of replacement length
Maybe this is somehow related to that session having used data.matrix
on kegg_LFC
, so let us look at running just that first pathway without being confounded by the type conversion.
The second is there could be something wrong with the gene names in kegg_LFC
. The genes in the pathview example are actually numbers rather than codes that have names in them.
A third, weirder idea: Should we be assigning kegg_LFC <- t_dfs -1
? If this is backwards from what is should be, maybe blank color is what we get with negated values? This doesn't agree with the warning:
Warning: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.
which suggests to us that there is something wrong with the gene.idtype
.
What is gene.idtype
exactly? I guess this constitutes the type of string format that pathview expects the genes to be in. [[Dylan Fossl]]'s code uses gene.idtype="KEGG"
. The Pathview vignette says:
For details check:
gene.idtype.list
anddata(rn.list)
;names(rn.list)
.
Checking gene.idtype.list
I find:
> gene.idtype.list
[1] "SYMBOL" "GENENAME" "ENSEMBL" "ENSEMBLPROT" "UNIGENE" "UNIPROT"
[7] "ACCNUM" "ENSEMBLTRANS" "REFSEQ" "ENZYME" "TAIR" "PROSITE"
[13] "ORF"
"KEGG"
is not among these options, although I see from this post that [[Dylan Fossl]] is not the only one to use this. The Common Errors part of the vignette says:
- mismatch between the IDs for gene.data (or cpd.data) and gene.idtype (or cpd.idtype). For example,
gene.data
orcpd.data
uses some extern ID types, whilegene.idtype = "entrez"
andcpd.idtype = "kegg"
(default).
This suggests to me that I should look at trying some different inputs for gene.idtype
.
Another potential problem highlighted in this vignette is:
pathway.id wrong or wrong format, right format should be a five digit number, like 04110, 00620 etc.
The current code does not use these 5-number codes per se, but rather ours look like this:
> pathways[1:5]
[1] "cme00010" "cme00020" "cme00030" "cme00040" "cme00051"
which have 5 numbers at the end but all have the three letters "cme"
at the front. Let us try replacing pathway.id=pid
with pathway.id=substr(pid, 4, 4+5)
and see what it does. This did not work, still giving Warning: None of the genes or compounds mapped to the pathway!
. This post makes me wonder if I need to setup running all combinations of inputs up to a restricted generating set.
A tempting approach would be to keep pathway.id=pid
and set gene.idtype="SYMBOL"
, but this leads to an error suggesting that this is not the right approach.
Error in pathview(gene.data = kegg_LFC, pathway.id = pid, species = "cme", :
No proper gene annotation package available!
In addition: Warning messages:
1: In DESeqDataSet(se, design = design, ignoreRank) :
some variables in design formula are characters, converting to factors
2: In DESeqDataSet(se, design = design, ignoreRank) :
some variables in design formula are characters, converting to factors
3: In DESeqDataSet(se, design = design, ignoreRank) :
some variables in design formula are characters, converting to factors
4: In DESeqDataSet(se, design = design, ignoreRank) :
Error in pathview(gene.data = kegg_LFC, pathway.id = pid, species = "cme", :
No proper gene annotation package available!
Note to self: run debugger to see what is happening line-by-line.
Here is a traceback
Error in pathview(gene.data = kegg_LFC, pathway.id = pid, species = "cme", :
No proper gene annotation package available!
6.
stop("No proper gene annotation package available!")
5.
pathview(gene.data = kegg_LFC, pathway.id = pid, species = "cme",
gene.idtype = "SYMBOL", limit = c(min(kegg_LFC), max(kegg_LFC)),
keys.align = "y", kegg.native = T, match.data = F, multi.state = T,
same.layer = T, plot.col.key = FALSE)
4.
FUN(X[[i]], ...)
3.
lapply(X = X, FUN = FUN, ...)
2.
sapply(pathways, function(pid) pathview(gene.data = kegg_LFC,
pathway.id = pid, species = "cme", gene.idtype = "SYMBOL",
limit = c(min(kegg_LFC), max(kegg_LFC)), keys.align = "y",
kegg.native = T, match.data = F, multi.state = T, same.layer = T, ...
1.
sapply(pathways, function(pid) pathview(gene.data = kegg_LFC,
pathway.id = pid, species = "cme", gene.idtype = "SYMBOL",
limit = c(min(kegg_LFC), max(kegg_LFC)), keys.align = "y",
kegg.native = T, match.data = F, multi.state = T, same.layer = T, ...
Says the source is not available.
But from that same namespace view I can see the source:
function (gene.data = NULL, cpd.data = NULL, pathway.id, species = "hsa",
kegg.dir = ".", cpd.idtype = "kegg", gene.idtype = "entrez",
gene.annotpkg = NULL, min.nnodes = 3, kegg.native = TRUE,
map.null = TRUE, expand.node = FALSE, split.group = FALSE,
map.symbol = TRUE, map.cpdname = TRUE, node.sum = "sum",
discrete = list(gene = FALSE, cpd = FALSE), limit = list(gene = 1,
cpd = 1), bins = list(gene = 10, cpd = 10), both.dirs = list(gene = T,
cpd = T), trans.fun = list(gene = NULL, cpd = NULL),
low = list(gene = "green", cpd = "blue"), mid = list(gene = "gray",
cpd = "gray"), high = list(gene = "red", cpd = "yellow"),
na.col = "transparent", ...)
{
dtypes = !is.null(gene.data) + !is.null(cpd.data)
cond0 = dtypes == 1 & is.numeric(limit) & length(limit) >
1
if (cond0) {
if (limit[1] != limit[2] & is.null(names(limit)))
limit = list(gene = limit[1:2], cpd = limit[1:2])
}
if (is.null(trans.fun))
trans.fun = list(gene = NULL, cpd = NULL)
arg.len2 = c("discrete", "limit", "bins", "both.dirs", "trans.fun",
"low", "mid", "high")
for (arg in arg.len2) {
obj1 = eval(as.name(arg))
if (length(obj1) == 1)
obj1 = rep(obj1, 2)
if (length(obj1) > 2)
obj1 = obj1[1:2]
obj1 = as.list(obj1)
ns = names(obj1)
if (length(ns) == 0 | !all(c("gene", "cpd") %in% ns))
names(obj1) = c("gene", "cpd")
assign(arg, obj1)
}
if (is.character(gene.data)) {
gd.names = gene.data
gene.data = rep(1, length(gene.data))
names(gene.data) = gd.names
both.dirs$gene = FALSE
ng = length(gene.data)
nsamp.g = 1
}
else if (!is.null(gene.data)) {
if (length(dim(gene.data)) == 2) {
gd.names = rownames(gene.data)
ng = nrow(gene.data)
nsamp.g = 2
}
else if (is.numeric(gene.data) & is.null(dim(gene.data))) {
gd.names = names(gene.data)
ng = length(gene.data)
nsamp.g = 1
}
else stop("wrong gene.data format!")
}
else if (is.null(cpd.data)) {
stop("gene.data and cpd.data are both NULL!")
}
gene.idtype = toupper(gene.idtype)
data(bods)
if (species != "ko") {
species.data = kegg.species.code(species, na.rm = T,
code.only = FALSE)
}
else {
species.data = c(kegg.code = "ko", entrez.gnodes = "0",
kegg.geneid = "K01488", ncbi.geneid = NA, ncbi.proteinid = NA,
uniprot = NA)
gene.idtype = "KEGG"
msg.fmt = "Only KEGG ortholog gene ID is supported, make sure it looks like \"%s\"!"
msg = sprintf(msg.fmt, species.data["kegg.geneid"])
message("Note: ", msg)
}
if (length(dim(species.data)) == 2) {
message("Note: ", "More than two valide species!")
species.data = species.data[1, ]
}
species = species.data["kegg.code"]
entrez.gnodes = species.data["entrez.gnodes"] == 1
if (is.na(species.data["ncbi.geneid"])) {
if (!is.na(species.data["kegg.geneid"])) {
msg.fmt = "Mapping via KEGG gene ID (not Entrez) is supported for this species,\nit looks like \"%s\"!"
msg = sprintf(msg.fmt, species.data["kegg.geneid"])
message("Note: ", msg)
}
else {
stop("This species is not annotated in KEGG!")
}
}
if (is.null(gene.annotpkg))
gene.annotpkg = bods[match(species, bods[, 3]), 1]
if (length(grep("ENTREZ|KEGG|NCBIPROT|UNIPROT", gene.idtype)) <
1 & !is.null(gene.data)) {
if (is.na(gene.annotpkg))
stop("No proper gene annotation package available!")
if (!gene.idtype %in% gene.idtype.bods[[species]])
stop("Wrong input gene ID type!")
gene.idmap = id2eg(gd.names, category = gene.idtype,
pkg.name = gene.annotpkg, unique.map = F)
gene.data = mol.sum(gene.data, gene.idmap)
gene.idtype = "ENTREZ"
}
if (gene.idtype != "KEGG" & !entrez.gnodes & !is.null(gene.data)) {
id.type = gene.idtype
if (id.type == "ENTREZ")
id.type = "ENTREZID"
kid.map = names(species.data)[-c(1:2)]
kid.types = names(kid.map) = c("KEGG", "ENTREZID", "NCBIPROT",
"UNIPROT")
kid.map2 = gsub("[.]", "-", kid.map)
kid.map2["UNIPROT"] = "up"
if (is.na(kid.map[id.type]))
stop("Wrong input gene ID type for the species!")
message("Info: Getting gene ID data from KEGG...")
gene.idmap = keggConv(kid.map2[id.type], species)
message("Info: Done with data retrieval!")
kegg.ids = gsub(paste(species, ":", sep = ""), "", names(gene.idmap))
in.ids = gsub(paste0(kid.map2[id.type], ":"), "", gene.idmap)
gene.idmap = cbind(in.ids, kegg.ids)
gene.data = mol.sum(gene.data, gene.idmap)
gene.idtype = "KEGG"
}
if (is.character(cpd.data)) {
cpdd.names = cpd.data
cpd.data = rep(1, length(cpd.data))
names(cpd.data) = cpdd.names
both.dirs$cpd = FALSE
ncpd = length(cpd.data)
}
else if (!is.null(cpd.data)) {
if (length(dim(cpd.data)) == 2) {
cpdd.names = rownames(cpd.data)
ncpd = nrow(cpd.data)
}
else if (is.numeric(cpd.data) & is.null(dim(cpd.data))) {
cpdd.names = names(cpd.data)
ncpd = length(cpd.data)
}
else stop("wrong cpd.data format!")
}
if (length(grep("kegg", cpd.idtype)) < 1 & !is.null(cpd.data)) {
data(rn.list)
cpd.types = c(names(rn.list), "name")
cpd.types = tolower(cpd.types)
cpd.types = cpd.types[-grep("kegg", cpd.types)]
if (!tolower(cpd.idtype) %in% cpd.types)
stop("Wrong input cpd ID type!")
cpd.idmap = cpd2kegg(cpdd.names, in.type = cpd.idtype)
cpd.data = mol.sum(cpd.data, cpd.idmap)
}
warn.fmt = "Parsing %s file failed, please check the file!"
if (length(grep(species, pathway.id)) > 0) {
pathway.name = pathway.id
pathway.id = gsub(species, "", pathway.id)
}
else pathway.name = paste(species, pathway.id, sep = "")
kfiles = list.files(path = kegg.dir, pattern = "[.]xml|[.]png")
npath = length(pathway.id)
out.list = list()
tfiles.xml = paste(pathway.name, "xml", sep = ".")
tfiles.png = paste(pathway.name, "png", sep = ".")
if (kegg.native)
ttype = c("xml", "png")
else ttype = "xml"
xml.file <- paste(kegg.dir, "/", tfiles.xml, sep = "")
for (i in 1:npath) {
if (kegg.native)
tfiles = c(tfiles.xml[i], tfiles.png[i])
else tfiles = tfiles.xml[i]
if (!all(tfiles %in% kfiles)) {
dstatus = download.kegg(pathway.id = pathway.id[i],
species = species, kegg.dir = kegg.dir, file.type = ttype)
if (dstatus == "failed") {
warn.fmt = "Failed to download KEGG xml/png files, %s skipped!"
warn.msg = sprintf(warn.fmt, pathway.name[i])
message("Warning: ", warn.msg)
return(invisible(0))
}
}
if (kegg.native) {
node.data = try(node.info(xml.file[i]), silent = T)
if (class(node.data)[1] == "try-error") {
warn.msg = sprintf(warn.fmt, xml.file[i])
message("Warning: ", warn.msg)
return(invisible(0))
}
node.type = c("gene", "enzyme", "compound", "ortholog")
sel.idx = node.data$type %in% node.type
nna.idx = !is.na(node.data$x + node.data$y + node.data$width +
node.data$height)
sel.idx = sel.idx & nna.idx
if (sum(sel.idx) < min.nnodes) {
warn.fmt = "Number of mappable nodes is below %d, %s skipped!"
warn.msg = sprintf(warn.fmt, min.nnodes, pathway.name[i])
message("Warning: ", warn.msg)
return(invisible(0))
}
node.data = lapply(node.data, "[", sel.idx)
}
else {
gR1 = try(parseKGML2Graph2(xml.file[i], genes = F,
expand = expand.node, split.group = split.group),
silent = T)
node.data = try(node.info(gR1), silent = T)
if (class(node.data)[1] == "try-error") {
warn.msg = sprintf(warn.fmt, xml.file[i])
message("Warning: ", warn.msg)
return(invisible(0))
}
}
if (species == "ko")
gene.node.type = "ortholog"
else gene.node.type = "gene"
if ((!is.null(gene.data) | map.null) & sum(node.data$type ==
gene.node.type) > 1) {
plot.data.gene = node.map(gene.data, node.data,
node.types = gene.node.type, node.sum = node.sum,
entrez.gnodes = entrez.gnodes)
kng = plot.data.gene$kegg.names
kng.char = gsub("[0-9]", "", unlist(kng))
if (any(kng.char > ""))
entrez.gnodes = FALSE
if (map.symbol & species != "ko" & entrez.gnodes) {
if (is.na(gene.annotpkg)) {
warn.fmt = "No annotation package for the species %s, gene symbols not mapped!"
warn.msg = sprintf(warn.fmt, species)
message("Warning: ", warn.msg)
}
else {
plot.data.gene$labels = eg2id(as.character(plot.data.gene$kegg.names),
category = "SYMBOL", pkg.name = gene.annotpkg)[,
2]
mapped.gnodes = rownames(plot.data.gene)
node.data$labels[mapped.gnodes] = plot.data.gene$labels
}
}
cols.ts.gene = node.color(plot.data.gene, limit$gene,
bins$gene, both.dirs = both.dirs$gene, trans.fun = trans.fun$gene,
discrete = discrete$gene, low = low$gene, mid = mid$gene,
high = high$gene, na.col = na.col)
}
else plot.data.gene = cols.ts.gene = NULL
if ((!is.null(cpd.data) | map.null) & sum(node.data$type ==
"compound") > 1) {
plot.data.cpd = node.map(cpd.data, node.data, node.types = "compound",
node.sum = node.sum)
if (map.cpdname & !kegg.native) {
plot.data.cpd$labels = cpdkegg2name(plot.data.cpd$labels)[,
2]
mapped.cnodes = rownames(plot.data.cpd)
node.data$labels[mapped.cnodes] = plot.data.cpd$labels
}
cols.ts.cpd = node.color(plot.data.cpd, limit$cpd,
bins$cpd, both.dirs = both.dirs$cpd, trans.fun = trans.fun$cpd,
discrete = discrete$cpd, low = low$cpd, mid = mid$cpd,
high = high$cpd, na.col = na.col)
}
else plot.data.cpd = cols.ts.cpd = NULL
if (kegg.native) {
pv.pars = keggview.native(plot.data.gene = plot.data.gene,
cols.ts.gene = cols.ts.gene, plot.data.cpd = plot.data.cpd,
cols.ts.cpd = cols.ts.cpd, node.data = node.data,
pathway.name = pathway.name[i], kegg.dir = kegg.dir,
limit = limit, bins = bins, both.dirs = both.dirs,
discrete = discrete, low = low, mid = mid, high = high,
na.col = na.col, ...)
}
else {
pv.pars = keggview.graph(plot.data.gene = plot.data.gene,
cols.ts.gene = cols.ts.gene, plot.data.cpd = plot.data.cpd,
cols.ts.cpd = cols.ts.cpd, node.data = node.data,
path.graph = gR1, pathway.name = pathway.name[i],
map.cpdname = map.cpdname, split.group = split.group,
limit = limit, bins = bins, both.dirs = both.dirs,
discrete = discrete, low = low, mid = mid, high = high,
na.col = na.col, ...)
}
plot.data.gene = cbind(plot.data.gene, cols.ts.gene)
if (!is.null(plot.data.gene)) {
cnames = colnames(plot.data.gene)[-(1:8)]
nsamp = length(cnames)/2
if (nsamp > 1) {
cnames[(nsamp + 1):(2 * nsamp)] = paste(cnames[(nsamp +
1):(2 * nsamp)], "col", sep = ".")
}
else cnames[2] = "mol.col"
colnames(plot.data.gene)[-(1:8)] = cnames
}
plot.data.cpd = cbind(plot.data.cpd, cols.ts.cpd)
if (!is.null(plot.data.cpd)) {
cnames = colnames(plot.data.cpd)[-(1:8)]
nsamp = length(cnames)/2
if (nsamp > 1) {
cnames[(nsamp + 1):(2 * nsamp)] = paste(cnames[(nsamp +
1):(2 * nsamp)], "col", sep = ".")
}
else cnames[2] = "mol.col"
colnames(plot.data.cpd)[-(1:8)] = cnames
}
out.list[[i]] = list(plot.data.gene = plot.data.gene,
plot.data.cpd = plot.data.cpd)
}
if (npath == 1)
out.list = out.list[[1]]
else names(out.list) = pathway.name
return(invisible(out.list))
}
I don't understand in what sense it is not available if... it is clearly available.
All of these options terminate the debugging. I thought I would be able to step inside the function pathview
, but apparently not.
Since stop("No proper gene annotation package available!")
was called, that implies (is.na(gene.annotpkg))
was TRUE
. The is.na
would return true if no value for gene.annotpkg
was available.
This snippet
if (is.na(gene.annotpkg))
stop("No proper gene annotation package available!")
is nested within
if (length(grep("ENTREZ|KEGG|NCBIPROT|UNIPROT", gene.idtype)) <
1 & !is.null(gene.data)) {
if (is.na(gene.annotpkg))
stop("No proper gene annotation package available!")
if (!gene.idtype %in% gene.idtype.bods[[species]])
stop("Wrong input gene ID type!")
gene.idmap = id2eg(gd.names, category = gene.idtype,
pkg.name = gene.annotpkg, unique.map = F)
gene.data = mol.sum(gene.data, gene.idmap)
gene.idtype = "ENTREZ"
}
and thus (length(grep("ENTREZ|KEGG|NCBIPROT|UNIPROT", gene.idtype)) < 1 & !is.null(gene.data))
is also true. The second condition !is.null(gene.data)
should be met by the fact that we passed in kegg_LFC
. The first condition length(grep("ENTREZ|KEGG|NCBIPROT|UNIPROT", gene.idtype)) < 1
is more interesting. The grep
command will return 1 if 'KEGG'
is in the gene.idtype
argument.
The current code I ran is this:
source("NewFullRNAseqPipeline_metadata.R")
pathways <- scan("../Annotations/Annotations/kegg_pathways.csv", character())
pathways <- pathways[pathways != 'cme01100' & pathways != 'cme01110'] # Exclude these larger diagrams
timepoints <- unique(coldata$timepoint)
t_dfs <- data.frame()
c <- 0
for (t in timepoints) {
c <- c + 1
print(t)
t_col <- coldata[coldata$timepoint == t,]
t_cts <- cts[grepl(t, colnames(cts))]
dds <-DESeqDataSetFromMatrix( countData = t_cts,
colData = t_col,
design = configDesign)
dds <- DESeq(dds)
res <- results(dds)
t_res <- as.data.frame(res)[c("log2FoldChange", "padj")]
t_res$log2FoldChange <- ifelse(t_res$padj < 0.01, t_res$log2FoldChange, 0)
t_res <- t_res['log2FoldChange']
names(t_res)[names(t_res) == 'log2FoldChange'] <- paste(t, c, sep='_')
if (nrow(t_dfs) == 0){
t_dfs <- t_res
} else {
t_dfs = merge(t_dfs, t_res, by="row.names", all=TRUE)
rownames(t_dfs) <- t_dfs$Row.names
t_dfs <- subset(t_dfs, select=-c(Row.names))
}
}
kegg_LFC <- t_dfs * -1
kegg_LFC <- na.omit(kegg_LFC)
kegg_LFC <- kegg_LFC
pv.out.list <- sapply(pathways,
function(pid) pathview(
gene.data = kegg_LFC,
pathway.id = pid,
species = "cme",
gene.idtype = "SYMBOL",
limit = c(min(kegg_LFC), max(kegg_LFC)),
keys.align = "y",
kegg.native = T,
match.data = F,
multi.state = T,
same.layer = T,
plot.col.key=FALSE
)
)
Notably, I didn't specify gene.idtype
even though when I first post this thread I did have that argument. Ah yes, I must have forgotten to revert to gene.idtype="KEGG"
. Rerunning I do not get any errors, but still the Warning: None of the genes or compounds mapped to the pathway!
warning.
Let's chase down that Warning: None of the genes or compounds mapped to the pathway!
in the source code. I can download the source from here.
Running /pathview$ grep -rin "Warning:" */*
I see:
R/combineKEGGnodes.R:27: message("Warning: ", warn.msg)
R/download.kegg.R:41: message("Warning: ", warn.msg)
R/download.kegg.R:58: message("Warning: ", warn.msg)
R/keggview.graph.R:51: message("Warning: reconcile groups sharing member nodes!")
R/node.map.R:45: message("Warning: ", paste("None of the genes or compounds mapped to the pathway!",
R/pathview.R:189: message("Warning: ", warn.msg)
R/pathview.R:198: message("Warning: ", warn.msg)
R/pathview.R:208: message("Warning: ", warn.msg)
R/pathview.R:217: message("Warning: ", warn.msg)
R/pathview.R:233: message("Warning: ", warn.msg)
The line R/node.map.R:45: message("Warning: ", paste("None of the genes or compounds mapped to the pathway!",
suggests I should start at line 45 of node.map.R
.
The complete expression is spans lines 45 and 46:
message("Warning: ", paste("None of the genes or compounds mapped to the pathway!",
"Argument gene.idtype or cpd.idtype may be wrong.", sep="\n"))
In order for that warning to print, it must be true that (length(mapped.mols)==0)
. The variable mapped.mols
is defined by
mapped.mols <- intersect(unlist(node.data$kegg.names), row.names(mol.data))
on line 43 of the same file. The intersect
command likely comes from base R, which performs a set intersection. This intersection is empty for some reason since its length is zero. The unlist
command takes a list structure and simplifies it into a more vector-like structure. This brings us to understanding node.data$kegg.names
and mol.data
. The variable node.data
is a parameter of the node.map
function:
node.map <-
function(mol.data=NULL, node.data, node.types=c("gene", "ortholog", "compound")[1], node.sum =c("sum","mean", "median", "max", "max.abs", "random")[1], entrez.gnodes=TRUE){
type.sel=node.data$type %in% node.types
if(sum(type.sel)<1){
message("Note: ", "No specified node types in the pathway!")
plot.data=NULL
return(plot.data)
}
node.data=lapply(node.data, "[", type.sel)
n.nodes=length(node.data$kegg.names)
spacials=as.matrix(as.data.frame(node.data[c("type", "x", "y", "width", "height")]))
if(node.types[1]=="gene"){
kng=node.data$kegg.names[node.data$type=="gene"]
kng.char=gsub("[0-9]", "", unlist(kng))
if(any(kng.char>"")) entrez.gnodes=FALSE
}
na.plot.data=function(){
sapply(1:n.nodes, function(i){
kns=node.data$kegg.names[[i]]
if(node.types[1]=="gene" & entrez.gnodes) items=as.numeric(kns)
else items=kns
ord=order(items)
items=items[ord]
kns=kns[ord]
return(c(kns[1],"", spacials[i,], NA))
})
}
if(is.null(mol.data)){
plot.data=na.plot.data()
} else{
#map gene data
if(is.character(mol.data)){
gd.names=mol.data
mol.data=rep(1, length(mol.data))
names(mol.data)=gd.names
}
mol.data=cbind(mol.data)
if(is.null(colnames(mol.data))) colnames(mol.data)=paste("ge", 1:ncol(mol.data),sep="")
mapped.mols <- intersect(unlist(node.data$kegg.names), row.names(mol.data))
if(length(mapped.mols)==0){
message("Warning: ", paste("None of the genes or compounds mapped to the pathway!",
"Argument gene.idtype or cpd.idtype may be wrong.", sep="\n"))
plot.data=na.plot.data()
} else{
if(node.types[1]=="gene" & entrez.gnodes) mapped.mols =as.numeric(mapped.mols)
plot.data=sapply(1:n.nodes, function(i){
kns=node.data$kegg.names[[i]]
if(node.types[1]=="gene" & entrez.gnodes) items=as.numeric(kns)
else items=kns
ord=order(items)
items=items[ord]
kns=kns[ord]
hit=items %in% mapped.mols
if(sum(hit)==0) {
return(c(kns[1], "", spacials[i,], rep(NA, ncol(mol.data))))
} else if(sum(hit)==1) {
edata=mol.data[as.character(items[hit]),]
return(c(kns[hit], kns[hit], spacials[i,], edata))
} else {
node.sum=eval(as.name(node.sum))
# edata=apply(cbind(mol.data[as.character(items[hit]),]), 2, node.sum, na.rm=T)
# edata=apply(cbind(mol.data[as.character(items[hit]),]), 2, function(x){
edata=apply(mol.data[as.character(items[hit]),,drop=F], 2, function(x){
x=x[!is.na(x)]
if(length(x)<1) return(NA)
else return(node.sum(x, na.rm=F))
})
return(c(kns[hit][1], paste(kns[hit],collapse=","), spacials[i,], edata))
}
})
}
}
colnames(plot.data)=names(node.data$kegg.names)
plot.data=as.data.frame(t(plot.data), stringsAsFactors = F)
plot.data$labels=node.data$labels
ncs=ncol(plot.data)
plot.data=plot.data[,c(1,ncs,2:(ncs-1))]
if(is.null(mol.data)) cns="mol.data" else cns=colnames(mol.data)
colnames(plot.data)[c(1,3,9:ncs)]=c("kegg.names","all.mapped",cns)#c(1,8:ncs)
for(ic in (1:ncol(plot.data))[-c(1:4)]) plot.data[,ic]=as.numeric(plot.data[,ic])#-c(1:3)
return(plot.data)
}
So to further understand node.data
we will have to find out where node.map
is called in the code base, and what is the initial input. The same can be said of mol.data
. We can return to the logic within node.map
later.
Using grep
again, we find:
pathview$ grep -rin "node.map" */*
man/cpdidmap.Rd:76: \code{\link{node.map}} the node data mapper function.
man/eg2id.Rd:110: \code{\link{node.map}} the node data mapper function.
man/mol.sum.Rd:77: \code{\link{node.map}} the node data mapper function.
man/node.color.Rd:28: the result returned by \code{node.map} function. It is a data.frame composed
man/node.color.Rd:31: parsed or mapped node data. Check \code{node.map} for details.
man/node.color.Rd:126:node.map function into pseudo colors, which then can be plotted on the
man/node.color.Rd:149: \code{\link{node.map}} the node data mapper function.
man/node.color.Rd:156:plot.data.gene=node.map(mol.data=gse16873.d[,1], node.data,
man/node.map.Rd:1:\name{node.map}
man/node.map.Rd:2:\alias{node.map}
man/node.map.Rd:12:node.map(mol.data = NULL, node.data, node.types = c("gene", "ortholog",
man/node.map.Rd:49:Mapper function node.map maps user supplied molecular data to KEGG
man/node.map.Rd:107:plot.data.gene=node.map(mol.data=gse16873.d[,1], node.data,
man/pathview.Rd:216:data.frame returned by node.map function for rendering mapped gene
man/pathview.Rd:411: \code{\link{node.map}} and \code{\link{node.color}} the mapper.
man/sim.mol.data.Rd:94: \code{\link{node.map}} the node data mapper function.
R/node.map.R:1:node.map <-
R/pathview.R:225: plot.data.gene=node.map(gene.data, node.data, node.types=gene.node.type, node.sum=node.sum, entrez.gnodes=entrez.gnodes)
R/pathview.R:245: plot.data.cpd=node.map(cpd.data, node.data, node.types="compound", node.sum=node.sum)
The *.Rd
files are a kind of documentation file. Let us repeat for R files only.
/pathview$ grep -rin "node.map" */*.R
R/node.map.R:1:node.map <-
R/pathview.R:225: plot.data.gene=node.map(gene.data, node.data, node.types=gene.node.type, node.sum=node.sum, entrez.gnodes=entrez.gnodes)
R/pathview.R:245: plot.data.cpd=node.map(cpd.data, node.data, node.types="compound", node.sum=node.sum)
Because we are interested in gene data, it makes sense to investigate line of 225 rather than line 245 of pathview.R
. The line plot.data.gene=node.map(gene.data, node.data, node.types=gene.node.type, node.sum=node.sum, entrez.gnodes=entrez.gnodes)
is nested within an if statement.
if((!is.null(gene.data) |map.null) & sum(node.data$type==gene.node.type)>1){
plot.data.gene=node.map(gene.data, node.data, node.types=gene.node.type, node.sum=node.sum, entrez.gnodes=entrez.gnodes)
kng=plot.data.gene$kegg.names
kng.char=gsub("[0-9]", "", unlist(kng))
if(any(kng.char>"")) entrez.gnodes=FALSE
if(map.symbol & species!="ko" & entrez.gnodes) {
if(is.na(gene.annotpkg)) {
warn.fmt="No annotation package for the species %s, gene symbols not mapped!"
warn.msg=sprintf(warn.fmt, species)
message("Warning: ", warn.msg)
} else {
# browser()
plot.data.gene$labels=eg2id(as.character(plot.data.gene$kegg.names), category="SYMBOL", pkg.name=gene.annotpkg)[,2]
mapped.gnodes=rownames(plot.data.gene)
node.data$labels[mapped.gnodes]=plot.data.gene$labels
}
}
cols.ts.gene=node.color(plot.data.gene, limit$gene, bins$gene, both.dirs=both.dirs$gene, trans.fun=trans.fun$gene, discrete=discrete$gene, low=low$gene, mid=mid$gene, high=high$gene, na.col=na.col)
}
Thus ((!is.null(gene.data) |map.null) & sum(node.data$type==gene.node.type)>1)
must be true for our inputs. The node.data
variable is assigned from try(node.info(xml.file[i]), silent=T)
. We can in turn look for node.info
in the code base using grep
.
/pathview$ grep -rin "node.info" */*.R
R/node.info.R:1:node.info <-
R/pathview.R:195: node.data=try(node.info(xml.file[i]), silent=T)
R/pathview.R:214: node.data=try(node.info(gR1), silent=T)
So node.info
is this function:
node.info <-
function(object, short.name=TRUE){
cobj=class(object)[1]
if(cobj=="character"){
object <-parseKGML2(object)
ndata=nodes(object)
} else if(cobj=="KEGGPathway"){
ndata=nodes(object)
} else if(cobj=="graphNEL"){
ndata=getKEGGnodeData(object)
} else {
stop("object should be either a filename, KEGGPathway, or graphNEL!")
}
nodeNames=sapply(ndata, getName)
nodeType=sapply(ndata, getType)
nodeComp=sapply(ndata, getComponent)
node.size=sapply(nodeComp, length)
grs1=sapply(ndata, function(x){
grs=x@graphics
c(labels=grs@name, shape=grs@type)
})
grs2=sapply(ndata, function(x){
grs=x@graphics
c(x=grs@x,y=grs@y,width=grs@width,height=grs@height)
})
grs1=t(grs1)
grs2=t(grs2)
graphic.data=as.list(cbind(data.frame(grs1, stringsAsFactors=F), data.frame(grs2)))
nd.list=list(kegg.names=nodeNames, type=nodeType, component=nodeComp, size=node.size)
nd.list=c(nd.list, graphic.data)
if(short.name){
gnames=sapply(strsplit(nd.list$labels, ", "), "[[", 1)
map.idx=nd.list$type=="map"
gnames[map.idx]=nd.list$labels[map.idx]
gnames[is.na(gnames)]=""
gnames=gsub("[.][.][.]", "", gnames)
nd.list$labels=gnames
nd.list$kegg.names=lapply(nd.list$kegg.names, function(x) gsub("^.*:", "", x))
}
nn=names(nodeNames)
nd.list= lapply(nd.list, function(x) {
names(x) <- nn
return(x)
})
return(nd.list)
}
This is starting to multifurcate into many branches. This approach is more likely to succeed with diagrams.
Looking inside pv.out.list
, I can see that there is a mapped
column containing empty strings.
> pv.out.list[1]$cme00010$plot.data.gene$kegg.names
[1] "CYME_CME145C" "CYME_CMO345C" "CYME_CMM296C" "CYME_CMS125C" "CYME_CMS327C" "CYME_CMS327C"
[7] "CYME_CMI273C" "CYME_CMA145C" "CYME_CMA030C" "CYME_CMK131C" "CYME_CMJ250C" "CYME_CMQ172C"
[13] "CYME_CMI162C" "CYME_CMD041C" "CYME_CMO124C" "CYME_CMJ272C" "CYME_CMO124C" "CYME_CMO276C"
[19] "CYME_CMO124C" "CYME_CMD040C" "CYME_CMO276C" "CYME_CMM299C" "CYME_CMJ305C" "CYME_CMN285C"
[25] "CYME_CMT034C" "CYME_CMK098C" "CYME_CMK188C" "CYME_CMH052C" "CYME_CMF012C"
> pv.out.list[1]$cme00010$plot.data.gene$labels
[1] "CYME_CME145C" "CYME_CMO345C" "CYME_CMM296C" "CYME_CMS125C" "CYME_CMS327C" "CYME_CMS327C"
[7] "CYME_CMI273C" "CYME_CMA145C" "CYME_CMA030C" "CYME_CMK131C" "CYME_CMJ250C" "CYME_CMQ172C"
[13] "CYME_CMI162C" "CYME_CMD041C" "CYME_CMO124C" "CYME_CMJ272C" "CYME_CMO124C" "CYME_CMO276C"
[19] "CYME_CMO124C" "CYME_CMD040C" "CYME_CMO276C" "CYME_CMM299C" "CYME_CMJ305C" "CYME_CMN285C"
[25] "CYME_CMT034C" "CYME_CMK098C" "CYME_CMK188C" "CYME_CMH052C" "CYME_CMF012C"
> pv.out.list[1]$cme00010$plot.data.gene$all.mapped
[1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
I also see that mol.col
only contains #FFFFFF
:
> pv.out.list[1]$cme00010$plot.data.gene$mol.col
[1] "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF"
[10] "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF"
[19] "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF"
[28] "#FFFFFF" "#FFFFFF"
which is the hexcode for completely white.
Also interesting that only the first time point is showing up:
> colnames(pv.out.list$cme00010$plot.data.gene)
[1] "kegg.names" "labels" "all.mapped" "type" "x" "y" "width"
[8] "height" "1h_1" "mol.col"
It is interesting that in result
which I used to store our case of pid=00010
I see the KEGG names are strings whereas the provided example in pv.out
are integers. A future step may be to provide integers for kegg.names
.
> result
$plot.data.gene
kegg.names labels all.mapped type x y width height 1h_1 mol.col
18 CYME_CME145C CYME_CME145C gene 483 407 46 17 NA #FFFFFF
42 CYME_CMO345C CYME_CMO345C gene 284 948 46 17 NA #FFFFFF
49 CYME_CMM296C CYME_CMM296C gene 487 948 46 17 NA #FFFFFF
50 CYME_CMS125C CYME_CMS125C gene 439 941 46 17 NA #FFFFFF
52 CYME_CMS327C CYME_CMS327C gene 431 858 46 17 NA #FFFFFF
53 CYME_CMS327C CYME_CMS327C gene 319 870 46 17 NA #FFFFFF
54 CYME_CMI273C CYME_CMI273C gene 210 870 46 17 NA #FFFFFF
55 CYME_CMA145C CYME_CMA145C gene 579 869 46 17 NA #FFFFFF
57 CYME_CMA030C CYME_CMA030C gene 483 771 46 17 NA #FFFFFF
59 CYME_CMK131C CYME_CMK131C gene 483 701 46 17 NA #FFFFFF
61 CYME_CMJ250C CYME_CMJ250C gene 458 484 46 17 NA #FFFFFF
62 CYME_CMQ172C CYME_CMQ172C gene 386 449 46 17 NA #FFFFFF
63 CYME_CMI162C CYME_CMI162C gene 458 336 46 17 NA #FFFFFF
64 CYME_CMD041C CYME_CMD041C gene 408 336 46 17 NA #FFFFFF
66 CYME_CMO124C CYME_CMO124C gene 483 265 46 17 NA #FFFFFF
67 CYME_CMJ272C CYME_CMJ272C gene 483 178 46 17 NA #FFFFFF
68 CYME_CMO124C CYME_CMO124C gene 385 302 46 17 NA #FFFFFF
70 CYME_CMO276C CYME_CMO276C gene 233 312 46 17 NA #FFFFFF
71 CYME_CMO124C CYME_CMO124C gene 357 264 46 17 NA #FFFFFF
72 CYME_CMD040C CYME_CMD040C gene 307 264 46 17 NA #FFFFFF
74 CYME_CMO276C CYME_CMO276C gene 233 239 46 17 NA #FFFFFF
79 CYME_CMM299C CYME_CMM299C gene 262 913 46 17 NA #FFFFFF
80 CYME_CMJ305C CYME_CMJ305C gene 483 557 46 17 NA #FFFFFF
121 CYME_CMN285C CYME_CMN285C gene 239 747 46 17 NA #FFFFFF
130 CYME_CMT034C CYME_CMT034C gene 392 521 46 17 NA #FFFFFF
140 CYME_CMK098C CYME_CMK098C gene 96 916 46 17 NA #FFFFFF
147 CYME_CMK188C CYME_CMK188C gene 508 630 46 17 NA #FFFFFF
156 CYME_CMH052C CYME_CMH052C gene 558 336 46 17 NA #FFFFFF
159 CYME_CMF012C CYME_CMF012C gene 574 796 46 17 NA #FFFFFF
$plot.data.cpd
kegg.names labels all.mapped type x y width height mol.data mol.col
45 C00033 C00033 compound 146 958 8 8 NA #FFFFFF
87 C00031 C00031 compound 601 195 8 8 NA #FFFFFF
88 C00103 C00103 compound 483 143 8 8 NA #FFFFFF
89 C00631 C00631 compound 483 664 8 8 NA #FFFFFF
90 C00267 C00267 compound 181 228 8 8 NA #FFFFFF
91 C00221 C00221 compound 181 301 8 8 NA #FFFFFF
92 C00111 C00111 compound 332 448 8 8 NA #FFFFFF
93 C01172 C01172 compound 332 301 8 8 NA #FFFFFF
94 C00668 C00668 compound 483 230 8 8 NA #FFFFFF
95 C05345 C05345 compound 483 303 8 8 NA #FFFFFF
96 C00074 C00074 compound 483 736 8 8 NA #FFFFFF
97 C00197 C00197 compound 483 592 8 8 NA #FFFFFF
98 C00236 C00236 compound 483 520 8 8 NA #FFFFFF
99 C00186 C00186 compound 627 868 8 8 NA #FFFFFF
100 C15972 C15972 compound 314 911 8 8 NA #FFFFFF
101 C00469 C00469 compound 550 958 8 8 NA #FFFFFF
102 C00022 C00022 compound 483 868 8 8 NA #FFFFFF
103 C05125 C05125 compound 378 868 8 8 NA #FFFFFF
104 C00024 C00024 compound 146 868 8 8 NA #FFFFFF
105 C00084 C00084 compound 378 958 8 8 NA #FFFFFF
106 C16255 C16255 compound 265 868 8 8 NA #FFFFFF
107 C15973 C15973 compound 212 911 8 8 NA #FFFFFF
108 C05378 C05378 compound 483 373 8 8 NA #FFFFFF
109 C06186 C06186 compound 132 359 8 8 NA #FFFFFF
110 C06187 C06187 compound 239 359 8 8 NA #FFFFFF
111 C06188 C06188 compound 239 385 8 8 NA #FFFFFF
112 C01451 C01451 compound 132 385 8 8 NA #FFFFFF
118 C00036 C00036 compound 146 736 8 8 NA #FFFFFF
134 C01159 C01159 compound 568 556 8 8 NA #FFFFFF
139 C00118 C00118 compound 483 448 8 8 NA #FFFFFF
146 C00068 C00068 compound 378 825 8 8 NA #FFFFFF
> pv.out
$plot.data.gene
kegg.names labels all.mapped type x y width height DCIS_1 DCIS_2
16 4329 ALDH6A1 4329 gene 159 325 46 17 0.74686683 0.05287812
18 31 ACACA 31,32 gene 159 252 46 17 -0.48282912 -0.75715042
19 23417 MLYCD 23417 gene 159 204 46 17 -0.25145040 -0.37150885
26 18 ABAT 18 gene 276 370 46 17 2.78549061 -0.32723593
35 26275 HIBCH 26275 gene 377 327 46 17 0.76979701 -0.11439553
60 4329 ALDH6A1 4329 gene 951 409 46 17 0.74686683 0.05287812
63 84693 MCEE gene 807 494 46 17 NA NA
64 5095 PCCA 5095,5096 gene 721 390 46 17 1.19029289 -0.30087442
66 55862 ECHDC1 55862 gene 831 390 46 17 0.16245690 0.09019571
68 79611 ACSS3 79611 gene 1051 229 46 17 0.07325135 -0.21532204
69 79611 ACSS3 79611 gene 1051 143 46 17 0.07325135 -0.21532204
71 55902 ACSS2 gene 1001 229 46 17 NA NA
72 55902 ACSS2 gene 1001 143 46 17 NA NA
77 3945 LDHB 3945,3948,92483 gene 743 64 46 17 -0.27453120 0.18750500
83 1892 ECHS1 1892,1962,3030 gene 468 317 46 17 -0.78788127 0.10146314
105 35 ACADS 35 gene 593 317 46 17 -0.37327672 -0.22446399
110 4594 MMUT 4594 gene 807 577 46 17 0.41314463 0.06333223
119 8801 SUCLG2 8801,8802,8803 gene 865 630 46 17 1.67147604 0.26659990
120 8801 SUCLG2 8801,8802,8803 gene 865 609 46 17 1.67147604 0.26659990
197 1738 DLD 1738 gene 769 241 46 17 1.25941863 0.04192270
198 593 BCKDHA 593,594 gene 806 209 46 17 -0.50170330 0.63064744
199 593 BCKDHA 593,594 gene 806 138 46 17 -0.50170330 0.63064744
200 1629 DBT 1629 gene 806 275 46 17 -0.13898650 0.01034633
214 51 ACOX1 51,8310 gene 593 338 46 17 0.56050833 0.13422425
DCIS_3 DCIS_4 DCIS_5 DCIS_6 DCIS_1.col DCIS_2.col DCIS_3.col DCIS_4.col
16 -0.81369145 -0.27709498 -0.18071111 0.122670619 #EF3030 #BEBEBE #00FF00 #8FCE8F
18 -0.63119257 -1.69649387 -0.81430094 -0.131227383 #5FDF5F #30EF30 #30EF30 #00FF00
19 -0.21974935 -0.08768293 -0.36251055 -0.044751017 #8FCE8F #8FCE8F #8FCE8F #BEBEBE
26 -0.63635871 -0.01074421 1.18053902 -0.340124650 #FF0000 #8FCE8F #30EF30 #BEBEBE
35 -0.52919767 -0.25858711 0.13965970 -0.334158512 #EF3030 #BEBEBE #5FDF5F #8FCE8F
60 -0.81369145 -0.27709498 -0.18071111 0.122670619 #EF3030 #BEBEBE #00FF00 #8FCE8F
63 NA NA NA NA #FFFFFF #FFFFFF #FFFFFF #FFFFFF
64 -0.47811945 0.21841228 0.59842717 0.127384456 #FF0000 #8FCE8F #5FDF5F #CE8F8F
66 0.19608061 0.02913517 0.23258972 0.191704083 #BEBEBE #BEBEBE #BEBEBE #BEBEBE
68 0.19192858 0.07799171 0.47373393 0.205822641 #BEBEBE #8FCE8F #BEBEBE #BEBEBE
69 0.19192858 0.07799171 0.47373393 0.205822641 #BEBEBE #8FCE8F #BEBEBE #BEBEBE
71 NA NA NA NA #FFFFFF #FFFFFF #FFFFFF #FFFFFF
72 NA NA NA NA #FFFFFF #FFFFFF #FFFFFF #FFFFFF
77 -0.14385615 -0.53114842 0.20440894 -0.557069022 #8FCE8F #BEBEBE #BEBEBE #5FDF5F
83 -0.34086971 -0.26743804 -0.34815464 -0.006183515 #30EF30 #BEBEBE #8FCE8F #8FCE8F
105 -0.26460784 -0.82938399 -0.21631308 0.509809829 #8FCE8F #8FCE8F #8FCE8F #00FF00
110 -0.36899420 0.02000592 -0.08163558 0.117707016 #DF5F5F #BEBEBE #8FCE8F #BEBEBE
119 -0.91651583 0.14434606 0.34928782 0.023307894 #FF0000 #CE8F8F #00FF00 #BEBEBE
120 -0.91651583 0.14434606 0.34928782 0.023307894 #FF0000 #CE8F8F #00FF00 #BEBEBE
197 -0.29069324 0.04910358 0.30660849 0.205264819 #FF0000 #BEBEBE #8FCE8F #BEBEBE
198 -0.43042885 -0.17590157 0.10787799 0.182615247 #5FDF5F #EF3030 #5FDF5F #BEBEBE
199 -0.43042885 -0.17590157 0.10787799 0.182615247 #5FDF5F #EF3030 #5FDF5F #BEBEBE
200 -0.01534762 -0.04897299 -0.02063803 0.012819618 #BEBEBE #BEBEBE #BEBEBE #BEBEBE
214 -0.52993241 -0.29012849 -0.11773491 0.070639856 #DF5F5F #BEBEBE #5FDF5F #8FCE8F
DCIS_5.col DCIS_6.col
16 #BEBEBE #BEBEBE
18 #00FF00 #BEBEBE
19 #8FCE8F #BEBEBE
26 #FF0000 #8FCE8F
35 #BEBEBE #8FCE8F
60 #BEBEBE #BEBEBE
63 #FFFFFF #FFFFFF
64 #DF5F5F #BEBEBE
66 #CE8F8F #BEBEBE
68 #DF5F5F #CE8F8F
69 #DF5F5F #CE8F8F
71 #FFFFFF #FFFFFF
72 #FFFFFF #FFFFFF
77 #CE8F8F #5FDF5F
83 #8FCE8F #BEBEBE
105 #8FCE8F #DF5F5F
110 #BEBEBE #BEBEBE
119 #CE8F8F #BEBEBE
120 #CE8F8F #BEBEBE
197 #CE8F8F #CE8F8F
198 #BEBEBE #BEBEBE
199 #BEBEBE #BEBEBE
200 #BEBEBE #BEBEBE
214 #BEBEBE #BEBEBE
$plot.data.cpd
kegg.names labels all.mapped type x y width height mol.data mol.col
30 C00222 C00222 compound 225 327 8 8 NA #FFFFFF
31 C00804 C00804 compound 225 451 8 8 NA #FFFFFF
32 C01013 C01013 compound 324 327 8 8 NA #FFFFFF
33 C00099 C00099 compound 325 388 8 8 NA #FFFFFF
36 C00083 C00083 compound 222 228 8 8 NA #FFFFFF
84 C00109 C00109 compound 806 105 8 8 NA #FFFFFF
85 C02876 C02876 compound 911 105 8 8 NA #FFFFFF
86 C00163 C00163 compound 1026 105 8 8 NA #FFFFFF
87 C00186 C00186 compound 523 157 8 8 NA #FFFFFF
88 C00827 C00827 compound 523 228 8 8 NA #FFFFFF
89 C00100 C00100 compound 806 324 8 8 NA #FFFFFF
90 C00683 C00683 compound 807 454 8 8 NA #FFFFFF
91 C02170 C02170 compound 897 454 8 8 NA #FFFFFF
92 C02335 C02335 compound 419 388 8 8 NA #FFFFFF
93 C05668 C05668 compound 418 327 8 8 NA #FFFFFF
94 C05984 C05984 compound 684 63 8 8 NA #FFFFFF
95 C05989 C05989 compound 418 228 8 8 NA #FFFFFF
96 C05983 C05983 compound 1026 179 8 8 NA #FFFFFF
97 C01213 C01213 compound 807 538 8 8 NA #FFFFFF
109 C04225 C04225 compound 1041 532 8 8 NA #FFFFFF
112 C00091 C00091 compound 807 619 8 8 NA #FFFFFF
113 C02225 C02225 compound 1041 433 8 8 NA #FFFFFF
122 C00042 C00042 compound 918 619 8 8 NA #FFFFFF
123 C04593 C04593 compound 1041 619 8 8 NA #FFFFFF
131 C00024 C00024 compound 90 227 8 8 NA #FFFFFF
157 C05979 C05979 compound 637 581 8 8 NA #FFFFFF
160 C00479 C00479 compound 637 468 8 8 NA #FFFFFF
169 C00111 C00111 compound 291 581 8 8 NA #FFFFFF
170 C00583 C00583 compound 499 468 8 8 NA #FFFFFF
171 C00424 C00424 compound 499 581 8 8 NA #FFFFFF
175 C05235 C05235 compound 396 468 8 8 NA #FFFFFF
176 C00546 C00546 compound 396 581 8 8 NA #FFFFFF
180 C06002 C06002 compound 950 454 8 8 NA #FFFFFF
193 C15973 C15973 compound 768 271 8 8 NA #FFFFFF
194 C15972 C15972 compound 768 206 8 8 NA #FFFFFF
195 C21017 C21017 compound 806 177 8 8 NA #FFFFFF
196 C00068 C00068 compound 844 160 8 8 NA #FFFFFF
201 C21018 C21018 compound 806 249 8 8 NA #FFFFFF
209 C21250 C21250 compound 1127 532 8 8 NA #FFFFFF
215 C00894 C00894 compound 523 327 8 8 NA #FFFFFF
Ah, I used the debugger incorrectly earlier. I will try using that again when I have had more sleep.
This minimum example from the documentation works for multiple columns:
This section of code from
NewFullRNASeqPipeline.R
works for a single condition:The following approach uses the LPRM data, but the same args as the minimum working example. But it doesn't work. The pathway diagrams are generated, but assigned values are all white.