Pathview is not assigning colours for the equation-by-equation timepoints

galenseilis commented 2 years ago

This minimum example from the documentation works for multiple columns:

library(pathview)

sim.cpd.data=sim.mol.data(mol.type="cpd", nmol=3000)
sim.cpd.data2 = matrix(sample(sim.cpd.data, 18000,
                              replace = T), ncol = 6)
i <- 3

pv.out <- pathview(gene.data = gse16873.d[, 1:6],
                   pathway.id = "00640",
                   species = "hsa",
                   keys.align = "y",
                   kegg.native = T,
                   match.data = F,
                   multi.state = T,
                   same.layer = T
                   )

hsa00640 gse16873 cpd 3-2s multi

This section of code from NewFullRNASeqPipeline.R works for a single condition:

#__________Python Enrichment_________#
percentiles <- c(.99, .95, .90)
test <- keggEnrichmentOnPercentiles(formatedCSVThereshold0, percentiles, metric="score", outputDir=OutputFileDirectory)

pathways <- py_GetAllPathwaysOfSig(test, p_cutoff=1.0)
pathways <- pathways[pathways != 'cme01100' & pathways != 'cme01110'] # Exclude these larger diagrams

keggDataFrame <- py_getKEGGDataframe(formatedCSVThereshold0)

colnames(keggDataFrame)

# kegg_LFC <- keggDataFrame[,c("LFC_LP1hVRM1h", "LFC_LP2hVRM2h", "LFC_LP24hVRM24h", "LFC_LP49hVRM49h")]
scoreFilteredDataFrame = keggDataFrame['Score_LPvRM' >= 33] # 33 comes from LP manuscript; not for general purpose
kegg_LFC <- scoreFilteredDataFrame[,c(paste("LFC_", treated, "v", control, sep="")), drop=FALSE]

#Flip sign so that legend shows green as positive and red negative.
kegg_LFC <- -1*kegg_LFC

#pathview(gene.data = kegg_LFC, pathway.id = "cme03030", species = "cme", gene.idtype = "KEGG",plot.col.key=FALSE)

#kegg_LFC["CYME_CMK133C",]

pv.out.list <- sapply(
  pathways,
  function(pid) pathview(
    gene.data = kegg_LFC,
    pathway.id = pid,
    species = "cme",
    gene.idtype = "KEGG",
    plot.col.key=FALSE
    )
  )

cme00030 pathview

The following approach uses the LPRM data, but the same args as the minimum working example. But it doesn't work. The pathway diagrams are generated, but assigned values are all white.

source("NewFullRNAseqPipeline_metadata.R")

pathways <- scan("../Annotations/Annotations/kegg_pathways.csv", character())
pathways <- pathways[pathways != 'cme01100' & pathways != 'cme01110'] # Exclude these larger diagrams

timepoints <- unique(coldata$timepoint)

t_dfs <- data.frame()
c <- 0
for (t in timepoints) {
  c <- c + 1
  print(t)
  t_col <- coldata[coldata$timepoint == t,]
  t_cts <- cts[grepl(t, colnames(cts))]
  dds <-DESeqDataSetFromMatrix( countData = t_cts,
                                colData = t_col,
                                design = configDesign)
  dds <- DESeq(dds)
  res <- results(dds)
  t_res <- as.data.frame(res)[c("log2FoldChange", "padj")]

  t_res$log2FoldChange <- ifelse(t_res$padj < 0.01, t_res$log2FoldChange, 0)
  t_res <- t_res['log2FoldChange']
  names(t_res)[names(t_res) == 'log2FoldChange'] <- paste(t, c, sep='_')
  if (nrow(t_dfs) == 0){
    t_dfs <- t_res
  } else {
    t_dfs = merge(t_dfs, t_res, by="row.names", all=TRUE)
    rownames(t_dfs) <- t_dfs$Row.names
    t_dfs <- subset(t_dfs, select=-c(Row.names))
  }
  }

kegg_LFC <- t_dfs * -1
kegg_LFC <- na.omit(kegg_LFC)
pv.out.list <- sapply(pathways,
                      function(pid) pathview(
                        gene.data = kegg_LFC,
                        pathway.id = pid,
                        species = "cme",
                        gene.idtype = "KEGG",
                        limit = c(min(kegg_LFC), max(kegg_LFC)),
                        keys.align = "y",
                        kegg.native = T,
                        match.data = F,
                        multi.state = T,
                        same.layer = T,
                        plot.col.key=FALSE
                        )
                      )

dfossl commented 2 years ago

@galenseilis Double check what that data type of the gene.data variable is for their multi color example vs what yours is. For trouble shooting I usually start from their example and try and make sure that all data structure types are the same. Sometimes dataframes and matrices , multi-D vectors will have different behaviours.

Also have you tried with a specific path ID and no sapply?

galenseilis commented 1 year ago

This is just a sloppy copy-paste from one of my Obsidian vaults. Below I installed the R environment on a spare laptop, examined the types, tried casting to a matrix type, and fiddled with some of the inputs. I'm tempted to automate trying various combinations of options, write them to separate folders, and look to see what generates desired results. If none of the possibilities of arguments to parameters work, then there must be something wrong either with the software or with the data input. Indeed, the vignette cautions:

Special Note: some KEGG xml data files are incomplete, inconsistent with corresponding png image or inaccu- rate/incorrect on some parts. These issues may cause inaccuracy, incosistency, or error messages although pathview tries the best to accommodate them. For instance, we may see inconistence between KEGG view and Graphviz view. As in the latter case, the pathway layout is generated based on data from xml file.

So, the issue should remain open for now.

Attempt to run R_initialize.R
1. Used [Ctrl] + [Shift] + [Enter] after selecting the code.

Checked version of R

$ R
R version 4.1.2 (2021-11-01) -- "Bird Hippie"

While installing packages, I got a prompt: Update all/some/none? [a/s/n]:. I entered a, which represents "all".
Attempting to run R_Initialize.R, I get a large printout. Here is the first issue in the printout:

* installing *source* package ‘nloptr’ ...
** package ‘nloptr’ successfully unpacked and MD5 sums checked
** using staged installation
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C++... yes
checking whether g++ -std=gnu++14 accepts -g... yes
checking for g++ -std=gnu++14 option to enable C++11 features... none needed
checking how to run the C++ preprocessor... g++ -std=gnu++14 -E
checking whether the compiler supports GNU C++... (cached) yes
checking whether g++ -std=gnu++14 accepts -g... (cached) yes
checking for g++ -std=gnu++14 option to enable C++11 features... (cached) none needed
checking for pkg-config... /usr/bin/pkg-config
checking if pkg-config knows NLopt... no
checking for cmake... no

------------------ CMAKE NOT FOUND --------------------

CMake was not found on the PATH. Please install CMake:

 - sudo yum install cmake          (Fedora/CentOS; inside a terminal)
 - sudo apt install cmake          (Debian/Ubuntu; inside a terminal).
 - sudo pacman -S cmake            (Arch Linux; inside a terminal).
 - sudo brew install cmake         (MacOS; inside a terminal with Homebrew)
 - sudo port install cmake         (MacOS; inside a terminal with MacPorts)

Alternatively install CMake from: <https://cmake.org/>

-------------------------------------------------------

The first problem is that cmake is not installed. Next I ran sudo apt-get install cmake in BASH, which appeared to successfully download and install CMake.
Attempted to rerun R_Initialize.R.

Next issue appears to be that libcurl is not installed.

* installing *source* package ‘curl’ ...
** package ‘curl’ successfully unpacked and MD5 sums checked
** using staged installation
Package libcurl was not found in the pkg-config search path.
Perhaps you should add the directory containing `libcurl.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libcurl' found
Package libcurl was not found in the pkg-config search path.
Perhaps you should add the directory containing `libcurl.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libcurl' found

Googling this I found the page:

https://askubuntu.com/questions/1094569/libcurl-not-found-error-message-while-trying-to-compile-a-package which advises to run: sudo apt-get install libcurl4-openssl-dev and sudo apt-get install pkg-config. Running these commands seems to install particular packages. Then I again tried to rerun the original R script.

Next is the xml2 package:

* installing *source* package ‘xml2’ ...
** package ‘xml2’ successfully unpacked and MD5 sums checked
** using staged installation
Package libxml-2.0 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libxml-2.0.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libxml-2.0' found
Package libxml-2.0 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libxml-2.0.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libxml-2.0' found

The page https://stackoverflow.com/questions/7765429/unable-to-install-r-package-in-ubuntu-11-04 suggests that I run sudo apt-get install libxml2-dev, which I then did.

I see a warning in the output: Warning: dependency ‘lasso2’ is not available. It seems that I can install lasso2 via sudo apt-get install r-cran-lasso2. Again, I rerun the initialize script.

Next I see that some of [[Dylan Fossl]]'s code blends Python into the mix.

> #Setting up python instance
> #cmd- which python3 gets path
> {
+ use_python(pythonInstanceDir, required = T)
+ py_config()
+ source_python(pythonMo .... [TRUNCATED] 
Error in py_run_file_impl(file, local, convert) : 
ModuleNotFoundError: No module named 'sklearn'

Attempting to install sklearn:

$ pip install sklearn  
Defaulting to user installation because normal site-packages is not writeable  
Requirement already satisfied: sklearn in ./.local/lib/python3.10/site-packages (0.0.post1)

This output confuses me. Why would it not be able to write to site package? I am going to reinstall my computer and try again. Rebooting and attempting with pip did not work, but of course there is a aptitude solution: sudo apt-get install python3-sklearn. Presumably other Python packages are required for this code.

With no errors running R_Initialize.R, I ran pathview_diagrams.R. This leads to a bunch of repeated warnings:

Warning: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.

Let us check the data types in pathview_diagrams.R. Check kegg_LFC:

> str(kegg_LFC)
'data.frame':   4934 obs. of  4 variables:
$ 1h_1 : num  2.389 0 0 0.673 0 ...
$ 2h_2 : num  1.301 0.335 0 0.246 0 ...
$ 24h_3: num  4.133 0 -0.21 0.227 0 ...
$ 49h_4: num  1.337 0.3 0 0.215 0 ...
- attr(*, "na.action")= 'omit' Named int [1:167] 434 511 548 567 804 836 967 1092 1186 1324 ...
..- attr(*, "names")= chr [1:167] "CMD085C" "CMD190C" "CME048C" "CME069C" ...
> class(kegg_LFC)
[1] "data.frame"
> type(kegg_LFC)
[1] "double"
> typeof(kegg_LFC)
[1] "list"

And likewise, let us check pathways:

> str(pathways)
chr [1:120] "cme00010" "cme00020" "cme00030" "cme00040" "cme00051" "cme00052" "cme00053" "cme00061" "cme00062" "cme00071" "cme00100" ...
> class(pathways)
[1] "character"
> type(pathways)
[1] "character"
> typeof(pathways)
[1] "character"

Let us check the data types in this example:
```
library(pathview)
```

sim.cpd.data=sim.mol.data(mol.type="cpd", nmol=3000) sim.cpd.data2 = matrix(sample(sim.cpd.data, 18000, replace = T), ncol = 6) i <- 3

pv.out <- pathview(gene.data = gse16873.d[, 1:6], pathway.id = "00640", species = "hsa", keys.align = "y", kegg.native = T, match.data = F, multi.state = T, same.layer = T )

Let's look at `gse16873.d[, 1:6]`:
```R
> str(gse16873.d[, 1:6])
 num [1:11979, 1:6] -0.3076 0.4159 0.1985 -0.2316 -0.0449 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:11979] "10000" "10001" "10002" "10003" ...
  ..$ : chr [1:6] "DCIS_1" "DCIS_2" "DCIS_3" "DCIS_4" ...
> class(gse16873.d[, 1:6])
[1] "matrix" "array"
> type(gse16873.d[, 1:6])
Error in type(gse16873.d[, 1:6]) : could not find function "type"
> typeof(gse16873.d[, 1:6])
[1] "double"

It is odd that this matrix object doesn't have "type". Perhaps that is a property of R's matrix class, or perhaps type is part of a particular package that I have not imported in this particular example of using pathview. But before looking too far into that, perhaps it is noteworthy that gse16873.d[, 1:6] has the "matrix" "array" class(es), whereas kegg_LFC is the "data.frame" class. One option might be to convert kegg_LFC to a matrix using something like data.matrix. But before I look into that further, let us also go back to [[Dylan Fossl]]'s NewFullRNASeqPipeline.R to see what types we have.

First, kegg_LFC

> str(kegg_LFC)
'data.frame':   1433 obs. of  1 variable:
 $ LFC_LPvRM: num  -0.19893 0.37761 -0.00212 0.62521 -0.04184 ...
> class(kegg_LFC)
[1] "data.frame"
> type(kegg_LFC)
[1] "double"
> typeof(kegg_LFC)
[1] "list"

And then pathways

> str(pathways)
 chr [1:91] "cme04145" "cme03060" "cme00510" "cme00564" "cme00360" "cme00196" "cme00410" "cme00562" "cme00561" "cme01210" "cme00040" ...
 > class(pathways)
[1] "character"
> type(pathways)
[1] "character"
> typeof(pathways)
[1] "character"

So it seems like all of these data structures are at least a little bit different. The pathview vignette has the original data from a data.frame object, but the example I ran from the documentation uses a matrix. If the issue is that my implementation and Dylan's implementations use data.frames rather than matrix, then I am surprised that Dylan's implementation worked before but not now. Perhaps something changed with either pathview's implementation, or something more general to R, or something with related packages/dependencies. But the distinct data types motivates me to try using a matrix type. We can attempt to cast the type using data.matrix.

I attempted to convert the data.frame object to matrix using data.matrix, but still the boxes were not colored at all. Here are the types of the resulting kegg_LFC:

> str(kegg_LFC)
num [1:4934, 1:4] 2.389 0 0 0.673 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:4934] "CMA001C" "CMA004C" "CMA005C" "CMA007C" ...
..$ : chr [1:4] "1h_1" "2h_2" "24h_3" "49h_4"
> class(kegg_LFC)
[1] "matrix" "array" 
> type(kegg_LFC)
[1] "double"
> typeof(kegg_LFC)
[1] "double"

So I have apparently succeeded in converting the to matrix, but the coloring is still blank. So I have not identified the problem yet. Here are a couple of directions to take this.

The first could be that there is something wrong with the names in pathways. Maybe running a single example without using the sapply function, as suggested [[Dylan Fossl]] suggested that I try.

Using the same instance where I had converted to matrix class, I tried running the command on pathways[1].

> pathview(
+     gene.data = kegg_LFC,
+     pathway.id = pathways[1],
+     species = "cme",
+     gene.idtype = "KEGG",
+     limit = c(min(kegg_LFC), max(kegg_LFC)),
+     keys.align = "y",
+     kegg.native = T,
+     match.data = F,
+     multi.state = T,
+     same.layer = T,
+     plot.col.key=FALSE
+ )
Warning: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.
Info: Working in directory /home/galen/Dropbox/UNBC/Rader/RaderLabCode/src
Info: Writing image file cme00010.pathview.png
Warning message:
In colnames(plot.data)[c(1, 3, 9:ncs)] <- c("kegg.names", "all.mapped",  :
  number of items to replace is not a multiple of replacement length

Specifically, that warning is new:

Warning message:
In colnames(plot.data)[c(1, 3, 9:ncs)] <- c("kegg.names", "all.mapped",  :
  number of items to replace is not a multiple of replacement length

Maybe this is somehow related to that session having used data.matrix on kegg_LFC, so let us look at running just that first pathway without being confounded by the type conversion.

The second is there could be something wrong with the gene names in kegg_LFC. The genes in the pathview example are actually numbers rather than codes that have names in them.

A third, weirder idea: Should we be assigning kegg_LFC <- t_dfs -1? If this is backwards from what is should be, maybe blank color is what we get with negated values? This doesn't agree with the warning:

Warning: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.

which suggests to us that there is something wrong with the gene.idtype.

What is gene.idtype exactly? I guess this constitutes the type of string format that pathview expects the genes to be in. [[Dylan Fossl]]'s code uses gene.idtype="KEGG". The Pathview vignette says:

For details check: gene.idtype.list and data(rn.list); names(rn.list).

Checking gene.idtype.list I find:

> gene.idtype.list
 [1] "SYMBOL"       "GENENAME"     "ENSEMBL"      "ENSEMBLPROT"  "UNIGENE"      "UNIPROT"     
 [7] "ACCNUM"       "ENSEMBLTRANS" "REFSEQ"       "ENZYME"       "TAIR"         "PROSITE"     
[13] "ORF"

"KEGG" is not among these options, although I see from this post that [[Dylan Fossl]] is not the only one to use this. The Common Errors part of the vignette says:

mismatch between the IDs for gene.data (or cpd.data) and gene.idtype (or cpd.idtype). For example, gene.data or cpd.data uses some extern ID types, while gene.idtype = "entrez" and cpd.idtype = "kegg" (default).

This suggests to me that I should look at trying some different inputs for gene.idtype.

Another potential problem highlighted in this vignette is:

pathway.id wrong or wrong format, right format should be a five digit number, like 04110, 00620 etc.

The current code does not use these 5-number codes per se, but rather ours look like this:

> pathways[1:5]
[1] "cme00010" "cme00020" "cme00030" "cme00040" "cme00051"

which have 5 numbers at the end but all have the three letters "cme" at the front. Let us try replacing pathway.id=pid with pathway.id=substr(pid, 4, 4+5) and see what it does. This did not work, still giving Warning: None of the genes or compounds mapped to the pathway!. This post makes me wonder if I need to setup running all combinations of inputs up to a restricted generating set.

A tempting approach would be to keep pathway.id=pid and set gene.idtype="SYMBOL", but this leads to an error suggesting that this is not the right approach.


Error in pathview(gene.data = kegg_LFC, pathway.id = pid, species = "cme",  : 
  No proper gene annotation package available!
In addition: Warning messages:
1: In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors
2: In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors
3: In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors
4: In DESeqDataSet(se, design = design, ignoreRank) :

 Error in pathview(gene.data = kegg_LFC, pathway.id = pid, species = "cme", : 
No proper gene annotation package available!

galenseilis commented 1 year ago

Note to self: run debugger to see what is happening line-by-line.

galenseilis commented 1 year ago

Here is a traceback

Error in pathview(gene.data = kegg_LFC, pathway.id = pid, species = "cme", : 
No proper gene annotation package available!
6.
stop("No proper gene annotation package available!")
5.
pathview(gene.data = kegg_LFC, pathway.id = pid, species = "cme", 
gene.idtype = "SYMBOL", limit = c(min(kegg_LFC), max(kegg_LFC)), 
keys.align = "y", kegg.native = T, match.data = F, multi.state = T, 
same.layer = T, plot.col.key = FALSE)
4.
FUN(X[[i]], ...)
3.
lapply(X = X, FUN = FUN, ...)
2.
sapply(pathways, function(pid) pathview(gene.data = kegg_LFC, 
pathway.id = pid, species = "cme", gene.idtype = "SYMBOL", 
limit = c(min(kegg_LFC), max(kegg_LFC)), keys.align = "y", 
kegg.native = T, match.data = F, multi.state = T, same.layer = T, ...
1.
sapply(pathways, function(pid) pathview(gene.data = kegg_LFC, 
pathway.id = pid, species = "cme", gene.idtype = "SYMBOL", 
limit = c(min(kegg_LFC), max(kegg_LFC)), keys.align = "y", 
kegg.native = T, match.data = F, multi.state = T, same.layer = T, ...

Says the source is not available.

But from that same namespace view I can see the source:

function (gene.data = NULL, cpd.data = NULL, pathway.id, species = "hsa", 
  kegg.dir = ".", cpd.idtype = "kegg", gene.idtype = "entrez", 
  gene.annotpkg = NULL, min.nnodes = 3, kegg.native = TRUE, 
  map.null = TRUE, expand.node = FALSE, split.group = FALSE, 
  map.symbol = TRUE, map.cpdname = TRUE, node.sum = "sum", 
  discrete = list(gene = FALSE, cpd = FALSE), limit = list(gene = 1, 
    cpd = 1), bins = list(gene = 10, cpd = 10), both.dirs = list(gene = T, 
    cpd = T), trans.fun = list(gene = NULL, cpd = NULL), 
  low = list(gene = "green", cpd = "blue"), mid = list(gene = "gray", 
    cpd = "gray"), high = list(gene = "red", cpd = "yellow"), 
  na.col = "transparent", ...) 
{
  dtypes = !is.null(gene.data) + !is.null(cpd.data)
  cond0 = dtypes == 1 & is.numeric(limit) & length(limit) > 
    1
  if (cond0) {
    if (limit[1] != limit[2] & is.null(names(limit))) 
      limit = list(gene = limit[1:2], cpd = limit[1:2])
  }
  if (is.null(trans.fun)) 
    trans.fun = list(gene = NULL, cpd = NULL)
  arg.len2 = c("discrete", "limit", "bins", "both.dirs", "trans.fun", 
    "low", "mid", "high")
  for (arg in arg.len2) {
    obj1 = eval(as.name(arg))
    if (length(obj1) == 1) 
      obj1 = rep(obj1, 2)
    if (length(obj1) > 2) 
      obj1 = obj1[1:2]
    obj1 = as.list(obj1)
    ns = names(obj1)
    if (length(ns) == 0 | !all(c("gene", "cpd") %in% ns)) 
      names(obj1) = c("gene", "cpd")
    assign(arg, obj1)
  }
  if (is.character(gene.data)) {
    gd.names = gene.data
    gene.data = rep(1, length(gene.data))
    names(gene.data) = gd.names
    both.dirs$gene = FALSE
    ng = length(gene.data)
    nsamp.g = 1
  }
  else if (!is.null(gene.data)) {
    if (length(dim(gene.data)) == 2) {
      gd.names = rownames(gene.data)
      ng = nrow(gene.data)
      nsamp.g = 2
    }
    else if (is.numeric(gene.data) & is.null(dim(gene.data))) {
      gd.names = names(gene.data)
      ng = length(gene.data)
      nsamp.g = 1
    }
    else stop("wrong gene.data format!")
  }
  else if (is.null(cpd.data)) {
    stop("gene.data and cpd.data are both NULL!")
  }
  gene.idtype = toupper(gene.idtype)
  data(bods)
  if (species != "ko") {
    species.data = kegg.species.code(species, na.rm = T, 
      code.only = FALSE)
  }
  else {
    species.data = c(kegg.code = "ko", entrez.gnodes = "0", 
      kegg.geneid = "K01488", ncbi.geneid = NA, ncbi.proteinid = NA, 
      uniprot = NA)
    gene.idtype = "KEGG"
    msg.fmt = "Only KEGG ortholog gene ID is supported, make sure it looks like \"%s\"!"
    msg = sprintf(msg.fmt, species.data["kegg.geneid"])
    message("Note: ", msg)
  }
  if (length(dim(species.data)) == 2) {
    message("Note: ", "More than two valide species!")
    species.data = species.data[1, ]
  }
  species = species.data["kegg.code"]
  entrez.gnodes = species.data["entrez.gnodes"] == 1
  if (is.na(species.data["ncbi.geneid"])) {
    if (!is.na(species.data["kegg.geneid"])) {
      msg.fmt = "Mapping via KEGG gene ID (not Entrez) is supported for this species,\nit looks like \"%s\"!"
      msg = sprintf(msg.fmt, species.data["kegg.geneid"])
      message("Note: ", msg)
    }
    else {
      stop("This species is not annotated in KEGG!")
    }
  }
  if (is.null(gene.annotpkg)) 
    gene.annotpkg = bods[match(species, bods[, 3]), 1]
  if (length(grep("ENTREZ|KEGG|NCBIPROT|UNIPROT", gene.idtype)) < 
    1 & !is.null(gene.data)) {
    if (is.na(gene.annotpkg)) 
      stop("No proper gene annotation package available!")
    if (!gene.idtype %in% gene.idtype.bods[[species]]) 
      stop("Wrong input gene ID type!")
    gene.idmap = id2eg(gd.names, category = gene.idtype, 
      pkg.name = gene.annotpkg, unique.map = F)
    gene.data = mol.sum(gene.data, gene.idmap)
    gene.idtype = "ENTREZ"
  }
  if (gene.idtype != "KEGG" & !entrez.gnodes & !is.null(gene.data)) {
    id.type = gene.idtype
    if (id.type == "ENTREZ") 
      id.type = "ENTREZID"
    kid.map = names(species.data)[-c(1:2)]
    kid.types = names(kid.map) = c("KEGG", "ENTREZID", "NCBIPROT", 
      "UNIPROT")
    kid.map2 = gsub("[.]", "-", kid.map)
    kid.map2["UNIPROT"] = "up"
    if (is.na(kid.map[id.type])) 
      stop("Wrong input gene ID type for the species!")
    message("Info: Getting gene ID data from KEGG...")
    gene.idmap = keggConv(kid.map2[id.type], species)
    message("Info: Done with data retrieval!")
    kegg.ids = gsub(paste(species, ":", sep = ""), "", names(gene.idmap))
    in.ids = gsub(paste0(kid.map2[id.type], ":"), "", gene.idmap)
    gene.idmap = cbind(in.ids, kegg.ids)
    gene.data = mol.sum(gene.data, gene.idmap)
    gene.idtype = "KEGG"
  }
  if (is.character(cpd.data)) {
    cpdd.names = cpd.data
    cpd.data = rep(1, length(cpd.data))
    names(cpd.data) = cpdd.names
    both.dirs$cpd = FALSE
    ncpd = length(cpd.data)
  }
  else if (!is.null(cpd.data)) {
    if (length(dim(cpd.data)) == 2) {
      cpdd.names = rownames(cpd.data)
      ncpd = nrow(cpd.data)
    }
    else if (is.numeric(cpd.data) & is.null(dim(cpd.data))) {
      cpdd.names = names(cpd.data)
      ncpd = length(cpd.data)
    }
    else stop("wrong cpd.data format!")
  }
  if (length(grep("kegg", cpd.idtype)) < 1 & !is.null(cpd.data)) {
    data(rn.list)
    cpd.types = c(names(rn.list), "name")
    cpd.types = tolower(cpd.types)
    cpd.types = cpd.types[-grep("kegg", cpd.types)]
    if (!tolower(cpd.idtype) %in% cpd.types) 
      stop("Wrong input cpd ID type!")
    cpd.idmap = cpd2kegg(cpdd.names, in.type = cpd.idtype)
    cpd.data = mol.sum(cpd.data, cpd.idmap)
  }
  warn.fmt = "Parsing %s file failed, please check the file!"
  if (length(grep(species, pathway.id)) > 0) {
    pathway.name = pathway.id
    pathway.id = gsub(species, "", pathway.id)
  }
  else pathway.name = paste(species, pathway.id, sep = "")
  kfiles = list.files(path = kegg.dir, pattern = "[.]xml|[.]png")
  npath = length(pathway.id)
  out.list = list()
  tfiles.xml = paste(pathway.name, "xml", sep = ".")
  tfiles.png = paste(pathway.name, "png", sep = ".")
  if (kegg.native) 
    ttype = c("xml", "png")
  else ttype = "xml"
  xml.file <- paste(kegg.dir, "/", tfiles.xml, sep = "")
  for (i in 1:npath) {
    if (kegg.native) 
      tfiles = c(tfiles.xml[i], tfiles.png[i])
    else tfiles = tfiles.xml[i]
    if (!all(tfiles %in% kfiles)) {
      dstatus = download.kegg(pathway.id = pathway.id[i], 
        species = species, kegg.dir = kegg.dir, file.type = ttype)
      if (dstatus == "failed") {
        warn.fmt = "Failed to download KEGG xml/png files, %s skipped!"
        warn.msg = sprintf(warn.fmt, pathway.name[i])
        message("Warning: ", warn.msg)
        return(invisible(0))
      }
    }
    if (kegg.native) {
      node.data = try(node.info(xml.file[i]), silent = T)
      if (class(node.data)[1] == "try-error") {
        warn.msg = sprintf(warn.fmt, xml.file[i])
        message("Warning: ", warn.msg)
        return(invisible(0))
      }
      node.type = c("gene", "enzyme", "compound", "ortholog")
      sel.idx = node.data$type %in% node.type
      nna.idx = !is.na(node.data$x + node.data$y + node.data$width + 
        node.data$height)
      sel.idx = sel.idx & nna.idx
      if (sum(sel.idx) < min.nnodes) {
        warn.fmt = "Number of mappable nodes is below %d, %s skipped!"
        warn.msg = sprintf(warn.fmt, min.nnodes, pathway.name[i])
        message("Warning: ", warn.msg)
        return(invisible(0))
      }
      node.data = lapply(node.data, "[", sel.idx)
    }
    else {
      gR1 = try(parseKGML2Graph2(xml.file[i], genes = F, 
        expand = expand.node, split.group = split.group), 
        silent = T)
      node.data = try(node.info(gR1), silent = T)
      if (class(node.data)[1] == "try-error") {
        warn.msg = sprintf(warn.fmt, xml.file[i])
        message("Warning: ", warn.msg)
        return(invisible(0))
      }
    }
    if (species == "ko") 
      gene.node.type = "ortholog"
    else gene.node.type = "gene"
    if ((!is.null(gene.data) | map.null) & sum(node.data$type == 
      gene.node.type) > 1) {
      plot.data.gene = node.map(gene.data, node.data, 
        node.types = gene.node.type, node.sum = node.sum, 
        entrez.gnodes = entrez.gnodes)
      kng = plot.data.gene$kegg.names
      kng.char = gsub("[0-9]", "", unlist(kng))
      if (any(kng.char > "")) 
        entrez.gnodes = FALSE
      if (map.symbol & species != "ko" & entrez.gnodes) {
        if (is.na(gene.annotpkg)) {
          warn.fmt = "No annotation package for the species %s, gene symbols not mapped!"
          warn.msg = sprintf(warn.fmt, species)
          message("Warning: ", warn.msg)
        }
        else {
          plot.data.gene$labels = eg2id(as.character(plot.data.gene$kegg.names), 
            category = "SYMBOL", pkg.name = gene.annotpkg)[, 
            2]
          mapped.gnodes = rownames(plot.data.gene)
          node.data$labels[mapped.gnodes] = plot.data.gene$labels
        }
      }
      cols.ts.gene = node.color(plot.data.gene, limit$gene, 
        bins$gene, both.dirs = both.dirs$gene, trans.fun = trans.fun$gene, 
        discrete = discrete$gene, low = low$gene, mid = mid$gene, 
        high = high$gene, na.col = na.col)
    }
    else plot.data.gene = cols.ts.gene = NULL
    if ((!is.null(cpd.data) | map.null) & sum(node.data$type == 
      "compound") > 1) {
      plot.data.cpd = node.map(cpd.data, node.data, node.types = "compound", 
        node.sum = node.sum)
      if (map.cpdname & !kegg.native) {
        plot.data.cpd$labels = cpdkegg2name(plot.data.cpd$labels)[, 
          2]
        mapped.cnodes = rownames(plot.data.cpd)
        node.data$labels[mapped.cnodes] = plot.data.cpd$labels
      }
      cols.ts.cpd = node.color(plot.data.cpd, limit$cpd, 
        bins$cpd, both.dirs = both.dirs$cpd, trans.fun = trans.fun$cpd, 
        discrete = discrete$cpd, low = low$cpd, mid = mid$cpd, 
        high = high$cpd, na.col = na.col)
    }
    else plot.data.cpd = cols.ts.cpd = NULL
    if (kegg.native) {
      pv.pars = keggview.native(plot.data.gene = plot.data.gene, 
        cols.ts.gene = cols.ts.gene, plot.data.cpd = plot.data.cpd, 
        cols.ts.cpd = cols.ts.cpd, node.data = node.data, 
        pathway.name = pathway.name[i], kegg.dir = kegg.dir, 
        limit = limit, bins = bins, both.dirs = both.dirs, 
        discrete = discrete, low = low, mid = mid, high = high, 
        na.col = na.col, ...)
    }
    else {
      pv.pars = keggview.graph(plot.data.gene = plot.data.gene, 
        cols.ts.gene = cols.ts.gene, plot.data.cpd = plot.data.cpd, 
        cols.ts.cpd = cols.ts.cpd, node.data = node.data, 
        path.graph = gR1, pathway.name = pathway.name[i], 
        map.cpdname = map.cpdname, split.group = split.group, 
        limit = limit, bins = bins, both.dirs = both.dirs, 
        discrete = discrete, low = low, mid = mid, high = high, 
        na.col = na.col, ...)
    }
    plot.data.gene = cbind(plot.data.gene, cols.ts.gene)
    if (!is.null(plot.data.gene)) {
      cnames = colnames(plot.data.gene)[-(1:8)]
      nsamp = length(cnames)/2
      if (nsamp > 1) {
        cnames[(nsamp + 1):(2 * nsamp)] = paste(cnames[(nsamp + 
          1):(2 * nsamp)], "col", sep = ".")
      }
      else cnames[2] = "mol.col"
      colnames(plot.data.gene)[-(1:8)] = cnames
    }
    plot.data.cpd = cbind(plot.data.cpd, cols.ts.cpd)
    if (!is.null(plot.data.cpd)) {
      cnames = colnames(plot.data.cpd)[-(1:8)]
      nsamp = length(cnames)/2
      if (nsamp > 1) {
        cnames[(nsamp + 1):(2 * nsamp)] = paste(cnames[(nsamp + 
          1):(2 * nsamp)], "col", sep = ".")
      }
      else cnames[2] = "mol.col"
      colnames(plot.data.cpd)[-(1:8)] = cnames
    }
    out.list[[i]] = list(plot.data.gene = plot.data.gene, 
      plot.data.cpd = plot.data.cpd)
  }
  if (npath == 1) 
    out.list = out.list[[1]]
  else names(out.list) = pathway.name
  return(invisible(out.list))
}

I don't understand in what sense it is not available if... it is clearly available.

All of these options terminate the debugging. I thought I would be able to step inside the function pathview, but apparently not.

galenseilis commented 1 year ago

Since stop("No proper gene annotation package available!") was called, that implies (is.na(gene.annotpkg)) was TRUE. The is.na would return true if no value for gene.annotpkg was available.

This snippet

if (is.na(gene.annotpkg)) 
    stop("No proper gene annotation package available!")

is nested within

if (length(grep("ENTREZ|KEGG|NCBIPROT|UNIPROT", gene.idtype)) < 
    1 & !is.null(gene.data)) {
  if (is.na(gene.annotpkg)) 
    stop("No proper gene annotation package available!")
  if (!gene.idtype %in% gene.idtype.bods[[species]]) 
    stop("Wrong input gene ID type!")
  gene.idmap = id2eg(gd.names, category = gene.idtype, 
                     pkg.name = gene.annotpkg, unique.map = F)
  gene.data = mol.sum(gene.data, gene.idmap)
  gene.idtype = "ENTREZ"
}

and thus (length(grep("ENTREZ|KEGG|NCBIPROT|UNIPROT", gene.idtype)) < 1 & !is.null(gene.data)) is also true. The second condition !is.null(gene.data) should be met by the fact that we passed in kegg_LFC. The first condition length(grep("ENTREZ|KEGG|NCBIPROT|UNIPROT", gene.idtype)) < 1 is more interesting. The grep command will return 1 if 'KEGG' is in the gene.idtype argument.

The current code I ran is this:

source("NewFullRNAseqPipeline_metadata.R")

pathways <- scan("../Annotations/Annotations/kegg_pathways.csv", character())
pathways <- pathways[pathways != 'cme01100' & pathways != 'cme01110'] # Exclude these larger diagrams

timepoints <- unique(coldata$timepoint)

t_dfs <- data.frame()
c <- 0
for (t in timepoints) {
  c <- c + 1
  print(t)
  t_col <- coldata[coldata$timepoint == t,]
  t_cts <- cts[grepl(t, colnames(cts))]
  dds <-DESeqDataSetFromMatrix( countData = t_cts,
                                colData = t_col,
                                design = configDesign)
  dds <- DESeq(dds)
  res <- results(dds)
  t_res <- as.data.frame(res)[c("log2FoldChange", "padj")]

  t_res$log2FoldChange <- ifelse(t_res$padj < 0.01, t_res$log2FoldChange, 0)
  t_res <- t_res['log2FoldChange']
  names(t_res)[names(t_res) == 'log2FoldChange'] <- paste(t, c, sep='_')
  if (nrow(t_dfs) == 0){
    t_dfs <- t_res
  } else {
    t_dfs = merge(t_dfs, t_res, by="row.names", all=TRUE)
    rownames(t_dfs) <- t_dfs$Row.names
    t_dfs <- subset(t_dfs, select=-c(Row.names))
  }
  }

kegg_LFC <- t_dfs * -1
kegg_LFC <- na.omit(kegg_LFC)
kegg_LFC <- kegg_LFC
pv.out.list <- sapply(pathways,
                      function(pid) pathview(
                        gene.data = kegg_LFC,
                        pathway.id = pid,
                        species = "cme",
                        gene.idtype = "SYMBOL",
                        limit = c(min(kegg_LFC), max(kegg_LFC)),
                        keys.align = "y",
                        kegg.native = T,
                        match.data = F,
                        multi.state = T,
                        same.layer = T,
                        plot.col.key=FALSE
                        )
                      )

Notably, I didn't specify gene.idtype even though when I first post this thread I did have that argument. Ah yes, I must have forgotten to revert to gene.idtype="KEGG". Rerunning I do not get any errors, but still the Warning: None of the genes or compounds mapped to the pathway! warning.

galenseilis commented 1 year ago

Let's chase down that Warning: None of the genes or compounds mapped to the pathway! in the source code. I can download the source from here.

Running /pathview$ grep -rin "Warning:" */* I see:

R/combineKEGGnodes.R:27:      message("Warning: ", warn.msg)
R/download.kegg.R:41:          message("Warning: ", warn.msg)
R/download.kegg.R:58:          message("Warning: ", warn.msg)
R/keggview.graph.R:51:      message("Warning: reconcile groups sharing member nodes!")
R/node.map.R:45:      message("Warning: ", paste("None of the genes or compounds mapped to the pathway!",
R/pathview.R:189:        message("Warning: ", warn.msg)
R/pathview.R:198:      message("Warning: ", warn.msg)
R/pathview.R:208:      message("Warning: ", warn.msg)
R/pathview.R:217:      message("Warning: ", warn.msg)
R/pathview.R:233:                message("Warning: ", warn.msg)

The line R/node.map.R:45: message("Warning: ", paste("None of the genes or compounds mapped to the pathway!", suggests I should start at line 45 of node.map.R.

The complete expression is spans lines 45 and 46:

message("Warning: ", paste("None of the genes or compounds mapped to the pathway!",
                    "Argument gene.idtype or cpd.idtype may be wrong.", sep="\n"))

In order for that warning to print, it must be true that (length(mapped.mols)==0). The variable mapped.mols is defined by

mapped.mols <- intersect(unlist(node.data$kegg.names), row.names(mol.data))

on line 43 of the same file. The intersect command likely comes from base R, which performs a set intersection. This intersection is empty for some reason since its length is zero. The unlist command takes a list structure and simplifies it into a more vector-like structure. This brings us to understanding node.data$kegg.names and mol.data. The variable node.data is a parameter of the node.map function:

node.map <-
function(mol.data=NULL, node.data, node.types=c("gene", "ortholog", "compound")[1], node.sum =c("sum","mean", "median", "max", "max.abs", "random")[1], entrez.gnodes=TRUE){
type.sel=node.data$type %in% node.types
if(sum(type.sel)<1){
  message("Note: ", "No specified node types in the pathway!")
  plot.data=NULL
  return(plot.data)
}
node.data=lapply(node.data, "[", type.sel)
n.nodes=length(node.data$kegg.names)
spacials=as.matrix(as.data.frame(node.data[c("type", "x", "y", "width", "height")]))
if(node.types[1]=="gene"){
kng=node.data$kegg.names[node.data$type=="gene"]
kng.char=gsub("[0-9]", "", unlist(kng))
if(any(kng.char>"")) entrez.gnodes=FALSE
}

na.plot.data=function(){
    sapply(1:n.nodes, function(i){
      kns=node.data$kegg.names[[i]]
    if(node.types[1]=="gene" & entrez.gnodes) items=as.numeric(kns)
    else items=kns
    ord=order(items)
    items=items[ord]
     kns=kns[ord]
      return(c(kns[1],"", spacials[i,], NA))
    })
  }

if(is.null(mol.data)){
  plot.data=na.plot.data()
} else{

#map gene data  
    if(is.character(mol.data)){
      gd.names=mol.data
      mol.data=rep(1, length(mol.data))
      names(mol.data)=gd.names
    }
    mol.data=cbind(mol.data)

    if(is.null(colnames(mol.data))) colnames(mol.data)=paste("ge", 1:ncol(mol.data),sep="")
    mapped.mols <- intersect(unlist(node.data$kegg.names), row.names(mol.data))
    if(length(mapped.mols)==0){
      message("Warning: ", paste("None of the genes or compounds mapped to the pathway!",
                    "Argument gene.idtype or cpd.idtype may be wrong.", sep="\n"))
      plot.data=na.plot.data()
    } else{
      if(node.types[1]=="gene" & entrez.gnodes) mapped.mols =as.numeric(mapped.mols)

      plot.data=sapply(1:n.nodes, function(i){
        kns=node.data$kegg.names[[i]]
        if(node.types[1]=="gene" & entrez.gnodes) items=as.numeric(kns)
        else items=kns
        ord=order(items)
        items=items[ord]
        kns=kns[ord]
        hit=items %in% mapped.mols 
        if(sum(hit)==0) {
          return(c(kns[1], "", spacials[i,], rep(NA, ncol(mol.data))))
        } else if(sum(hit)==1) {
          edata=mol.data[as.character(items[hit]),]
          return(c(kns[hit], kns[hit], spacials[i,], edata))
        } else {
          node.sum=eval(as.name(node.sum))
                                        #      edata=apply(cbind(mol.data[as.character(items[hit]),]), 2, node.sum, na.rm=T)
#          edata=apply(cbind(mol.data[as.character(items[hit]),]), 2, function(x){
          edata=apply(mol.data[as.character(items[hit]),,drop=F], 2, function(x){
            x=x[!is.na(x)]
            if(length(x)<1) return(NA)
            else return(node.sum(x, na.rm=F))
          })
          return(c(kns[hit][1], paste(kns[hit],collapse=","), spacials[i,], edata))
        }    
      })
    }
  }

colnames(plot.data)=names(node.data$kegg.names)
plot.data=as.data.frame(t(plot.data), stringsAsFactors = F)
  plot.data$labels=node.data$labels
  ncs=ncol(plot.data)
  plot.data=plot.data[,c(1,ncs,2:(ncs-1))]
if(is.null(mol.data)) cns="mol.data" else cns=colnames(mol.data)
colnames(plot.data)[c(1,3,9:ncs)]=c("kegg.names","all.mapped",cns)#c(1,8:ncs)
for(ic in (1:ncol(plot.data))[-c(1:4)]) plot.data[,ic]=as.numeric(plot.data[,ic])#-c(1:3)

return(plot.data)
}

So to further understand node.data we will have to find out where node.map is called in the code base, and what is the initial input. The same can be said of mol.data. We can return to the logic within node.map later.

Using grep again, we find:

pathview$ grep -rin "node.map" */*
man/cpdidmap.Rd:76:  \code{\link{node.map}} the node data mapper function.
man/eg2id.Rd:110:  \code{\link{node.map}} the node data mapper function.
man/mol.sum.Rd:77:  \code{\link{node.map}} the node data mapper function.
man/node.color.Rd:28:  the result returned by \code{node.map} function. It is a data.frame composed
man/node.color.Rd:31:  parsed or mapped node data. Check \code{node.map} for details.
man/node.color.Rd:126:node.map function into pseudo colors, which then can be plotted on the
man/node.color.Rd:149:  \code{\link{node.map}} the node data mapper function.
man/node.color.Rd:156:plot.data.gene=node.map(mol.data=gse16873.d[,1], node.data,
man/node.map.Rd:1:\name{node.map}
man/node.map.Rd:2:\alias{node.map}
man/node.map.Rd:12:node.map(mol.data = NULL, node.data, node.types = c("gene", "ortholog",
man/node.map.Rd:49:Mapper function node.map maps user supplied molecular data to KEGG
man/node.map.Rd:107:plot.data.gene=node.map(mol.data=gse16873.d[,1], node.data,
man/pathview.Rd:216:data.frame returned by node.map function for rendering mapped gene
man/pathview.Rd:411:  \code{\link{node.map}} and \code{\link{node.color}} the mapper. 
man/sim.mol.data.Rd:94:  \code{\link{node.map}} the node data mapper function.
R/node.map.R:1:node.map <-
R/pathview.R:225:    plot.data.gene=node.map(gene.data, node.data, node.types=gene.node.type, node.sum=node.sum, entrez.gnodes=entrez.gnodes)
R/pathview.R:245:            plot.data.cpd=node.map(cpd.data, node.data, node.types="compound", node.sum=node.sum)

The *.Rd files are a kind of documentation file. Let us repeat for R files only.

/pathview$ grep -rin "node.map" */*.R
R/node.map.R:1:node.map <-
R/pathview.R:225:    plot.data.gene=node.map(gene.data, node.data, node.types=gene.node.type, node.sum=node.sum, entrez.gnodes=entrez.gnodes)
R/pathview.R:245:            plot.data.cpd=node.map(cpd.data, node.data, node.types="compound", node.sum=node.sum)

Because we are interested in gene data, it makes sense to investigate line of 225 rather than line 245 of pathview.R. The line plot.data.gene=node.map(gene.data, node.data, node.types=gene.node.type, node.sum=node.sum, entrez.gnodes=entrez.gnodes) is nested within an if statement.

if((!is.null(gene.data) |map.null) & sum(node.data$type==gene.node.type)>1){
    plot.data.gene=node.map(gene.data, node.data, node.types=gene.node.type, node.sum=node.sum, entrez.gnodes=entrez.gnodes)
    kng=plot.data.gene$kegg.names
    kng.char=gsub("[0-9]", "", unlist(kng))
    if(any(kng.char>"")) entrez.gnodes=FALSE
    if(map.symbol & species!="ko" & entrez.gnodes) {
              if(is.na(gene.annotpkg)) {
                warn.fmt="No annotation package for the species %s, gene symbols not mapped!"
                warn.msg=sprintf(warn.fmt, species)
                message("Warning: ", warn.msg)
              } else {
#                browser()
                plot.data.gene$labels=eg2id(as.character(plot.data.gene$kegg.names), category="SYMBOL", pkg.name=gene.annotpkg)[,2]
                mapped.gnodes=rownames(plot.data.gene)
                node.data$labels[mapped.gnodes]=plot.data.gene$labels
              }
            }
            cols.ts.gene=node.color(plot.data.gene, limit$gene, bins$gene, both.dirs=both.dirs$gene, trans.fun=trans.fun$gene, discrete=discrete$gene, low=low$gene, mid=mid$gene, high=high$gene,  na.col=na.col)
          }

Thus ((!is.null(gene.data) |map.null) & sum(node.data$type==gene.node.type)>1) must be true for our inputs. The node.data variable is assigned from try(node.info(xml.file[i]), silent=T). We can in turn look for node.info in the code base using grep.

/pathview$ grep -rin "node.info" */*.R
R/node.info.R:1:node.info <-
R/pathview.R:195:    node.data=try(node.info(xml.file[i]), silent=T)
R/pathview.R:214:    node.data=try(node.info(gR1), silent=T)

So node.info is this function:

node.info <-
function(object, short.name=TRUE){
  cobj=class(object)[1]
  if(cobj=="character"){
    object <-parseKGML2(object)
    ndata=nodes(object)    
  } else if(cobj=="KEGGPathway"){
    ndata=nodes(object)
  } else  if(cobj=="graphNEL"){
    ndata=getKEGGnodeData(object)
  }  else {
    stop("object should be either a filename, KEGGPathway, or graphNEL!")
  }

  nodeNames=sapply(ndata, getName)
  nodeType=sapply(ndata, getType)
  nodeComp=sapply(ndata, getComponent)
  node.size=sapply(nodeComp, length)

  grs1=sapply(ndata, function(x){
    grs=x@graphics
    c(labels=grs@name, shape=grs@type)
  })
  grs2=sapply(ndata, function(x){
    grs=x@graphics
    c(x=grs@x,y=grs@y,width=grs@width,height=grs@height)
  })
  grs1=t(grs1)
  grs2=t(grs2)

  graphic.data=as.list(cbind(data.frame(grs1, stringsAsFactors=F), data.frame(grs2)))
  nd.list=list(kegg.names=nodeNames, type=nodeType, component=nodeComp, size=node.size)
  nd.list=c(nd.list, graphic.data)
  if(short.name){
    gnames=sapply(strsplit(nd.list$labels, ", "), "[[", 1)
    map.idx=nd.list$type=="map"
    gnames[map.idx]=nd.list$labels[map.idx]
    gnames[is.na(gnames)]=""
    gnames=gsub("[.][.][.]", "", gnames)
    nd.list$labels=gnames
    nd.list$kegg.names=lapply(nd.list$kegg.names, function(x) gsub("^.*:", "", x))
  }

  nn=names(nodeNames)
  nd.list= lapply(nd.list, function(x) {
    names(x) <- nn
    return(x)
  })
  return(nd.list)
}

This is starting to multifurcate into many branches. This approach is more likely to succeed with diagrams.

galenseilis commented 1 year ago

Looking inside pv.out.list, I can see that there is a mapped column containing empty strings.

> pv.out.list[1]$cme00010$plot.data.gene$kegg.names
 [1] "CYME_CME145C" "CYME_CMO345C" "CYME_CMM296C" "CYME_CMS125C" "CYME_CMS327C" "CYME_CMS327C"
 [7] "CYME_CMI273C" "CYME_CMA145C" "CYME_CMA030C" "CYME_CMK131C" "CYME_CMJ250C" "CYME_CMQ172C"
[13] "CYME_CMI162C" "CYME_CMD041C" "CYME_CMO124C" "CYME_CMJ272C" "CYME_CMO124C" "CYME_CMO276C"
[19] "CYME_CMO124C" "CYME_CMD040C" "CYME_CMO276C" "CYME_CMM299C" "CYME_CMJ305C" "CYME_CMN285C"
[25] "CYME_CMT034C" "CYME_CMK098C" "CYME_CMK188C" "CYME_CMH052C" "CYME_CMF012C"
> pv.out.list[1]$cme00010$plot.data.gene$labels
 [1] "CYME_CME145C" "CYME_CMO345C" "CYME_CMM296C" "CYME_CMS125C" "CYME_CMS327C" "CYME_CMS327C"
 [7] "CYME_CMI273C" "CYME_CMA145C" "CYME_CMA030C" "CYME_CMK131C" "CYME_CMJ250C" "CYME_CMQ172C"
[13] "CYME_CMI162C" "CYME_CMD041C" "CYME_CMO124C" "CYME_CMJ272C" "CYME_CMO124C" "CYME_CMO276C"
[19] "CYME_CMO124C" "CYME_CMD040C" "CYME_CMO276C" "CYME_CMM299C" "CYME_CMJ305C" "CYME_CMN285C"
[25] "CYME_CMT034C" "CYME_CMK098C" "CYME_CMK188C" "CYME_CMH052C" "CYME_CMF012C"
> pv.out.list[1]$cme00010$plot.data.gene$all.mapped
 [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""

I also see that mol.col only contains #FFFFFF:

> pv.out.list[1]$cme00010$plot.data.gene$mol.col
 [1] "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF"
[10] "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF"
[19] "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF" "#FFFFFF"
[28] "#FFFFFF" "#FFFFFF"

which is the hexcode for completely white.

galenseilis commented 1 year ago

Also interesting that only the first time point is showing up:

> colnames(pv.out.list$cme00010$plot.data.gene)
 [1] "kegg.names" "labels"     "all.mapped" "type"       "x"          "y"          "width"     
 [8] "height"     "1h_1"       "mol.col"

galenseilis commented 1 year ago

It is interesting that in result which I used to store our case of pid=00010 I see the KEGG names are strings whereas the provided example in pv.out are integers. A future step may be to provide integers for kegg.names.

> result
$plot.data.gene
      kegg.names       labels all.mapped type   x   y width height 1h_1 mol.col
18  CYME_CME145C CYME_CME145C            gene 483 407    46     17   NA #FFFFFF
42  CYME_CMO345C CYME_CMO345C            gene 284 948    46     17   NA #FFFFFF
49  CYME_CMM296C CYME_CMM296C            gene 487 948    46     17   NA #FFFFFF
50  CYME_CMS125C CYME_CMS125C            gene 439 941    46     17   NA #FFFFFF
52  CYME_CMS327C CYME_CMS327C            gene 431 858    46     17   NA #FFFFFF
53  CYME_CMS327C CYME_CMS327C            gene 319 870    46     17   NA #FFFFFF
54  CYME_CMI273C CYME_CMI273C            gene 210 870    46     17   NA #FFFFFF
55  CYME_CMA145C CYME_CMA145C            gene 579 869    46     17   NA #FFFFFF
57  CYME_CMA030C CYME_CMA030C            gene 483 771    46     17   NA #FFFFFF
59  CYME_CMK131C CYME_CMK131C            gene 483 701    46     17   NA #FFFFFF
61  CYME_CMJ250C CYME_CMJ250C            gene 458 484    46     17   NA #FFFFFF
62  CYME_CMQ172C CYME_CMQ172C            gene 386 449    46     17   NA #FFFFFF
63  CYME_CMI162C CYME_CMI162C            gene 458 336    46     17   NA #FFFFFF
64  CYME_CMD041C CYME_CMD041C            gene 408 336    46     17   NA #FFFFFF
66  CYME_CMO124C CYME_CMO124C            gene 483 265    46     17   NA #FFFFFF
67  CYME_CMJ272C CYME_CMJ272C            gene 483 178    46     17   NA #FFFFFF
68  CYME_CMO124C CYME_CMO124C            gene 385 302    46     17   NA #FFFFFF
70  CYME_CMO276C CYME_CMO276C            gene 233 312    46     17   NA #FFFFFF
71  CYME_CMO124C CYME_CMO124C            gene 357 264    46     17   NA #FFFFFF
72  CYME_CMD040C CYME_CMD040C            gene 307 264    46     17   NA #FFFFFF
74  CYME_CMO276C CYME_CMO276C            gene 233 239    46     17   NA #FFFFFF
79  CYME_CMM299C CYME_CMM299C            gene 262 913    46     17   NA #FFFFFF
80  CYME_CMJ305C CYME_CMJ305C            gene 483 557    46     17   NA #FFFFFF
121 CYME_CMN285C CYME_CMN285C            gene 239 747    46     17   NA #FFFFFF
130 CYME_CMT034C CYME_CMT034C            gene 392 521    46     17   NA #FFFFFF
140 CYME_CMK098C CYME_CMK098C            gene  96 916    46     17   NA #FFFFFF
147 CYME_CMK188C CYME_CMK188C            gene 508 630    46     17   NA #FFFFFF
156 CYME_CMH052C CYME_CMH052C            gene 558 336    46     17   NA #FFFFFF
159 CYME_CMF012C CYME_CMF012C            gene 574 796    46     17   NA #FFFFFF

$plot.data.cpd
    kegg.names labels all.mapped     type   x   y width height mol.data mol.col
45      C00033 C00033            compound 146 958     8      8       NA #FFFFFF
87      C00031 C00031            compound 601 195     8      8       NA #FFFFFF
88      C00103 C00103            compound 483 143     8      8       NA #FFFFFF
89      C00631 C00631            compound 483 664     8      8       NA #FFFFFF
90      C00267 C00267            compound 181 228     8      8       NA #FFFFFF
91      C00221 C00221            compound 181 301     8      8       NA #FFFFFF
92      C00111 C00111            compound 332 448     8      8       NA #FFFFFF
93      C01172 C01172            compound 332 301     8      8       NA #FFFFFF
94      C00668 C00668            compound 483 230     8      8       NA #FFFFFF
95      C05345 C05345            compound 483 303     8      8       NA #FFFFFF
96      C00074 C00074            compound 483 736     8      8       NA #FFFFFF
97      C00197 C00197            compound 483 592     8      8       NA #FFFFFF
98      C00236 C00236            compound 483 520     8      8       NA #FFFFFF
99      C00186 C00186            compound 627 868     8      8       NA #FFFFFF
100     C15972 C15972            compound 314 911     8      8       NA #FFFFFF
101     C00469 C00469            compound 550 958     8      8       NA #FFFFFF
102     C00022 C00022            compound 483 868     8      8       NA #FFFFFF
103     C05125 C05125            compound 378 868     8      8       NA #FFFFFF
104     C00024 C00024            compound 146 868     8      8       NA #FFFFFF
105     C00084 C00084            compound 378 958     8      8       NA #FFFFFF
106     C16255 C16255            compound 265 868     8      8       NA #FFFFFF
107     C15973 C15973            compound 212 911     8      8       NA #FFFFFF
108     C05378 C05378            compound 483 373     8      8       NA #FFFFFF
109     C06186 C06186            compound 132 359     8      8       NA #FFFFFF
110     C06187 C06187            compound 239 359     8      8       NA #FFFFFF
111     C06188 C06188            compound 239 385     8      8       NA #FFFFFF
112     C01451 C01451            compound 132 385     8      8       NA #FFFFFF
118     C00036 C00036            compound 146 736     8      8       NA #FFFFFF
134     C01159 C01159            compound 568 556     8      8       NA #FFFFFF
139     C00118 C00118            compound 483 448     8      8       NA #FFFFFF
146     C00068 C00068            compound 378 825     8      8       NA #FFFFFF

> pv.out
$plot.data.gene
    kegg.names  labels      all.mapped type    x   y width height      DCIS_1      DCIS_2
16        4329 ALDH6A1            4329 gene  159 325    46     17  0.74686683  0.05287812
18          31   ACACA           31,32 gene  159 252    46     17 -0.48282912 -0.75715042
19       23417   MLYCD           23417 gene  159 204    46     17 -0.25145040 -0.37150885
26          18    ABAT              18 gene  276 370    46     17  2.78549061 -0.32723593
35       26275   HIBCH           26275 gene  377 327    46     17  0.76979701 -0.11439553
60        4329 ALDH6A1            4329 gene  951 409    46     17  0.74686683  0.05287812
63       84693    MCEE                 gene  807 494    46     17          NA          NA
64        5095    PCCA       5095,5096 gene  721 390    46     17  1.19029289 -0.30087442
66       55862  ECHDC1           55862 gene  831 390    46     17  0.16245690  0.09019571
68       79611   ACSS3           79611 gene 1051 229    46     17  0.07325135 -0.21532204
69       79611   ACSS3           79611 gene 1051 143    46     17  0.07325135 -0.21532204
71       55902   ACSS2                 gene 1001 229    46     17          NA          NA
72       55902   ACSS2                 gene 1001 143    46     17          NA          NA
77        3945    LDHB 3945,3948,92483 gene  743  64    46     17 -0.27453120  0.18750500
83        1892   ECHS1  1892,1962,3030 gene  468 317    46     17 -0.78788127  0.10146314
105         35   ACADS              35 gene  593 317    46     17 -0.37327672 -0.22446399
110       4594    MMUT            4594 gene  807 577    46     17  0.41314463  0.06333223
119       8801  SUCLG2  8801,8802,8803 gene  865 630    46     17  1.67147604  0.26659990
120       8801  SUCLG2  8801,8802,8803 gene  865 609    46     17  1.67147604  0.26659990
197       1738     DLD            1738 gene  769 241    46     17  1.25941863  0.04192270
198        593  BCKDHA         593,594 gene  806 209    46     17 -0.50170330  0.63064744
199        593  BCKDHA         593,594 gene  806 138    46     17 -0.50170330  0.63064744
200       1629     DBT            1629 gene  806 275    46     17 -0.13898650  0.01034633
214         51   ACOX1         51,8310 gene  593 338    46     17  0.56050833  0.13422425
         DCIS_3      DCIS_4      DCIS_5       DCIS_6 DCIS_1.col DCIS_2.col DCIS_3.col DCIS_4.col
16  -0.81369145 -0.27709498 -0.18071111  0.122670619    #EF3030    #BEBEBE    #00FF00    #8FCE8F
18  -0.63119257 -1.69649387 -0.81430094 -0.131227383    #5FDF5F    #30EF30    #30EF30    #00FF00
19  -0.21974935 -0.08768293 -0.36251055 -0.044751017    #8FCE8F    #8FCE8F    #8FCE8F    #BEBEBE
26  -0.63635871 -0.01074421  1.18053902 -0.340124650    #FF0000    #8FCE8F    #30EF30    #BEBEBE
35  -0.52919767 -0.25858711  0.13965970 -0.334158512    #EF3030    #BEBEBE    #5FDF5F    #8FCE8F
60  -0.81369145 -0.27709498 -0.18071111  0.122670619    #EF3030    #BEBEBE    #00FF00    #8FCE8F
63           NA          NA          NA           NA    #FFFFFF    #FFFFFF    #FFFFFF    #FFFFFF
64  -0.47811945  0.21841228  0.59842717  0.127384456    #FF0000    #8FCE8F    #5FDF5F    #CE8F8F
66   0.19608061  0.02913517  0.23258972  0.191704083    #BEBEBE    #BEBEBE    #BEBEBE    #BEBEBE
68   0.19192858  0.07799171  0.47373393  0.205822641    #BEBEBE    #8FCE8F    #BEBEBE    #BEBEBE
69   0.19192858  0.07799171  0.47373393  0.205822641    #BEBEBE    #8FCE8F    #BEBEBE    #BEBEBE
71           NA          NA          NA           NA    #FFFFFF    #FFFFFF    #FFFFFF    #FFFFFF
72           NA          NA          NA           NA    #FFFFFF    #FFFFFF    #FFFFFF    #FFFFFF
77  -0.14385615 -0.53114842  0.20440894 -0.557069022    #8FCE8F    #BEBEBE    #BEBEBE    #5FDF5F
83  -0.34086971 -0.26743804 -0.34815464 -0.006183515    #30EF30    #BEBEBE    #8FCE8F    #8FCE8F
105 -0.26460784 -0.82938399 -0.21631308  0.509809829    #8FCE8F    #8FCE8F    #8FCE8F    #00FF00
110 -0.36899420  0.02000592 -0.08163558  0.117707016    #DF5F5F    #BEBEBE    #8FCE8F    #BEBEBE
119 -0.91651583  0.14434606  0.34928782  0.023307894    #FF0000    #CE8F8F    #00FF00    #BEBEBE
120 -0.91651583  0.14434606  0.34928782  0.023307894    #FF0000    #CE8F8F    #00FF00    #BEBEBE
197 -0.29069324  0.04910358  0.30660849  0.205264819    #FF0000    #BEBEBE    #8FCE8F    #BEBEBE
198 -0.43042885 -0.17590157  0.10787799  0.182615247    #5FDF5F    #EF3030    #5FDF5F    #BEBEBE
199 -0.43042885 -0.17590157  0.10787799  0.182615247    #5FDF5F    #EF3030    #5FDF5F    #BEBEBE
200 -0.01534762 -0.04897299 -0.02063803  0.012819618    #BEBEBE    #BEBEBE    #BEBEBE    #BEBEBE
214 -0.52993241 -0.29012849 -0.11773491  0.070639856    #DF5F5F    #BEBEBE    #5FDF5F    #8FCE8F
    DCIS_5.col DCIS_6.col
16     #BEBEBE    #BEBEBE
18     #00FF00    #BEBEBE
19     #8FCE8F    #BEBEBE
26     #FF0000    #8FCE8F
35     #BEBEBE    #8FCE8F
60     #BEBEBE    #BEBEBE
63     #FFFFFF    #FFFFFF
64     #DF5F5F    #BEBEBE
66     #CE8F8F    #BEBEBE
68     #DF5F5F    #CE8F8F
69     #DF5F5F    #CE8F8F
71     #FFFFFF    #FFFFFF
72     #FFFFFF    #FFFFFF
77     #CE8F8F    #5FDF5F
83     #8FCE8F    #BEBEBE
105    #8FCE8F    #DF5F5F
110    #BEBEBE    #BEBEBE
119    #CE8F8F    #BEBEBE
120    #CE8F8F    #BEBEBE
197    #CE8F8F    #CE8F8F
198    #BEBEBE    #BEBEBE
199    #BEBEBE    #BEBEBE
200    #BEBEBE    #BEBEBE
214    #BEBEBE    #BEBEBE

$plot.data.cpd
    kegg.names labels all.mapped     type    x   y width height mol.data mol.col
30      C00222 C00222            compound  225 327     8      8       NA #FFFFFF
31      C00804 C00804            compound  225 451     8      8       NA #FFFFFF
32      C01013 C01013            compound  324 327     8      8       NA #FFFFFF
33      C00099 C00099            compound  325 388     8      8       NA #FFFFFF
36      C00083 C00083            compound  222 228     8      8       NA #FFFFFF
84      C00109 C00109            compound  806 105     8      8       NA #FFFFFF
85      C02876 C02876            compound  911 105     8      8       NA #FFFFFF
86      C00163 C00163            compound 1026 105     8      8       NA #FFFFFF
87      C00186 C00186            compound  523 157     8      8       NA #FFFFFF
88      C00827 C00827            compound  523 228     8      8       NA #FFFFFF
89      C00100 C00100            compound  806 324     8      8       NA #FFFFFF
90      C00683 C00683            compound  807 454     8      8       NA #FFFFFF
91      C02170 C02170            compound  897 454     8      8       NA #FFFFFF
92      C02335 C02335            compound  419 388     8      8       NA #FFFFFF
93      C05668 C05668            compound  418 327     8      8       NA #FFFFFF
94      C05984 C05984            compound  684  63     8      8       NA #FFFFFF
95      C05989 C05989            compound  418 228     8      8       NA #FFFFFF
96      C05983 C05983            compound 1026 179     8      8       NA #FFFFFF
97      C01213 C01213            compound  807 538     8      8       NA #FFFFFF
109     C04225 C04225            compound 1041 532     8      8       NA #FFFFFF
112     C00091 C00091            compound  807 619     8      8       NA #FFFFFF
113     C02225 C02225            compound 1041 433     8      8       NA #FFFFFF
122     C00042 C00042            compound  918 619     8      8       NA #FFFFFF
123     C04593 C04593            compound 1041 619     8      8       NA #FFFFFF
131     C00024 C00024            compound   90 227     8      8       NA #FFFFFF
157     C05979 C05979            compound  637 581     8      8       NA #FFFFFF
160     C00479 C00479            compound  637 468     8      8       NA #FFFFFF
169     C00111 C00111            compound  291 581     8      8       NA #FFFFFF
170     C00583 C00583            compound  499 468     8      8       NA #FFFFFF
171     C00424 C00424            compound  499 581     8      8       NA #FFFFFF
175     C05235 C05235            compound  396 468     8      8       NA #FFFFFF
176     C00546 C00546            compound  396 581     8      8       NA #FFFFFF
180     C06002 C06002            compound  950 454     8      8       NA #FFFFFF
193     C15973 C15973            compound  768 271     8      8       NA #FFFFFF
194     C15972 C15972            compound  768 206     8      8       NA #FFFFFF
195     C21017 C21017            compound  806 177     8      8       NA #FFFFFF
196     C00068 C00068            compound  844 160     8      8       NA #FFFFFF
201     C21018 C21018            compound  806 249     8      8       NA #FFFFFF
209     C21250 C21250            compound 1127 532     8      8       NA #FFFFFF
215     C00894 C00894            compound  523 327     8      8       NA #FFFFFF

galenseilis commented 1 year ago

Ah, I used the debugger incorrectly earlier. I will try using that again when I have had more sleep.

dfossl / RaderLabCode

Pathview is not assigning colours for the equation-by-equation timepoints #6