RemiMaglione / r-scripts

all my dirty r-scripts
GNU General Public License v3.0

kaiju2anvio.R Execution halted when rows have some missing column values. #2

Open Mayurk619 opened 2 months ago

Mayurk619 commented 2 months ago

When I run this command: Rscript kaiju2anvio.R gene_calls_nr.names gene_calls_nr-fixed.names, I get the following error in the terminal. I can't make sense of it. Please help.

Loading required package: parallel
Error in cbind(as.matrix(kaiju.names[, 2]), mat) :
  number of rows of matrices must match (see arg 2)
Calls: kaiju2mat -> cbind
In addition: Warning message:
In matrix(unlist(mclapply(1:nrow(kaiju.names), FUN = function(i) { :
  data length [2193721] is not a sub-multiple or multiple of the number of rows [313389]
Execution halted
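
The warning says the flattened values did not fill a whole number of rows: 313389 rows of 7 ranks would need 2193723 values, but only 2193721 were produced. One plausible cause is a lineage string with fewer than seven fields. The sketch below reproduces that kind of warning on made-up data; the `lineages` vector is hypothetical and not taken from the reported file.

```r
# Hypothetical lineage strings; the second one has only 5 of the 7 ranks
lineages <- c(
  "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli",
  "Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae",
  "Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus subtilis"
)

fields <- strsplit(lineages, split = ";")
n_values <- length(unlist(fields))   # 7 + 5 + 7 values, not 3 * 7

# Capture the warning matrix() raises when the data cannot fill the rows evenly
msg <- tryCatch(
  matrix(unlist(fields), nrow = length(lineages), byrow = TRUE),
  warning = function(w) conditionMessage(w)
)

print(n_values)
print(msg)
```

Because the values are flattened with unlist() before the matrix is built, one short row shifts every value after it, which is why the ranks need to be padded per row rather than fixed afterwards.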

Mayurk619 commented 2 months ago

I solved it by changing the script. This is the modified version:

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
parallel=TRUE

# Input control
if (length(args) == 0) {
  stop("At least one argument must be supplied (input kaiju file).\n", call.=FALSE)
} else if (length(args) == 1) {
  # default output file
  args[2] = "kaiju2Anvio-fixed.names"
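} else if (length(args) == 3) {
  # args[3] arrives as a string: "TRUE"/"FALSE" or a core count such as "4"
  flag <- as.logical(args[3])
  # TRUE = auto-detect cores, FALSE = run on a single core, number = core count
  parallel <- if (is.na(flag)) as.integer(args[3]) else if (flag) TRUE else 1L
}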

# 'parallel' ships with base R, so it is loaded rather than installed
library(parallel)

# Function
kaiju2mat <- function(kaiju.names, parallel) {
  require(parallel)
  if (isTRUE(parallel)) {
    # Leave one core free, but never request fewer than one
    cores <- max(1L, detectCores() - 1L, na.rm = TRUE)
  } else {
    cores <- parallel
  }

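  mat <- matrix(unlist(mclapply(1:nrow(kaiju.names), FUN = function(i) {
    # Column 8 holds the semicolon-separated lineage; it may be empty or NA
    lineage <- as.character(kaiju.names[i, 8])
    if (!is.na(lineage) && nzchar(lineage)) {
      x.tmp <- unlist(strsplit(lineage, split = ";"))
      # Pad with NA (or truncate) so every row yields exactly 7 ranks
      length(x.tmp) <- 7
      return(x.tmp)
    } else {
      return(rep(NA, 7))
    }
  }, mc.cores = cores)), ncol = 7, byrow = TRUE)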

  if (nrow(mat) != nrow(kaiju.names)) {
    stop(paste("Mismatch in the number of rows between 'mat' (", nrow(mat), ") and 'kaiju.names' (", nrow(kaiju.names), ").\n", sep = ""))
  }

  mat <- cbind(as.matrix(kaiju.names[, 2]), mat)
  colnames(mat) <- c("gene_callers_id", "t_domain", "t_phylum", "t_class", "t_order", "t_family", "t_genus", "t_species")
  return(mat)
}

# __MAIN__
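# With fill = TRUE, read.table() infers the column count from the first five
# lines only, so name all 8 columns up front in case the file starts with short rows
kaiju.names <- read.table(file = args[1], sep = "\t", fill = TRUE, row.names = NULL, header = FALSE, quote = "", col.names = paste0("V", 1:8))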
print("kaiju.names:")
print(head(kaiju.names))

# Check if the expected number of columns is present
if (ncol(kaiju.names) < 8) {
  stop("Input file does not have the expected number of columns.\n", call.=FALSE)
}

kaijumat <- kaiju2mat(kaiju.names = kaiju.names, parallel = parallel)
print("kaijumat:")
print(head(kaijumat))

# Write the output file with tab delimiters
write.table(kaijumat, file = args[2], quote = FALSE, col.names = TRUE, row.names = FALSE, sep = "\t")
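
The core of the fix is padding every lineage to a fixed number of ranks before the matrix is built. Here is a minimal standalone sketch of that idea on toy data. `split_lineage` and the `lineages` vector are hypothetical names used only for illustration and are not part of kaiju2anvio.R.

```r
# Hypothetical helper: split one lineage string into exactly n_ranks fields
split_lineage <- function(lineage, n_ranks = 7) {
  if (is.na(lineage) || !nzchar(lineage)) {
    return(rep(NA_character_, n_ranks))
  }
  fields <- unlist(strsplit(lineage, split = ";"))
  length(fields) <- n_ranks   # pads with NA, or drops fields beyond n_ranks
  fields
}

# Toy input: a short lineage, an empty one, a missing one, and an overlong one
lineages <- c("Bacteria;Firmicutes;Bacilli", "", NA, "A;B;C;D;E;F;G;H")

# One row per input string, always n_ranks columns
mat <- do.call(rbind, lapply(lineages, split_lineage))

print(dim(mat))
print(mat[1, ])
```

Building the matrix with do.call(rbind, ...) is only one option; matrix(unlist(...), ncol = 7, byrow = TRUE), as used in the script, gives the same result once every element has length 7.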