AntonioDeFalco / SCEVAN

R package that automatically classifies the cells in scRNA data by segregating non-malignant cells of the tumor microenvironment from the malignant cells. It also infers the copy number profile of malignant cells, identifies subclonal structures, and analyses the specific and shared alterations of each subpopulation.
https://www.nature.com/articles/s41467-023-36790-9
GNU General Public License v3.0

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 6435, 5148 #115

Open Ahmed-Ghobashi opened 2 weeks ago

Ahmed-Ghobashi commented 2 weeks ago

Thank you for developing this software. I am trying to run it on my samples, but I keep getting this error. Any help will be appreciated.

cvn_SCC <- pipelineCNA(SCC@assays$SCT@counts, sample = "Melanoma", SUBCLONES = F, plotTree = FALSE )

[1] " raw data - genes: 23137 cells: 41453"
[1] "1) Filter: cells > 200 genes"
[1] "2) Filter: genes > 10% of cells"
[1] "7224 genes past filtering"
[1] "3) Annotations gene coordinates"
[1] "found 30 confident non malignant cells"
[1] "6842 genes annotated"
[1] "4) Filter: genes involved in the cell cycle"
[1] "6435 genes past filtering "
[1] "5) Filter: cells > 5genes per chromosome "
[1] "6) Log Freeman Turkey transformation"
[1] "A total of 41450 cells, 6435 genes after preprocessing"
[1] "7) Measuring baselines (confident normal cells)"
[1] "8) Smoothing data"
[1] "9) Segmentation (VegaMC)"
Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 6435, 5148
In addition: Warning messages:
1: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 7.1 GiB
2: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 7.1 GiB
3: In asMethod(object) :
  sparse->dense coercion: allocating vector of size 2.1 GiB
4: In parallel::mclapply(1:ncol(count_mtx_relat), nonLinSmooth, mc.cores = par_cores) :
  scheduled cores 6, 8, 10, 17 did not deliver results, all values of the jobs will be affected
5: In matrix(unlist(test.mc), ncol = ncol(count_mtx_relat), byrow = FALSE) :
  data length [213378165] is not a sub-multiple or multiple of the number of rows [5148]
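
The two row counts in the error appear to follow from warnings 4 and 5 in the log above. The following is a minimal base-R sketch with toy sizes, not SCEVAN's own code. It assumes that jobs on cores that "did not deliver results" come back as `NULL`, and the names `n_genes`, `n_cells`, `results`, and `failed` are made up for illustration.

```r
# Toy reconstruction of the row mismatch.
# Assumption: jobs scheduled on failed cores return NULL.
n_genes <- 10
n_cells <- 20

# One vector of length n_genes per cell, as a per-cell smoothing job would return.
results <- lapply(seq_len(n_cells), function(i) rep(i, n_genes))

# Pretend the jobs for 4 of the 20 cells were lost.
failed <- c(6, 8, 10, 17)
results[failed] <- list(NULL)

# unlist() silently drops NULL entries: 16 * 10 = 160 values remain.
flat <- unlist(results)

# With only ncol given, matrix() infers nrow = ceiling(160 / 20) = 8, not 10.
m <- suppressWarnings(matrix(flat, ncol = n_cells, byrow = FALSE))
dim(m)  # 8 20

# The same arithmetic applied to the numbers reported in the log:
ceiling(213378165 / 41450)  # 5148
```

Binding an 8-row matrix to a 10-row annotation table with `cbind()` would then fail with "arguments imply differing number of rows", which has the same shape as the reported error. If this reading is right, the annotation table and gene filtering are intact and the rows are lost during the parallel smoothing step.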

naila53 commented 2 weeks ago

I'm also facing the same issue:

results <- SCEVAN::multiSampleComparisonClonalCN(raws_mats, analysisName = "all", organism = "mouse", par_cores = 10)

[1] " raw data - genes: 24516 cells: 3246"
[1] "1) Filter: cells > 200 genes"
[1] "2) Filter: genes > 10% of cells"
[1] "9783 genes past filtering"
[1] "3) Annotations gene coordinates"
[1] "found 30 confident non malignant cells"
[1] "9026 genes annotated"
[1] "4) Filter: genes involved in the cell cycle"
[1] "8588 genes past filtering "
[1] "5) Filter: cells > 5genes per chromosome "
[1] "6) Log Freeman Turkey transformation"
[1] "A total of 3210 cells, 8588 genes after preprocessing"
[1] "7) Measuring baselines (confident normal cells)"
[1] "8) Smoothing data"
Warning: scheduled core 1 did not deliver a result, all values of the job will be affected
Warning: data length [24810732] is not a sub-multiple or multiple of the number of rows [7730]
[1] "9) Segmentation (VegaMC)"
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 8588, 7730

jakubotreba commented 1 week ago

I am also getting this error!

`[1] " raw data - genes: 23686 cells: 7930"
[1] "1) Filter: cells > 200 genes"
[1] "2) Filter: genes > 10% of cells"
[1] "12946 genes past filtering"
[1] "3) Annotations gene coordinates"

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Warning message: “package ‘yaGST’ was built under R version 4.3.1”

Loading required package: doParallel

Loading required package: foreach

Loading required package: iterators

Loading required package: parallel

[1] "found 30 confident non malignant cells"
[1] "11633 genes annotated"
[1] "4) Filter: genes involved in the cell cycle"
[1] "11068 genes past filtering "
[1] "5) Filter: cells > 5genes per chromosome "
[1] "6) Log Freeman Turkey transformation"
[1] "A total of 7930 cells, 11068 genes after preprocessing"
[1] "7) Measuring baselines (confident normal cells)"
[1] "8) Smoothing data"
Warning message in parallel::mclapply(1:ncol(count_mtx_relat), nonLinSmooth, mc.cores = par_cores):
“scheduled cores 6, 16 did not deliver results, all values of the jobs will be affected”
Warning message in matrix(unlist(test.mc), ncol = ncol(count_mtx_relat), byrow = FALSE):
“data length [78992316] is not a sub-multiple or multiple of the number of rows [9962]”
[1] "9) Segmentation (VegaMC)"
Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 11068, 9962
Traceback:

  1. SCEVAN::pipelineCNA(mtx, SUBCLONES = FALSE)
  2. classifyTumorCells(res_proc$count_mtx_norm, res_proc$count_mtx_annot, . sample, par_cores = par_cores, ground_truth = NULL, norm_cell_names = norm_cell, . SEGMENTATION_CLASS = TRUE, SMOOTH = TRUE, beta_vega = beta_vega)
  3. cbind(annot_mtx[, c(4, 1, 3)], count_mtx_relat)
  4. cbind(deparse.level, ...)
  5. data.frame(..., check.names = FALSE)
  6. stop(gettextf("arguments imply differing number of rows: %s", . paste(unique(nrows), collapse = ", ")), domain = NA)`
AntonioDeFalco commented 1 week ago

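Hello everyone,

Since I can't replicate the error, I wanted to ask for your help in understanding the problem:

1. Did you reinstall SCEVAN at the last commit?
2. Are the genes in the input matrix Gene Symbol or Ensembl ID?
3. Do you get the error with all samples or only with some in particular? Can you run the sample vignettes without errors?
4. Both human and mouse organisms?

Thanks @jakubotreba @naila53 @Ahmed-Ghobashi. Regards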

Ahmed-Ghobashi commented 1 week ago

@AntonioDeFalco Thank you for your reply.

  1. I did not reinstall SCEVAN. I will do it this week.
  2. I use gene symbols.
  3. I have around 12 samples. Only one sample works. The package runs without error on the sample vignette.
  4. They are human samples.

I will reinstall the package and keep you posted. Thank you so much for your help.

naila53 commented 1 week ago

Hi @AntonioDeFalco,

Did you reinstall SCEVAN at the last commit?
No, I did not.

Are the genes in the input matrix Gene Symbol or Ensembl ID?
Gene symbol.

Do you get the error with all samples or only with some in particular? Can you run the sample vignettes without errors?
I get the error when I run the multiple-samples vignette. However, running it individually per sample worked for me. I have a total of 10 count matrices that sum up to 40k cells:

for (i in names(raws_mats)) {
  scevans[[i]] <- SCEVAN::pipelineCNA(raws_mats[[i]], sample = i, par_cores = 6,
                                      SUBCLONES = TRUE, plotTree = TRUE, organism = "mouse")
}

Both human and mouse organisms?
My data is mouse cells.

Ahmed-Ghobashi commented 1 week ago

To give you an update: after reinstalling the package, it worked, but not on all the samples. Four samples were integrated using Seurat and processed the same way. I then split them based on orig.ident; three of the samples worked fine and one gave me the same error.

Prakrithi-P commented 6 days ago

Hi! Has anyone figured this out? The single-sample pipeline works well for most samples I tried, but for a few I get this error.

Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 7154, 6080
Traceback:

  1. pipelineCNA(P4_SCC, sample = "P4_SCC")
  2. classifyTumorCells(res_proc$count_mtx_norm, res_proc$count_mtx_annot, . sample, par_cores = par_cores, ground_truth = NULL, norm_cell_names = norm_cell, . SEGMENTATION_CLASS = TRUE, SMOOTH = TRUE, beta_vega = beta_vega)
  3. cbind(annot_mtx[, c(4, 1, 3)], count_mtx_relat)
  4. cbind(deparse.level, ...)
  5. data.frame(..., check.names = FALSE)
  6. stop(gettextf("arguments imply differing number of rows: %s", . paste(unique(nrows), collapse = ", ")), domain = NA)
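
Every log in this thread that includes warnings shows `parallel::mclapply` reporting cores that "did not deliver results" just before the failure. The sketch below shows one possible way to guard such a call. It is not part of SCEVAN: `safe_mclapply` is a hypothetical helper that reruns serially any job that came back as `NULL` or as a `try-error`, so the result list always has one entry per input. It assumes a Unix-like system, since `mclapply` with more than one core is not supported on Windows.

```r
library(parallel)

# Hypothetical helper (not a SCEVAN function): run FUN in parallel, then
# recompute serially every job that returned NULL or a try-error.
safe_mclapply <- function(X, FUN, mc.cores = 2L) {
  res <- suppressWarnings(mclapply(X, FUN, mc.cores = mc.cores))
  bad <- vapply(res,
                function(r) is.null(r) || inherits(r, "try-error"),
                logical(1))
  if (any(bad)) {
    # Serial retry of the lost jobs only.
    res[bad] <- lapply(X[bad], FUN)
  }
  # One result per input, so a later matrix() call sees the full data length.
  stopifnot(length(res) == length(X))
  res
}

# Toy usage: 5 "cells", each returning a vector of 3 "genes".
smoothed <- safe_mclapply(seq_len(5), function(i) rep(i, 3), mc.cores = 2L)
m <- matrix(unlist(smoothed), ncol = 5, byrow = FALSE)
dim(m)  # 3 5
```

From the user side, passing a smaller `par_cores` to `pipelineCNA` (the argument used earlier in this thread) might reduce memory pressure on the workers. That is an untested assumption, prompted by the multi-GiB dense allocations in the first log.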