barkasn / scAlleleCount


Error in if (els$i < 1 || els$i > nr) stop("readMM(): row\t values 'i' are not in 1:nr": missing value where TRUE/FALSE needed #1

Status: Open. romanhaa opened this issue 6 years ago.

romanhaa commented 6 years ago

Hello,

I'm trying to use your tool in combination with the HoneyBADGER tool that was recently published: https://jef.works/HoneyBADGER/Preparing_Data.html

The BAM file has been processed similarly to 10x Genomics files and contains only uniquely mapped, annotated reads (single-end). The cell barcode is stored in the read header: it is appended to the read name with an underscore and followed by the UMI, again separated by an underscore.
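Because the barcode and UMI live in the read name rather than in BAM tags, it may be worth confirming that every read name splits into the three expected parts. Below is a minimal Python sketch. It assumes the layout `<read name>_<barcode>_<UMI>` described above, with no underscores inside the barcode or UMI. `split_read_name` is a hypothetical helper and is not part of scAlleleCount. 10x Genomics pipelines store these values in the `CB` and `UB` tags instead. This thread does not say whether scAlleleCount reads barcodes from the read name or from a tag, so that is worth checking.

```python
from typing import Tuple


def split_read_name(qname: str) -> Tuple[str, str, str]:
    """Split '<read name>_<barcode>_<UMI>' into (read name, barcode, UMI).

    Hypothetical helper. It assumes the barcode and UMI contain no underscores;
    splitting from the right keeps any underscores inside the read name intact.
    """
    parts = qname.rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected read name layout: {qname!r}")
    name, barcode, umi = parts
    return name, barcode, umi


if __name__ == "__main__":
    # Made-up read name, used only for illustration.
    print(split_read_name("NB501:1:HXXX:1:11101:1000:2000_AAACCTGAGAAACCAT_GCTACGTAGC"))
```

Splitting from the right (`rsplit` with `maxsplit=2`) is a deliberate choice. Sequencer read names rarely contain underscores, but if one does, it stays in the name field rather than shifting the barcode and UMI.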

However, at the end of the getFastCellAlleleCount run I get the error shown below.

sample.results      <- getFastCellAlleleCount(snps=snps.scAlleleCount.merged, bamFile=sample.bamFile, cellBarcodes=sample.cellBarcodes, verbose=TRUE, scAlleleCountExec='scAlleleCount-master/scAlleleCount.py')
# reading snps file... done
# building positions index... done
# reading barcodes file... reading barcodes file... done
# building barcodes index... done
# perfoming pileup...Warning: The index file is older than the data file
.................................................................done
saving matrices...done
Error in if (els$i < 1 || els$i > nr) stop("readMM(): row\t values 'i' are not in 1:nr":
missing value where TRUE/FALSE needed
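The failing check is inside `Matrix::readMM()`, which the R wrapper presumably uses to load the matrices written during the "saving matrices" step. `if (els$i < 1 || els$i > nr)` only evaluates to a missing value when the first row index read from the file is `NA`, or, in R versions before 4.3 where `||` returns `NA` for a zero-length operand, when no entries were read at all. One plausible explanation, which this thread does not confirm, is therefore an empty or malformed output matrix. An empty matrix could result, for example, if no reads matched the supplied barcodes. The following Python sketch inspects a MatrixMarket coordinate file by hand. The path you pass it is an assumption, because the thread does not say where the matrices are saved.

```python
import sys
from typing import List, Optional, Tuple


def check_mtx(path: str) -> Tuple[Optional[int], Optional[int], Optional[int], List[str]]:
    """Sanity-check a MatrixMarket *coordinate* file (the layout readMM parses).

    Returns (nrows, ncols, declared_entries, problems). An empty problems list
    means nothing obviously wrong was found.
    """
    problems: List[str] = []
    with open(path) as fh:
        banner = fh.readline()
        if not banner.startswith("%%MatrixMarket"):
            problems.append("missing %%MatrixMarket banner")
        # Skip any further comment lines to reach the "nrows ncols entries" line.
        line = fh.readline()
        while line.startswith("%"):
            line = fh.readline()
        try:
            nrows, ncols, declared = (int(v) for v in line.split())
        except ValueError:
            problems.append(f"bad size line: {line.strip()!r}")
            return None, None, None, problems
        seen = 0
        for raw in fh:
            if not raw.strip():
                continue  # ignore blank lines
            seen += 1
            fields = raw.split()
            try:
                i, j = int(fields[0]), int(fields[1])
            except (ValueError, IndexError):
                problems.append(f"entry {seen}: unparseable indices {raw.strip()!r}")
                continue
            # MatrixMarket indices are 1-based.
            if not (1 <= i <= nrows and 1 <= j <= ncols):
                problems.append(f"entry {seen}: index ({i}, {j}) outside {nrows}x{ncols}")
    if seen == 0:
        problems.append("no entries")
    if seen != declared:
        problems.append(f"header declares {declared} entries, file has {seen}")
    return nrows, ncols, declared, problems


if __name__ == "__main__":
    # Usage: python check_mtx.py <matrix.mtx> [...]  (paths are placeholders)
    for p in sys.argv[1:]:
        print(p, check_mtx(p))
```

If every saved matrix reports "no entries", the problem probably lies upstream in the pileup or barcode matching, not in R.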

Do you have any idea what this means and how I can avoid it?

Thanks a lot!

Session info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Biostrings_2.48.0    XVector_0.20.0       GenomicRanges_1.32.3
[4] GenomeInfoDb_1.16.0  IRanges_2.14.10      S4Vectors_0.18.3
[7] BiocGenerics_0.26.0

loaded via a namespace (and not attached):
[1] lattice_0.20-35        bitops_1.0-6           grid_3.5.1
[4] zlibbioc_1.26.0        Matrix_1.2-14          tools_3.5.1
[7] RCurl_1.95-4.10        compiler_3.5.1         GenomeInfoDbData_1.1.0

barkasn commented 6 years ago

Thank you for submitting the issue. Could you please try the following:

1) Run the Python script directly; the R function is only a wrapper around it.
2) Re-make the BAM index. You are getting the warning "Warning: The index file is older than the data file", which indicates that the index is out of date and could be the source of the error.
3) If you still see the same problem, could you provide a small input file with which you can reproduce the issue?
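For step 2, the warning appears to come from htslib comparing file modification times, so a quick way to see whether the index needs rebuilding is to make the same comparison. Below is a minimal Python sketch. It only looks for `<file>.bam.bai`, `<file>.bam.csi`, or `<file>.bai` next to the BAM, and it is not part of scAlleleCount. Rebuilding the index itself can be done with `samtools index sample.bam`. For step 3, `samtools view -b sample.bam chr1:1000000-2000000 > small.bam` is one way to cut out a small region. The file names and region here are placeholders.

```python
import os
import sys
from typing import Optional


def find_bam_index(bam_path: str) -> Optional[str]:
    """Return the first index file found next to bam_path, or None."""
    candidates = [bam_path + ".bai", bam_path + ".csi"]
    if bam_path.endswith(".bam"):
        candidates.append(bam_path[:-4] + ".bai")  # e.g. sample.bai
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None


def index_is_stale(bam_path: str) -> bool:
    """True if the BAM has no index, or its index is older than the BAM itself."""
    index_path = find_bam_index(bam_path)
    if index_path is None:
        return True
    return os.path.getmtime(index_path) < os.path.getmtime(bam_path)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        status = "re-index (e.g. samtools index)" if index_is_stale(path) else "index OK"
        print(f"{path}: {status}")
```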

BW,

Nikolas