Closed HenrikBengtsson closed 9 years ago
> library("aroma.seq")
> bams <- BamDataSet$byName("LG3-test", organism="HomoSapiens")
> ns <- lapply(bams, FUN=getTargetLengths)
> uns <- unique(ns)
> str(uns)
List of 3
$ : Named int [1:25] 247249719 135374737 134452384 132349534 114142980 106368585 100338915 88827254 78774742 76117153 ...
..- attr(*, "names")= chr [1:25] "1" "10" "11" "12" ...
$ : Named int [1:84] 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 ...
..- attr(*, "names")= chr [1:84] "1" "2" "3" "4" ...
$ : Named int [1:93] 249250621 135534747 135006516 40103 133851895 115169878 107349540 102531392 90354753 81195210 ...
..- attr(*, "names")= chr [1:93] "1" "10" "11" "11_gl000202_random" ...
> types <- match(ns, table=uns)
> types
[1] 1 1 1 1 1 2 2 3 3 3
> table(types)
types
1 2 3
5 2 3
So, there are n=5 of Type 1, n=2 of Type 2, and n=3 of Type 3.
> library("aroma.seq")
> bams <- BamDataSet$byName("LG3-test", organism="HomoSapiens")
> bams
BamDataSet:
Name: LG3-test
Tags:
Full name: LG3-test
Subpath: HomoSapiens
Number of files: 10
Names: A06929_C019VACXX_6, A06930_C019VACXX_7, A08297_C08J9ACXX_2, ..., Z00125 [10]
Path (to the first file): bamData/LG3-test/HomoSapiens
Total file size: 168.65 GB (181083190174 bytes)
RAM: 0.01MB
Number of unique target sequence sets: 3 [WARNING: unsafe to process if > 1]
Warning message:
In as.character.BamDataSet(x) :
Processing a BAM data set that consists of (3) BAM files which have been mapped (10)
different types of target sequences (different FASTA reference files) is unsafe and will
likely lead to problems that are hard to troubleshoot: bamData/LG3-test/HomoSapiens
Numbers were swapped, now it outputs:
In as.character.BamDataSet(x) :
Processing a BAM data set that consists of (10) BAM files which have been mapped (3)
different types of target sequences (different FASTA reference files) is unsafe and will
likely lead to problems that are hard to troubleshoot: bamData/LG3-test/HomoSapiens
When printing a
BamDataFileSet
object, it should report on number of unique BAM header target lengths finger prints in the set, e.g.In the above example, there are 3 different sets of target length vectors.
Among other things, this will help warn the user that the BAM files might have been aligned with different genome references.