PeteHaitch / bsseq

Devel repository for bsseq

combineList and overlapping samples #3

Open PeteHaitch opened 8 years ago

PeteHaitch commented 8 years ago

combineList() is strict: it does not allow overlapping sampleNames across the objects being combined. This is sometimes too strict, e.g., when combining CHG and CHH BSseq objects for the same samples. Currently I have to rbind() and then sort() the result, but I suspect this is more expensive and slower than it needs to be?
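For reference, a minimal sketch of the rbind() + sort() workaround on toy data. The positions, counts, and sample names are made up for illustration, and it assumes rbind() and sort() dispatch on BSseq objects through the RangedSummarizedExperiment methods they inherit:

```r
library(bsseq)

# Hypothetical toy data: two samples, measured in both contexts
samples <- c("s1", "s2")

# CHG loci (rows are loci, columns are samples)
chg <- BSseq(chr = c("chr22", "chr22"), pos = c(10L, 30L),
             M = matrix(c(1L, 2L, 3L, 4L), ncol = 2),
             Cov = matrix(c(5L, 6L, 7L, 8L), ncol = 2),
             sampleNames = samples)

# CHH loci on the same samples, interleaved with the CHG positions
chh <- BSseq(chr = c("chr22", "chr22"), pos = c(20L, 40L),
             M = matrix(c(0L, 1L, 1L, 0L), ncol = 2),
             Cov = matrix(c(2L, 3L, 4L, 5L), ncol = 2),
             sampleNames = samples)

# Stack the loci of both objects, then restore genomic order
ch <- rbind(chg, chh)
sorted_ch <- sort(ch)
```

rbind() only appends rows, so the combined object is in CHG-then-CHH order until sort() is called.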

PeteHaitch commented 8 years ago

On chr22 test data from flow-sorted-brain-wgbs:

> dim(chg)
[1] 4000888      45
> dim(chh)
[1] 11588137       45
> system.time(ch <- rbind(chg, chh))
    user   system  elapsed
2126.715   64.998 2193.793
> dim(ch)
[1] 15589025       45
> pryr::object_size(ch)
125 MB
> system.time(sorted_ch <- sort(ch))
   user  system elapsed
  9.505   1.707  11.224
# Object is larger after sorting because row order index is stored in DelayedArray assay elements
> pryr::object_size(sorted_ch)
250 MB