GoekeLab / proActiv

Estimation of Promoter Activity from RNA-Seq data
https://goekelab.github.io/proActiv/

Implement parallelization, BAM usage and wrapper #8

Closed jleechung closed 4 years ago

jleechung commented 4 years ago
jonathangoeke commented 4 years ago
* Add Windows parallelization in `getReducedExonRanges` (`annotation-data-helper.R`)
  Detects the user's OS and calls `BiocParallel` if the OS is Windows. Scales well: runtime is approximately halved with 2 cores.

* Add BAM file usage for `calculateJunctionReadCounts` (`junction-read-count.R`)
  Reads the BAM file using `GenomicAlignments::readGAlignments` and `summarizeJunctions`. An extra argument, `genome`, is required to infer the strand of junctions for downstream analysis (i.e., to prevent ambiguous junctions from being returned when overlapped with `intronRanges`). Removes BAM files immediately after processing and calls garbage collection to free up memory. Examples with BAM usage have been added for `calculateJunctionReadCounts` and `calculatePromoterReadCounts`.

* Add Windows parallelization for `calculatePromoterReadCounts` (`junction-read-count.R`)
  Detects the user's OS and calls `BiocParallel` if the OS is Windows. Scales to a large number of input junction files.

* Allow unspecified file labels in `calculatePromoterReadCounts` (`junction-read-count.R`)
  If the user does not provide the `labels` argument, labels are created automatically as 's1', 's2', ...

* Add wrapper function (`proActiv.R`)
  Requires at minimum the arguments `promoterAnnotationData`, `junctionFilePaths` and `junctionType` (and `genome` if the input is a BAM file). Returns a `SummarizedExperiment` object giving promoter counts and activity, with promoter-gene mapping (stored as row data) and gene expression (stored as column data).
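
The OS-dependent parallelization described above could look roughly like the sketch below. It is not the code in this PR: `parallelApply` and its arguments are hypothetical names, and the non-Windows branch (fork-based `parallel::mclapply`) is an assumption, since the description only states that `BiocParallel` is called on Windows.

```r
# Hypothetical helper (not part of proActiv): apply FUN over X using a
# parallel backend chosen according to the operating system.
parallelApply <- function(X, FUN, numberOfCores = 1) {
  if (.Platform$OS.type == "windows") {
    # Forking is not available on Windows, so use socket-based workers.
    bpParameters <- BiocParallel::SnowParam(workers = numberOfCores)
    BiocParallel::bplapply(X, FUN, BPPARAM = bpParameters)
  } else {
    # Assumption: fork-based parallelism on Unix-like systems.
    parallel::mclapply(X, FUN, mc.cores = numberOfCores)
  }
}

# Toy usage: square four numbers on two cores.
squares <- parallelApply(1:4, function(i) i^2, numberOfCores = 2)
```

Keeping the OS check inside one helper means callers such as `getReducedExonRanges` would not need their own platform-specific branches.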

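The BAM handling described above can be sketched as follows. This is a minimal illustration under assumptions, not the implementation in `junction-read-count.R`: `bamToJunctions` is a hypothetical name, `genome` is assumed to be a BSgenome object or a genome name such as "hg19", and "removes BAM files" is read here as dropping the loaded alignments from memory.

```r
# Hypothetical helper (not part of proActiv): read a BAM file and return
# its splice junctions with strand inferred from the intron motif.
bamToJunctions <- function(bamFile, genome) {
  # Load all alignments from the BAM file.
  alignments <- GenomicAlignments::readGAlignments(bamFile)
  # With `genome` supplied, summarizeJunctions adds an `intron_strand`
  # metadata column inferred from the splice-site sequence.
  junctions <- GenomicAlignments::summarizeJunctions(alignments, genome = genome)
  # Drop the alignments and trigger garbage collection to free memory.
  rm(alignments)
  gc()
  # Use the inferred strand so later overlaps with intron ranges are
  # not ambiguous.
  BiocGenerics::strand(junctions) <- as.character(junctions$intron_strand)
  junctions
}
```

A call would take the form `bamToJunctions("sample.bam", genome = "hg19")`, where the file name is a placeholder.
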
Thanks @jleechung, these are a lot of very helpful additions. I made some comments on parts of the code, please have a look.