Closed smgogarten closed 3 years ago
Merging #71 (d6cae8d) into master (cf70704) will increase coverage by
2.99%
. The diff coverage is95.02%
.:exclamation: Current head d6cae8d differs from pull request most recent head 8586975. Consider uploading reports for the commit 8586975 to get more accurate results
@@ Coverage Diff @@
## master #71 +/- ##
==========================================
+ Coverage 81.08% 84.07% +2.99%
==========================================
Files 38 38
Lines 2823 2851 +28
==========================================
+ Hits 2289 2397 +108
+ Misses 534 454 -80
Impacted Files | Coverage Δ | |
---|---|---|
R/nullModelFastScore.R | 84.70% <85.71%> (+83.51%) |
:arrow_up: |
R/pcrelate.R | 89.91% <88.88%> (+0.53%) |
:arrow_up: |
R/assocTestSingle.R | 93.10% <94.33%> (+2.09%) |
:arrow_up: |
R/admixMap.R | 89.62% <96.00%> (+0.96%) |
:arrow_up: |
R/assocTestAggregate.R | 96.39% <97.40%> (+0.18%) |
:arrow_up: |
R/utils.R | 97.70% <100.00%> (-0.49%) |
:arrow_down: |
R/util_methods.R | 88.15% <0.00%> (+0.48%) |
:arrow_up: |
R/testGeno.R | 60.12% <0.00%> (+0.63%) |
:arrow_up: |
R/nullModelTestPrep.R | 85.36% <0.00%> (+17.07%) |
:arrow_up: |
... and 1 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update cf70704...8586975. Read the comment docs.
@smgogarten - as far as I can tell, this all looks good. I left a couple minor comments/questions.
Going back to one of my comments -- there's actually a fair amount of redundant code between assocTestSingle
and assocTestAggregate
, as well as .testGenoBlockSingle
and .testGenoBlockAggregate
. I'm not sure if it's worthwhile to try to turn those repeated bits into sub-functions themselves, or if that's a waste of time and effort.
@smgogarten - as far as I can tell, this all seems good to go!
Loops in
pcrelate
,assocTestSingle
,assocTestAggregate
, andadmixMap
have been refactored to use the BiocParallel package. With BiocParallel, the user can define a parallel backend and either register it in the current R session or pass it as an argument to the GENESIS function, and that backend will be used to control parallelization. The default is Multicore, which uses one less than the available number of CPUs on the machine (of those, one is the head job so the actual number of workers isN-2
).We use the
bpiterate
function which takes two arguments: an iterator that supplies a new chunk of data every time it is called, and a function to run on that data (which can take additional arguments).SeqVarIterator
objects are used to construct the iterators, which are defined separately within each method to return genotype data in the appropriate format. The functions called bybpiterate
are shared across all methods for a given function (e.g., the.testGenoBlockSingle
function is called by bothSeqVarIterator
andGenotypeIterator
methods forassocTestSingle
). This reduces the duplication of code compared to previous versions of GENESIS.bpiterate
has the option of either returning a list containing all elements, or using a "reduce" function to combine elements. The association functions use list return and combine them withrbindlist
as previously.pcrelate
previously used a reduce function as implemented in theforeach
package, and that has been replaced by thebpiterate
call.bpiterate
returns errors as list elements, which is undesirable since all error messages then result in a failure ofrbindlist
to bind errors with successful output. This was addressed with the.stopOnError
function that checks for errors and then prints them along with the data chunk in which they occurred.Performance testing was done on a dataset with ~45k samples. I compared three scenarios: 1) previous GENESIS code, 2) BiocParallel with serial execution, 3) BiocParallel with multicore execution on a 24-core machine. In summary, the memory usage and run time is very similar between BiocParallel serial and the original code. BiocParallel multicore substantially reduces run time at the expense of additional memory.