Closed jdblischak closed 3 years ago
Thank you for the comments.
The two scripts `random_vector_generation.R` and `aggregation.R` create random vectors drawn from the normal distribution with mean zero and covariance matrix V or V^2 (here V denotes the LD matrix). The final outputs are saved per chromosome in the directory `Data/` (e.g. `Data/random_ld/chr1/s_1.txt`, ..., `Data/random_ld/chr1/s_10000.txt` are 10000 random vectors sampled from N(0, V^2)). The directory `Temp/` holds the intermediate data, and the files under this directory are removed by the script `aggregation.R`.
As you can see, we have created many random vectors, which occupy a lot of storage space. In return, we can reduce memory usage in `BiScan_null.R`.
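For illustration only, here is a minimal sketch of one way to draw vectors from N(0, V) and N(0, V^2). It is not the code in `random_vector_generation.R`; the toy matrix `V`, the number of draws, and all variable names are assumptions made for the example.

```r
# Hypothetical toy LD matrix (symmetric, positive semi-definite);
# not taken from the repository.
set.seed(1)
V <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)

n_draws <- 10000
# Each column of Z is one draw from N(0, I).
Z <- matrix(rnorm(nrow(V) * n_draws), nrow = nrow(V))

# N(0, V^2): V is symmetric, so Cov(V z) = V I V^T = V^2.
S_v2 <- V %*% Z

# N(0, V): multiply by a symmetric square root of V obtained from the
# eigendecomposition, clamping tiny negative eigenvalues to zero.
e <- eigen(V, symmetric = TRUE)
V_half <- e$vectors %*% diag(sqrt(pmax(e$values, 0))) %*% t(e$vectors)
S_v <- V_half %*% Z

# The empirical covariances should be close to V %*% V and V respectively.
max(abs(cov(t(S_v2)) - V %*% V))
max(abs(cov(t(S_v)) - V))
```

The eigendecomposition is used here instead of `chol()` because an LD matrix can be rank-deficient, in which case a Cholesky factorization may fail.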
@ghm17 Thanks for the additional context and explanation. However, I don't believe you addressed the changes I made in this Pull Request. I think that the script `aggregation.R` is needlessly creating many empty directories that are never used (and not deleted afterwards). Do you disagree? If yes, could you please point me to the lines in your scripts where directories such as `Data/random_ld/chr1/1/` are being used?
@jdblischak You are right. The subdirectories `Data/random_ld/chr1/1/` are never used. Thank you very much for pointing this out.
The script `aggregation.R` creates many subdirectories per chromosome in `Data/`, similar to how `random_vector_generation.R` creates many subdirectories per chromosome in `Temp/`. However, as far as I can tell, the directories created in `Data/` are never used. `aggregation.R` writes its output files directly to the chromosome directory:
https://github.com/ghm17/LOGODetect/blob/f6877934b1b89bd4e63ba2306ffbf11b36de3d44/Code/aggregation.R#L47
https://github.com/ghm17/LOGODetect/blob/f6877934b1b89bd4e63ba2306ffbf11b36de3d44/Code/aggregation.R#L60
And `BiScan_null.R` also ignores those empty subdirectories:
https://github.com/ghm17/LOGODetect/blob/f6877934b1b89bd4e63ba2306ffbf11b36de3d44/Code/BiScan_null.R#L118-L122
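A quick way to confirm this kind of claim is to list the subdirectories that contain nothing. The sketch below is only an illustration: it builds a throwaway layout under `tempdir()` that mimics the one described above (output files directly in the chromosome directory, plus numbered subdirectories), so the paths and file contents are assumptions and not the repository's real data.

```r
# Build a throwaway layout under tempdir(): one output file directly in
# chr1/, plus numbered subdirectories that are never written to.
chr_dir <- file.path(tempdir(), "random_ld", "chr1")
dir.create(chr_dir, recursive = TRUE, showWarnings = FALSE)
for (i in 1:3) {
  dir.create(file.path(chr_dir, as.character(i)), showWarnings = FALSE)
}
writeLines("0.1", file.path(chr_dir, "s_1.txt"))

# List the immediate subdirectories and keep those with no entries at all
# (hidden files included, "." and ".." excluded).
subdirs <- list.dirs(chr_dir, recursive = FALSE)
is_empty <- vapply(
  subdirs,
  function(d) length(list.files(d, all.files = TRUE, no.. = TRUE)) == 0,
  logical(1)
)
empty_dirs <- subdirs[is_empty]
basename(empty_dirs)

# Optionally remove them; unlink() needs recursive = TRUE for directories.
unlink(empty_dirs, recursive = TRUE)
```

Running the same check against a real `Data/random_ld/chr1/` directory would only require pointing `chr_dir` at it and skipping the setup lines.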