GenABEL-Project / ProbABEL

Tool for genome-wide association analysis of imputed genetic data.

Allow reading of gzipped files #12

Closed by lckarssen 7 years ago

lckarssen commented 8 years ago

At least for genetic data the option of reading gzipped files would be great. With current imputed data sets this would save a lot of disk space.

@maarten-k has proof-of-principle code in his fork at https://github.com/maarten-k/ProbABEL

lckarssen commented 8 years ago

See also Issue #20.

lckarssen commented 7 years ago

Given that, with PR #42, the read-gzipped-genotypes branch now allows users to use gzipped info, map, invsigma and dose/prob files, I think we have enough to close this issue. Reading gzipped phenotype data is not possible yet, but also not really required, IMHO. See my motivation in the comment on PR #42:

We could add reading of gzipped phenotype files as well, but since most people create these from R, I guess they won't bother to zip them. Moreover, the phenotype file is currently opened and closed several times in the process of extracting all phenotype info (determining the number of samples and covariates, finding out which lines have NAs, etc.), so doing that for a zipped file would be more time-consuming. Implementing a "read phenotype data once" strategy in the current code isn't trivial either.
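
For illustration only, here is a minimal sketch of what a "read phenotype data once" strategy could look like. It is written in Python rather than ProbABEL's C++ and is not taken from the ProbABEL code base or from PR #42. The function names (open_maybe_gzipped, summarize_phenotypes), the "NA" token and the assumed column layout (ID, trait, then covariates, with a header line) are all assumptions made for this example. The idea is to decompress the file a single time, keep the rows in memory, and derive the sample count, covariate count and NA lines from that one copy.

```python
import gzip
import os
import tempfile


def open_maybe_gzipped(path):
    """Open a text file, decompressing on the fly if it starts with the gzip magic bytes."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_phenotypes(path, na_token="NA"):
    """Read the file once, then derive every summary from the in-memory rows."""
    with open_maybe_gzipped(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    header, data = rows[0], rows[1:]
    return {
        "n_samples": len(data),
        # Assumed layout (hypothetical): column 1 = ID, column 2 = trait, rest = covariates.
        "n_covariates": len(header) - 2,
        # Zero-based indices of data rows that contain at least one missing value.
        "rows_with_na": [i for i, row in enumerate(data) if na_token in row],
    }


if __name__ == "__main__":
    text = "id height sex age\nind1 1.75 1 40\nind2 NA 0 35\nind3 1.62 0 NA\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pheno.txt.gz")
        with gzip.open(path, "wt") as out:
            out.write(text)
        print(summarize_phenotypes(path))
```

Because the rows are held in memory after a single pass, a compressed phenotype file would cost one decompression instead of one per query. The trade-off is that the whole file must fit in memory, which is usually acceptable for phenotype data but not for dose/prob files.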