lckarssen closed 7 years ago
See also Issue #20.
Given that, with PR #42, the read-gzipped-genotypes branch now allows users to use gzipped info, map, invsigma and dose/prob files, I think we have enough to close this issue.
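For readers wondering how "accept gzipped or plain input" can work without separate code paths, here is a minimal Python sketch. It is not the code from the read-gzipped-genotypes branch (ProbABEL is written in C++). The helper name `open_maybe_gzipped` is hypothetical, and it assumes detection by the gzip magic bytes rather than by file extension.

```python
import gzip

# Every gzip member starts with these two bytes (see RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzipped(path):
    """Open a text file for reading, decompressing it on the fly if gzipped.

    Hypothetical helper for illustration: it sniffs the first two bytes
    instead of trusting a ".gz" extension, so renamed files still work.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")  # text-mode stream over decompressed data
    return open(path, "r")
```

Callers then read info, map or dose lines the same way regardless of compression, e.g. `with open_maybe_gzipped(path) as f: header = f.readline()`.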
Reading gzipped phenotype data is not possible yet, but it is also not really required, IMHO. See my motivation in the comment on PR #42:
We could add reading of gzipped phenotype files as well, but since most people create these from R, I guess they won't bother to zip them. Moreover, the phenotype file is currently opened and closed several times while extracting all phenotype information (determining the number of samples and covariates, finding out which lines contain NAs, etc.), so doing that for a gzipped file would be more time-consuming. Implementing a "read phenotype data once" strategy in the current code isn't trivial either.
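To make the "read phenotype data once" idea concrete, the sketch below collects the sample count, covariate count and NA rows in a single pass. It is an illustration only, not ProbABEL's code. It assumes a whitespace-separated file with a header line whose first column is the sample ID, second column the trait, and remaining columns covariates, with `NA` marking missing values. The names `PhenotypeSummary` and `read_phenotypes_once` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhenotypeSummary:
    header: list
    n_covariates: int
    n_samples: int = 0
    na_rows: list = field(default_factory=list)  # 0-based indices of samples with any NA
    rows: list = field(default_factory=list)     # raw fields, kept so no second pass is needed


def read_phenotypes_once(lines, na_token="NA"):
    """Scan phenotype lines exactly once (assumed layout: id, trait, covariates...)."""
    it = iter(lines)
    header = next(it).split()
    summary = PhenotypeSummary(header=header, n_covariates=len(header) - 2)
    for line in it:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # Only the trait and covariate columns are checked; the ID column is skipped.
        if na_token in fields[1:]:
            summary.na_rows.append(summary.n_samples)
        summary.rows.append(fields)
        summary.n_samples += 1
    return summary
```

Because the function accepts any iterable of lines, a decompressing stream such as `gzip.open(path, "rt")` could be passed in directly, so a gzipped phenotype file would be decompressed only once rather than once per scan.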
At least for genetic data, the option of reading gzipped files would be great. With current imputed data sets, this would save a lot of disk space.
@maarten-k has proof-of-principle code in his fork at https://github.com/maarten-k/ProbABEL