Currently ProbABEL (v0.4.3) checks whether the IDs in the phenotype and genotype files are in the same order (while reading in the data). We could improve on this by first reading in the phenotype data completely and then reordering it to match the order of the genotype data.
My guess is that this is not too expensive in terms of computation time when compared to running several million regressions. Of course, as genotype data may be split into chunks and many chunks run in parallel (e.g. using a queue system), the total cost increases, because each separate job has to do the order check and, if the data are out of order, the reordering.
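To illustrate the idea, here is a minimal sketch in Python (not ProbABEL's actual C++ code). The function name `reorder_phenotypes` and the row layout (sample ID in the first column) are assumptions made for the example. It indexes the phenotype rows by ID in one pass and then emits them in genotype order, so the cost is roughly linear in the number of samples. What should happen with IDs that appear in only one of the two files is a design decision. Here a genotype ID without a phenotype row raises an error, and phenotype rows without a matching genotype ID are dropped.

```python
def reorder_phenotypes(pheno_rows, geno_ids):
    """Return the phenotype rows arranged in the order given by geno_ids.

    pheno_rows: list of tuples whose first element is the sample ID
                (assumed layout, for illustration only).
    geno_ids:   list of sample IDs in the order of the genotype file.
    """
    # Index the phenotype rows by sample ID in a single pass.
    by_id = {}
    for row in pheno_rows:
        sample_id = row[0]
        if sample_id in by_id:
            raise ValueError(f"duplicate phenotype ID: {sample_id}")
        by_id[sample_id] = row

    # Every genotype ID needs a phenotype row; fail early otherwise.
    missing = [sample_id for sample_id in geno_ids if sample_id not in by_id]
    if missing:
        raise ValueError(f"IDs missing from phenotype data: {missing}")

    # Emit the rows in genotype order. Phenotype rows whose ID does not
    # occur in geno_ids are left out (an assumption of this sketch).
    return [by_id[sample_id] for sample_id in geno_ids]


pheno = [("id3", 1.2), ("id1", 0.7), ("id2", 3.4)]
geno_order = ["id1", "id2", "id3"]
reordered = reorder_phenotypes(pheno, geno_order)
```

With this approach each parallel job would repeat the indexing step for its own chunk, which matches the cost consideration above: the work is small per job but is paid once per job.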
This is a 'liftover' of feature request #5677 from the R-forge tracker.
This request is based on the forum post by user Siru at http://forum.genabel.org/viewtopic.php?f=10&t=876.