Closed ArtPoon closed 4 years ago
Timing with 100 genomes sampled from UK, original code:
(pangolin) art@orolo:~/work/sc2-clustering/data$ pangolin --outfile uk100.out uk100.fa
...
reading in data 07/27/2020, 11:50:22
removing unnecessary columns 07/27/2020, 11:50:26
loading model 07/27/2020, 11:56:07
generating predictions 07/27/2020, 11:56:08
With modified version:
(pangolin) art@orolo:~/work/sc2-clustering/data$ pangolin --outfile uk100-2.out uk100.fa
...
reading in data 07/27/2020, 11:46:04
removing unnecessary columns 07/27/2020, 11:46:08
constructing data frame07/27/2020, 11:46:09
loading model 07/27/2020, 11:46:25
generating predictions 07/27/2020, 11:46:26
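From "reading in data" to "generating predictions", the original code takes 5 min 46 s (11:50:22 to 11:56:08) and the modified version takes 22 s (11:46:04 to 11:46:26).
Outputs are identical: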
(pangolin) art@orolo:~/work/sc2-clustering/data$ diff uk100.out uk100-2.out
Filing pull request
These lines:
are unnecessarily iterating over every position of each genome - it should be faster to iterate over indiciesToKeep only:
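The snippets this issue refers to are not reproduced here. As a rough illustration of the idea only, the sketch below contrasts the two loop shapes. It assumes each genome is a plain string and that indiciesToKeep is a sorted list of unique 0-based positions; the function names are hypothetical and are not taken from pangolin.

```python
def filter_scanning_every_position(seq, indiciesToKeep):
    """Visit every position of the genome and keep the wanted ones.

    Cost grows with the genome length, even when only a few
    positions are kept. (Hypothetical stand-in for the slow loop.)
    """
    keep = set(indiciesToKeep)
    return [base for i, base in enumerate(seq) if i in keep]


def filter_kept_positions_only(seq, indiciesToKeep):
    """Index straight into the genome at the wanted positions.

    Cost grows with len(indiciesToKeep) only. (Hypothetical stand-in
    for the faster loop.)
    """
    return [seq[i] for i in indiciesToKeep]


if __name__ == "__main__":
    genome = "ACGTNACGTA"          # toy sequence, not real data
    indiciesToKeep = [1, 4, 7]     # assumed sorted and unique
    slow = filter_scanning_every_position(genome, indiciesToKeep)
    fast = filter_kept_positions_only(genome, indiciesToKeep)
    print(slow == fast, fast)      # True ['C', 'N', 'G']
```

The two functions return the same result only under the stated assumption that indiciesToKeep is sorted and free of duplicates; otherwise the second version follows the order of the index list rather than genome order.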