Michigan Imputation Server: A new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity
I'm attempting to replicate the Server's imputation with the 1000G Phase 3 GRCh38 Reference Panel. I started by confirming my local execution of the Eagle pre-phasing matches that of the Server by running 2 jobs on the Server:
Full pipeline (including pre-phasing) on unphased input genotypes
Imputation-only on genotypes pre-phased locally using Eagle following the example command in the documentation
These 2 jobs produced identical results indicating my local execution of Eagle is the same as on the Server. I then used the same pre-phased genotypes from Eagle to run Minimac4 (v1.0.2) locally following the example command in the documentation but cannot get these results to match those from the Server.
I sourced my reference haplotypes from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ and, after removing the singletons, confirmed the variant-level contents match the results from the Server (no additional or missing variants) indicating they are almost certainly from the same source (although, possibly not the same samples? I'm using the full set of 2548 samples).
I generated the M3VCF files with parameter estimates using Minimac3 v2.0.1 with the default options:
I thought perhaps the Server may be using a genetic recombination map file but could not locate geneticMapFile.b38.map.txt.gz and tried the map file from Eagle instead:
These results matched those from the Server a bit better than using the M3VCF parameter estimates but still some discordance.
I think it would help me try to troubleshoot if you could please:
Confirm if the Server is using the full 2548-sample haplotype set from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL)/ ?
Clarify how your M3VCF files were generated (e.g. did you use non-default values for --rounds or --states or a different version of Minimac3)?
Confirm the precise minimac command (i.e. is it using the --mapFile and if so, which one)?
I'm attempting to replicate the Server's imputation with the 1000G Phase 3 GRCh38 Reference Panel. I started by confirming my local execution of the Eagle pre-phasing matches that of the Server by running 2 jobs on the Server:
These 2 jobs produced identical results indicating my local execution of Eagle is the same as on the Server. I then used the same pre-phased genotypes from Eagle to run Minimac4 (v1.0.2) locally following the example command in the documentation but cannot get these results to match those from the Server.
I sourced my reference haplotypes from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ and, after removing the singletons, confirmed the variant-level contents match the results from the Server (no additional or missing variants) indicating they are almost certainly from the same source (although, possibly not the same samples? I'm using the full set of 2548 samples).
I generated the M3VCF files with parameter estimates using Minimac3 v2.0.1 with the default options:
I thought perhaps the Server may be using a genetic recombination map file but could not locate
geneticMapFile.b38.map.txt.gz
and tried the map file from Eagle instead:These results matched those from the Server a bit better than using the M3VCF parameter estimates but still some discordance.
I think it would help me try to troubleshoot if you could please:
--rounds
or--states
or a different version of Minimac3)?--mapFile
and if so, which one)?