genepi / imputationserver

Michigan Imputation Server: A new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity
https://imputationserver.sph.umich.edu/
GNU Affero General Public License v3.0

ChrX nonPAR region includes ambiguous samples #136

Open swvanderlaan opened 6 months ago


I am trying to impute chromosome X, but I get the message below:

Input Validation
23 valid VCF file(s) found.

Samples: 780
Chromosomes: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 X 3 4 5 6 7 8 9
SNPs: 408889
Chunks: 307
Datatype: unphased
Build: hg19
Reference Panel: apps@topmed-r3@1.0.0 (hg38)
Population: mixed
Phasing: eagle
Mode: imputation
Rsq filter: 0.001

Quality Control
Uploaded data is hg19 and reference is hg38.
Lift Over
Skip allele frequency check.
Calculating QC Statistics
Statistics:
Alternative allele frequency > 0.5 sites: 27
Reference Overlap: 98.21 %
Match: 272,248
Allele switch: 100,573
Strand flip: 125
Strand flip and allele switch: 99
A/T, C/G genotypes: 20,038
Filtered sites:
Filter flag set: 0
Invalid alleles: 0
Multiallelic sites: 0
Duplicated sites: 0
NonSNP sites: 0
Monomorphic sites: 8,327
Allele mismatch: 117
SNPs call rate < 90%: 0
Excluded sites in total: 8,668
Remaining sites in total: 392,859
See [snps-excluded.txt](https://imputation.biodatacatalyst.nhlbi.nih.gov/results/job-20240315-183502-310/statisticDir/snps-excluded.txt) for details
Typed only sites: 7,155
See [typed-only.txt](https://imputation.biodatacatalyst.nhlbi.nih.gov/results/job-20240315-183502-310/statisticDir/typed-only.txt) for details

Warning: 4 Chunk(s) excluded: < 20 SNPs (see [chunks-excluded.txt](https://imputation.biodatacatalyst.nhlbi.nih.gov/results/job-20240315-183502-310/statisticDir/chunks-excluded.txt) for details).
Warning: 1 Chunk(s) excluded: reference overlap < 50.0% (see [chunks-excluded.txt](https://imputation.biodatacatalyst.nhlbi.nih.gov/results/job-20240315-183502-310/statisticDir/chunks-excluded.txt) for details).
Remaining chunk(s): 303
Error: ChrX nonPAR region includes ambiguous samples (haploid and diploid positions). Imputation cannot be started! See [chrX-info.txt](https://imputation.biodatacatalyst.nhlbi.nih.gov/results/job-20240315-183502-310/statisticDir/chrX-info.txt)
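The error says that, within the chrX non-pseudoautosomal (nonPAR) region, some samples carry both haploid genotypes (e.g. `0`) and diploid genotypes (e.g. `0/1`). A minimal sketch for finding such samples in the uploaded VCF before resubmitting, assuming the file uses hg19 coordinates (as the log reports), names the chromosome `X` or `chrX`, and has `GT` as the first FORMAT field. The PAR boundaries are the GRCh37 ones; the function names and the file name in the usage note are hypothetical.

```python
import gzip
import re
from collections import defaultdict

# GRCh37/hg19 pseudoautosomal regions on chrX (1-based, inclusive).
# Assumption: the VCF is in hg19 coordinates, as stated in the job log.
PAR_HG19 = [(60001, 2699520), (154931044, 155260560)]


def in_par(pos, pars=PAR_HG19):
    """True if pos falls inside one of the given PAR intervals."""
    return any(start <= pos <= end for start, end in pars)


def ploidy(gt):
    """Count alleles in a GT string: '0' -> 1, '0/1' -> 2, '1|1' -> 2."""
    return len(re.split(r"[/|]", gt))


def ambiguous_samples(lines, pars=PAR_HG19):
    """Return {sample: ploidies} for samples with mixed ploidy in chrX nonPAR."""
    samples = []
    seen = defaultdict(set)
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos = fields[0], int(fields[1])
        if chrom not in ("X", "chrX") or in_par(pos, pars):
            continue  # only nonPAR chrX sites matter here
        if fields[8].split(":")[0] != "GT":
            continue  # hypothetical simplification: require GT first
        for name, value in zip(samples, fields[9:]):
            seen[name].add(ploidy(value.split(":")[0]))
    return {s: p for s, p in seen.items() if len(p) > 1}


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Usage, with a hypothetical file name: `with open_vcf("chrX.vcf.gz") as fh: print(ambiguous_samples(fh))`. Samples listed in the result have at least one haploid and one diploid nonPAR call; missing calls (`.` versus `./.`) are counted by their ploidy as well.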

How do I solve this issue? Should I simply remove the SNPs listed in chrX-info.txt? I checked whether these variants are present in my data, but they are not. Or are they reported in build 38 (hg38) chr:bp coordinates?
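On the build question: the log shows a Lift Over step from hg19 to hg38 before the QC statistics, so it is plausible (though not confirmed by the log) that positions in chrX-info.txt are hg38 coordinates, which would explain why they cannot be found in an hg19 VCF. The PAR boundaries also differ between builds, so a site near a boundary can be nonPAR in one build and PAR in the other. A small sketch that classifies a chrX position under either build, using the published GRCh37 and GRCh38 PAR coordinates; the function name is hypothetical.

```python
# Pseudoautosomal regions on chrX (1-based, inclusive) per build.
PARS = {
    "hg19": {"PAR1": (60001, 2699520), "PAR2": (154931044, 155260560)},
    "hg38": {"PAR1": (10001, 2781479), "PAR2": (155701383, 156030895)},
}


def chrx_region(pos, build):
    """Return 'PAR1', 'PAR2' or 'nonPAR' for a chrX position in the given build."""
    for name, (start, end) in PARS[build].items():
        if start <= pos <= end:
            return name
    return "nonPAR"


# The same coordinate can fall in different regions depending on the build.
for pos in (2_700_000, 155_000_000):
    print(pos, chrx_region(pos, "hg19"), chrx_region(pos, "hg38"))
```

This does not convert coordinates; to look up hg38 positions in an hg19 file they would first need to be lifted back, for example with a UCSC chain file.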