Open rajwanir opened 1 month ago
Marker rs112096861 does indeed belong to `XY`, as it is part of the XTR region, which is shared between chromosome X and chromosome Y. However, the other 4 markers do seem to be mislocalized in the Illumina manifest files. The array intensities indicate that they are not tagging a polymorphic variant anyway, but either the localizations in the manifest file or the source sequences are completely wrong.
The pseudo-autosomal regions (PARs) are often annotated in Illumina's CSV manifest as `XY` chrom. gtc2vcf probably recodes them as chrom `X` in the output VCF: https://github.com/freeseek/gtc2vcf/blob/224e7c60b81188342a029ec89f3777537fa7b4f6/gtc2vcf.h#L138-L143
However, realignment to the reference genome is strongly encouraged, as emphasized in the documentation. If Illumina's CSV manifest is used directly, the accuracy of the output depends on the accuracy of that manifest. Sometimes the PAR is not correctly annotated in the CSV manifest, and the SNPs may actually lie in unique regions of the Y chrom.
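To see which markers a manifest labels as `XY`, something like the following could be used. This is a minimal sketch, not part of gtc2vcf: it assumes the CSV manifest has an `[Assay]` section whose header row includes `Name`, `Chr` and `MapInfo` columns, and the function name `xy_markers` and the toy manifest rows are made up for illustration.

```python
import csv
import io

def xy_markers(manifest_text):
    """Return (Name, Chr, MapInfo) tuples for rows whose Chr column is 'XY'."""
    lines = manifest_text.splitlines()
    # Assumption: marker rows follow a line starting with "[Assay]"
    # and stop at the next "[...]" section header (e.g. "[Controls]").
    start = next(i for i, line in enumerate(lines) if line.startswith("[Assay]")) + 1
    body = []
    for line in lines[start:]:
        if line.startswith("["):
            break
        body.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(body)))
    return [(row["Name"], row["Chr"], int(row["MapInfo"]))
            for row in reader if row["Chr"] == "XY"]

# Toy manifest with hypothetical marker names and positions (not real GSA content).
TOY_MANIFEST = """[Heading]
Descriptor File Name,toy.bpm
[Assay]
IlmnID,Name,Chr,MapInfo
toy1-0_T_F_1,toy1,XY,1000000
toy2-0_T_F_2,toy2,1,12345
toy3-0_T_F_3,toy3,XY,5000000
[Controls]
0:0:0,Staining,Red,DNP (High)
"""

for name, chrom, pos in xy_markers(TOY_MANIFEST):
    print(name, chrom, pos)
```

For a real manifest, the file contents would be read into a string first; column names may differ between manifest versions, so the header row is worth checking.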
For example, in the GSA chip, 80+ SNPs are annotated as `XY` but are actually located in unique regions of the `Y` chrom. A few SNPs from the input CSV manifest:
In the output VCF records:
However, all these SNPs appear outside the PAR regions (https://useast.ensembl.org/info/genome/genebuild/human_PARS.html) and in unique regions of the Y chrom (e.g. https://ncbi.nlm.nih.gov/snp/rs10465468). If the realignment workflow is chosen, the SourceSeq maps uniquely to the Y chrom, which corrects the annotation. An additional note: if a SNP does lie within a PAR, under the realignment workflow it will still be annotated as `X` chrom, since the PARs are hard-masked on the `Y` chrom.

Thought to write this here for the benefit of any other user who runs into this observation.
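
For anyone wanting to check positions against the PAR boundaries, here is a minimal sketch. The coordinates are the GRCh38 values as I read them from the Ensembl page linked above (1-based, inclusive) and should be verified against the build you actually use. The helper name `in_par` and the test positions are made up for illustration.

```python
# GRCh38 PAR boundaries (1-based, inclusive), taken from the Ensembl human PARs page.
# Assumption: your VCF/manifest uses GRCh38; GRCh37 boundaries are different.
PAR_GRCH38 = {
    "X": [(10001, 2781479), (155701383, 156030895)],
    "Y": [(10001, 2781479), (56887903, 57217415)],
}

def in_par(chrom, pos, pars=PAR_GRCH38):
    """Return True if (chrom, pos) falls inside PAR1 or PAR2."""
    # Accept both "Y" and "chrY" style names.
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return any(start <= pos <= end for start, end in pars.get(chrom, []))

# Arbitrary illustrative positions, not real marker coordinates.
print(in_par("Y", 1000000))       # inside PAR1
print(in_par("Y", 20000000))      # unique region of Y
print(in_par("chrX", 155800000))  # inside PAR2 on X
```

A marker labelled `XY` in the manifest whose Y coordinate fails this check is a candidate for the mislabeling described above.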