adimitromanolakis / sim1000G

Simulation of rare and common variants based on 1000 genomes data

Error: mismatch between chromosomes in genetic map and vcf #16

Open eliz641995 opened 6 months ago

eliz641995 commented 6 months ago

I'm currently trying to simulate genotypes for 3 genes located on different chromosomes (for unrelated individuals). I downloaded the first VCF like this:

tabix -h http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 10:106400859-107024993 > SORC3.vcf

Then I run this function:

get_genotype = function(vcf_path) {
  vcf = readVCF(vcf_path, min_maf = NA, max_maf = NA, maxNumberOfVariants = 2000)
  startSimulation(vcf, totalNumberOfIndividuals = 3010)
  ids = generateUnrelatedIndividuals(3000)
  genotype = retrieveGenotypes(ids)
  return(genotype)
}

vcf_sorcs$Gene.refGene = "SORCS3"

It worked well, but when I then try to continue simulating the next gene:

tabix -h http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 1:71861623-72748417 > NEGR1.vcf

I encounter this error:

genotype_nerg=get_genotype("NEGR1.vcf")
[#.......] Reading VCF file..
Rows: 24109 Columns: 2513
── Column specification ─────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2510): ID, REF, ALT, FILTER, INFO, FORMAT, HG00096, HG00097, HG00099, HG00100, HG00101, HG00102,...
dbl (3): #CHROM, POS, QUAL

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
[##......] Chromosome: 1 Mbp: 71.8617 Region Size: 886.685 kb Num of individuals: 2504
[##......] Before filtering Num of variants: 24009 Num of individuals: 2504
[###.....] After filtering Num of variants: 2000 Num of individuals: 2504
[#####...] Creating SIM object
Warning: Some variants are not polymorphic. (n= 1047 1120 1303 )
[#####...] Haplodata object created
10 1
Error in startSimulation(vcf, totalNumberOfIndividuals = 3010) :
  Error: mismatch between chromosomes in genetic map and vcf

If I delete everything in my R environment and restart R, then it works, but simulating one gene after another in the same session fails.

I would be happy for your help in understanding what is happening and how I can fix it.
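Since the error is about which chromosome the VCF holds, one quick sanity check is to list the chromosome names in each downloaded file before loading it into R. The following is only a minimal sketch: `vcf_chromosomes` is a hypothetical helper name, and the tiny demo file stands in for a real download so that the snippet runs on its own.

```shell
# Hypothetical helper (not part of tabix or sim1000G):
# print the distinct chromosome names found in a VCF's data lines.
vcf_chromosomes() {
    # Drop header lines (they start with '#'), keep column 1, deduplicate.
    grep -v '^#' "$1" | cut -f1 | sort -u
}

# Tiny stand-in VCF with made-up records, so nothing has to be downloaded.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n1\t71861700\trs_demo1\tA\tG\n1\t71861800\trs_demo2\tC\tT\n' > "$demo_vcf"

# List the chromosomes present in the demo file.
vcf_chromosomes "$demo_vcf"
```

Run against the real files (`vcf_chromosomes SORC3.vcf`, `vcf_chromosomes NEGR1.vcf`), this would show whether the two files are on chromosomes 10 and 1. That would be consistent with the `10 1` printed just before the error, assuming those two numbers are the chromosome of the genetic map and the chromosome of the VCF.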