adimitromanolakis / sim1000G

Simulation of rare and common variants based on 1000 genomes data
17 stars 1 forks source link

Save Genotypes of unrelated Individuals #5

Closed TeaByrd closed 3 years ago

TeaByrd commented 3 years ago

Hello guys,

I'm asking how do I actually save the Genotypes of my simulation for unrelated Individuals? I don't know what to put into the 'fam' paramter of the writePED function. My code looks like this:

library(sim1000G)

# Read the example file included in sim1000G

vcf_file = file.path("MasterThesis/Aetionomy/AETIO_CS_postimpvcf_subset.vcf")
vcf = readVCF(vcf_file, min_maf = 0.02 , max_maf = NA)

vcf = readVCF( vcf_file, maxNumberOfVariants = 600 , min_maf = 0.01, max_maf = 1)

startSimulation(vcf, totalNumberOfIndividuals = 1000)
ids = generateUnrelatedIndividuals(200)

writePED(vcf, fam,filename = "Desktop/out_test_1000G") #what is my fam parameter in this case?
adimitromanolakis commented 3 years ago

Hi,

the fam parameter is a 6 column data matrix with the following columns: FamilyID, IndividualID, FatherID, MotherID, Gender, Status, Genotype Index for sim1000G. The first 6 columns correspond to the information that will be written in the PED file.

For your case you can use:

fam = data.frame( fid=ids, id=ids, dad=0,mom=0, gender=1, status=1, gtindex=ids )

Alternatively you can retrieve the genotypes with the function: retrieveGenotypes( ids[i] )

Best, Apostolos

On Wed, 28 Oct 2020 at 14:21, Thomas Lordick notifications@github.com wrote:

Hello guys,

I'm asking how do I actually save the Genotypes of my simulation for unrelated Individuals? I don't know what to put into the 'fam' paramter of the writePED function. My code looks like this:

library(sim1000G)

Read the example file included in sim1000G

vcf_file = file.path("MasterThesis/Aetionomy/AETIO_CS_postimpvcf_subset.vcf") vcf = readVCF(vcf_file, min_maf = 0.02 , max_maf = NA)

vcf = readVCF( vcf_file, maxNumberOfVariants = 600 , min_maf = 0.01, max_maf = 1)

startSimulation(vcf, totalNumberOfIndividuals = 1000) ids = generateUnrelatedIndividuals(200)

writePED(vcf, fam,filename = "Desktop/out_test_1000G") #what is my fam parameter in this case?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/adimitromanolakis/sim1000G/issues/5, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEGJVYYGSYKAFJMACTG34QTSNAEFBANCNFSM4TCH4K4A .

TeaByrd commented 3 years ago

Thanks, Apolostos. The problem, I'm facing now is the following error when simulating:

[#.......] Reading VCF file..
Parsed with column specification:
cols(
  .default = col_character(),
  `#CHROM` = col_double(),
  POS = col_double()
)
See spec(...) for full column specifications.
[##......] Chromosome:   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22  Mbp:  0.041054  Region Size:  247529.8 kb  Num of individuals: 106 
[##......] Before filtering  Num of variants: 6815 Num of individuals: 106 
[###.....] After filtering  Num of variants: 6779 Num of individuals: 106 
[#####...] Creating SIM object
Warning: Some variants are not polymorphic. (n= 3179 )
[#####...] Haplodata object created
Error in startSimulation(vcf, totalNumberOfIndividuals = 1000) : Error: Genetic map does not match the region being simulated

Do I have to simulate chromosome-wise? Splitting the vcf file per chromosome and then simulate or what's the cause of the error in particular?

adimitromanolakis commented 3 years ago

Hi Thomas,

yes, it is only possible to simulate single chromosomes at the moment. We have not incorporated methods to correctly simulate whole genome data.

Best, Apostolos

On Fri, 30 Oct 2020 at 13:31, Thomas Lordick notifications@github.com wrote:

Thanks, Apolostos. The problem, I'm facing now is the following error when simulating:

[#.......] Reading VCF file.. Parsed with column specification: cols( .default = col_character(), #CHROM = col_double(), POS = col_double() ) See spec(...) for full column specifications. [##......] Chromosome: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Mbp: 0.041054 Region Size: 247529.8 kb Num of individuals: 106 [##......] Before filtering Num of variants: 6815 Num of individuals: 106 [###.....] After filtering Num of variants: 6779 Num of individuals: 106 [#####...] Creating SIM object Warning: Some variants are not polymorphic. (n= 3179 ) [#####...] Haplodata object created Error in startSimulation(vcf, totalNumberOfIndividuals = 1000) : Error: Genetic map does not match the region being simulated

Do I have to simulate chromosome-wise? Splitting the vcf file per chromosome and then simulate or what's the cause of the error in particular?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/adimitromanolakis/sim1000G/issues/5#issuecomment-719500152, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEGJVY62T35O3PC2RGYB5Y3SNKPX7ANCNFSM4TCH4K4A .

raqueldias commented 3 years ago

Hi! On the same topic of saving the simulated genotypes of unrelated individuals: is there a way to save the genotypes in VCF format instead of PED?

adimitromanolakis commented 3 years ago

Hi Raquel,

no we haven't implemented such functionality yet in sim1000G but it would be a useful addition. You can try using plink to convert the ped file into a vcf.

Best, Apostolos

On Fri, 26 Feb 2021 at 02:38, Raquel Dias @.***> wrote:

Hi! On the same topic of saving the simulated genotypes of unrelated individuals: is there a way to save the genotypes in VCF format instead of PED?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/adimitromanolakis/sim1000G/issues/5#issuecomment-786327024, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEGJVY47EGUYTPNNWD6I7NTTA3UORANCNFSM4TCH4K4A .