WGLab / GenGen

A set of software tools to facilitate GWAS analysis
http://gengen.openbioinformatics.org

replacing missing data convert_bim_allele.pl #11

Open cristian2420 opened 2 years ago

cristian2420 commented 2 years ago

Hi WGlab,

I'm using convert_bim_allele.pl to fill in the missing alleles in my data with the following command:

convert_bim_allele.pl --replacezero --intype top genotypes.clean.bim OMNI_chip.snptable --outfile output_fill.bim

But I'm getting this error:

NOTICE: The default --outtype of 'dbsnp' is assumed as output format
NOTICE: Reading SNP Table file OMNI_chip.snptable ... Done with 1705969 SNPs
NOITCE: 23612 insertion/deletion polymorphism are annotated in OMNI_chip (examples:1:100316615-CAG-C,1:100336041-TAGAC-T,1:100379098-GT-G)
NOTICE: The new bim file will be written to output_fill.bim ...
FATAL ERROR: the minor allele for SNP JHU_1.17537 is C but major allele is a zero allele in BIM file genotypes.clean.bim

It stops at the first allele. I don't know whether changing the order of my alleles would help or if I'm missing something. My inputs look like this:

BIM file:

1 JHU_1.17537 0 17538 C 0
1 JHU_1.54675 0 54676 T 0
1 JHU_1.56018 0 56019 T 0
1 JHU_1.61461 0 61462 A T
1 JHU_1.66161 0 66162 A 0
1 JHU_1.84138 0 84139 A 0
1 JHU_1.88337 0 88338 A G
1 JHU_1.91535 0 91536 G T
1 JHU_1.91580 0 91581 G A

SNPTable:

Name            SNP    ILMN Strand  Customer Strand
1:10002775-GA   [A/G]  TOP          TOP
1:100152282-CT  [A/G]  TOP          BOT
1:100154376-GA  [T/C]  BOT          TOP
1:100154844-CA  [T/G]  BOT          TOP
1:100155035-AC  [A/C]  TOP          TOP
1:100155084-CT  [T/C]  BOT          BOT
1:100182985-CA  [A/C]  TOP          TOP
1:100183042-AG  [T/C]  BOT          TOP
1:100185177-GT  [A/C]  TOP          BOT

I'd appreciate your help.

Thanks, Cristian
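One quick sanity check on inputs like the ones above is whether the variant IDs in the BIM file appear in the Name column of the SNP table at all. The following is a minimal Python sketch of that check, not part of GenGen; the function names are hypothetical, and it assumes whitespace-separated files with the ID in the second BIM column and the name in the first SNP table column.

```python
def bim_ids(bim_lines):
    """Return the variant IDs (second column) of PLINK BIM lines."""
    return [line.split()[1] for line in bim_lines if line.strip()]


def snptable_names(table_lines):
    """Return the set of names (first column), skipping the header row."""
    return {line.split()[0] for line in table_lines[1:] if line.strip()}


def missing_from_table(bim_lines, table_lines):
    """List BIM variant IDs that have no entry in the SNP table."""
    names = snptable_names(table_lines)
    return [vid for vid in bim_ids(bim_lines) if vid not in names]


# Two rows from the BIM excerpt and two rows from the SNP table excerpt.
bim = [
    "1 JHU_1.17537 0 17538 C 0",
    "1 JHU_1.61461 0 61462 A T",
]
table = [
    "Name SNP ILMN Strand Customer Strand",
    "1:10002775-GA [A/G] TOP TOP",
    "1:100152282-CT [A/G] TOP BOT",
]
print(missing_from_table(bim, table))
```

The excerpts shown here are only the first few lines of each file, so an ID that is absent from the excerpt may still be present in the full 1,705,969-row table; the check is only meaningful when run on the complete files.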

kaichop commented 2 years ago

The software cannot fill a missing allele. I suggest that you replace the zero allele manually (if you know what the correct allele is), or simply delete these alleles from your file before analysis.
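For the second option, a first step is to list which SNPs carry a zero allele. The following is a minimal Python sketch, not part of GenGen; the function name is hypothetical, and it assumes the standard six-column PLINK BIM layout with the two alleles in the fifth and sixth columns.

```python
def zero_allele_ids(bim_lines):
    """Return IDs of BIM rows where either allele column is '0'."""
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        if fields[4] == "0" or fields[5] == "0":
            ids.append(fields[1])
    return ids


# Three rows taken from the BIM excerpt in the original post.
example = [
    "1 JHU_1.17537 0 17538 C 0",
    "1 JHU_1.54675 0 54676 T 0",
    "1 JHU_1.61461 0 61462 A T",
]
print("\n".join(zero_allele_ids(example)))
```

If the data set is a PLINK binary fileset, the resulting ID list can be saved to a text file and passed to PLINK's --exclude option, which keeps the .bed, .bim and .fam files consistent. Deleting lines from the .bim file by hand would leave it out of step with the .bed file.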


cristian2420 commented 2 years ago

Hi Kai,

Thanks for your quick response.

I was confused because the documentation said:

Sometimes the BIM file contains only one allele for a SNP, since the other allele is never observed in genotype data. The missing allele is shown as "0" in the BIM file (fourth column, since it denotes minor allele). For example, the corresponding line in the BIM file might be "2 rs231804 0 204416891 0 A", indicating that only allele T is observed in the genotype data. If --fillzero argument is set, the missing allele will be filled. For example, the aforementioned example will become "2 rs231804 0 204416891 C T"

But I didn't find that option in the tool. Does that mean the --fillzero argument was deprecated?
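The behavior the documentation describes can be illustrated with a short Python sketch. This is not GenGen's implementation; the function name is hypothetical, and it assumes the two known alleles of the SNP are already on the same strand as the alleles in the BIM file (strand conversion between TOP/BOT and other codings is not handled here).

```python
def fill_zero_allele(a1, a2, pair):
    """Fill a single '0' allele using a known two-allele pair.

    Returns (a1, a2) unchanged when neither or both alleles are '0'.
    Raises ValueError when the observed allele is not one of `pair`,
    which may indicate a strand mismatch or a wrong annotation.
    """
    if (a1 == "0") == (a2 == "0"):
        return a1, a2  # nothing missing, or nothing observed to work from
    observed = a2 if a1 == "0" else a1
    if observed not in pair:
        raise ValueError(f"observed allele {observed} not in {pair}")
    other = pair[0] if observed == pair[1] else pair[1]
    return (other, a2) if a1 == "0" else (a1, other)


# The pair (C, T) for rs231804 is read off the filled line quoted above.
print(fill_zero_allele("0", "T", ("C", "T")))
```

Raising an error on a mismatch, rather than guessing, is a deliberate choice in this sketch: for SNPs whose two alleles are complementary (A/T or C/G), the strand cannot be inferred from the observed allele alone.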

kaichop commented 2 years ago

Perhaps I implemented this functionality many years ago but I forgot about it. Please just try --fillzero and see what happens. Your previous command did not include this argument.


cristian2420 commented 2 years ago

Yes, I tried it, but this option does not exist. That's why I tried the --replacezero argument instead; I thought it would have the same behavior.