BGI-shenzhen / LDBlockShow

LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on VCF files
MIT License
134 stars 40 forks source link

Specific SNPS? #30

Closed andrewjordank closed 1 year ago

andrewjordank commented 1 year ago

Hello! I have been having a fun time learning how to use your program. I am trying to make a pairwise plot for specific SNP's in the ABO gene on chromosome 9. So far I've been able to make the pairwise plot for the entire gene region which includes many polymorphisms. I used the -SepSNPName command to highlight my specific SNPS. Is there any way that I can just use these specific SNPS to make the plot instead of including every polymorphism within my selected region?

andrewjordank commented 1 year ago

As an update I tried using plink to just extract the SNPS I was interested in and now I get this error: Use of uninitialized value in subtraction (-) at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 545, line 1. Use of uninitialized value in subtraction (-) at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 545, line 1. Use of uninitialized value in subtraction (-) at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 572, line 1. Use of uninitialized value in subtraction (-) at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 572, line 1.

hewm2008 commented 1 year ago

Hello, it is recommended to extract the site you want from the vcf file (Sep SNP), and then run this software with this new sub-vcf file

andrewjordank commented 1 year ago

I tried that but I get a divide by 0 error I pasted that error message above

hewm2008 commented 1 year ago

I feel that it is possible that the sites in the new sub-vcf have been filtered out (you can see the final sites involved in the calculation from the screen), you can adjust more SNPs in , or adjust -MAF 0.0001 -Miss 0.5 to rerun

andrewjordank commented 1 year ago

Thank you for your replies. As I follow up I just wanted to share what we have been trying to do to make sure we are not missing anything obvious. Our goal is to use a 1000 Genome VCF as the input to define a large locus that encompasses a single gene, and then only create a pairwise ld plot for specific SNPs of interest within that locus. We were able to first successfully run the 1000g file and produce a pairwise ld plot for all the SNP across this locus. We did this by running the following: /home/shared/tools/LDBlockShow-1.40/bin/LDBlockShow -InVCF /Data/home/1kGP_high_coverage_Illumina.chr9.filtered.SNV_INDEL_SV_phased_panel.vcf.gz -OutPut /Data/home/LDBlock -Region chr9:133247534-133278152 -OutPdf -ShowNum -SeleVar 2 -SpeSNPName /Data/home/O_Snps_Positions.txt -InGFF /Data/home/ABOGene.GFF3

LDBlock.pdf

To then try to restrict the plot to just our specific SNPs of interest, we tried to subset the VCF file using PLINK to just include our SNPs of interest by running the following: /home/shared/tools/LDBlockShow-1.40/bin/LDBlockShow -InVCF /Data/home/O_Snps_chr9_1kgp_high_coverage.vcf -Region chr9:133247534-133278152 -OutPut /Data/home/Test_3 -OutPdf -ShowNum -SeleVar 2 -SpeSNPName /Data/home/O_Snps_Positions.txt

However, this didn’t work. Also, we were worried that the r2 calculations would be misinformed without the complete dataset. I now get this error message.

InPut Para -Region chromosome [chr9] can't be found in the SNP dataset Warining: SVG module in Perl is missing, trying to loading the built-in [SVG.pm]... Loading SVG module done Start draw... SVG info: SNPNumber :0 , SVG (width,height) = (0,0) Use of uninitialized value in subtraction (-) at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 545, line 1. Use of uninitialized value in subtraction (-) at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 545, line 1. Use of uninitialized value in subtraction (-) at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 572, line 1. Use of uninitialized value in subtraction (-) at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 572, line 1. Illegal division by zero at /home/shared/tools/LDBlockShow-1.40/bin//ShowLDSVG line 572, line 1.

O_Snps_chr9_1kgp_high_coverage.vcf.gz

I have included the picture of the plot I was able to create and the zipped sub-VCF file

hewm2008 commented 1 year ago

I found the reason, because the name of the chr you extracted has changed, the original chr name is chr9, and now it has become 9.

So your command needs to be changed to this (chr9--->9): /home/shared/tools/LDBlockShow-1.40/bin/LDBlockShow -InVCF /Data/home/O_Snps_chr9_1kgp_high_coverage.vcf -Region 9:133247534-133278152 -OutPut /Data/home/Test_3 -OutPdf -ShowNum -SeleVar 2 -SpeSNPName /Data/home/O_Snps_Positions.txt

andrewjordank commented 1 year ago

Thank you so much for all your help and your replies! This worked!