BGI-shenzhen / LDBlockShow

LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on VCF files
MIT License
134 stars 40 forks source link

SNP number is not correct #7

Closed huw143 closed 3 years ago

huw143 commented 3 years ago

Hi, I have located a range in VCF file with eight snp number, but the LDBlockShow did not detect these SNP numbers and said there is no snp markers among the range I defined. do you know the reason why?

Thanks a lot for your help.

Hu

hewm2008 commented 3 years ago

Dear @huw143 you can see more help (SNP filter parameters) by follow command: ./bin/LDBlockShow -h The 8 SNPs after filtering as follows "-MAF 0.05 -Het 0.90 -Miss 0.25" remain 0 SNPs . you can modify these three parameters according to your own situation.

Other info : A :The soft will skip non bi-allelic(Singleton/ThreeMulti allelic) site and indel site (For bi-allelic indel, you can modify the "REF ALT" to only one base. see the Perl script at the end of this conversation B: I guess it is caused by the wrong region of ​​your input. Someone often make mistakes in the upper or lower characters of chr name [chr3/ Chr3 / 3/], please make sure that the chr name you input is consistent with vcf input file ,If the chr name is the same, you can enlarge the region ,so that there is a certain amount of snp in this region.

huw143 commented 3 years ago

Thanks a lot for your quick reply, appreciate your help. Now I have another question, please see the attachment for details. I am not sure why it appeared like this, since there are actually 8 SNP in this defined region. Capture

hewm2008 commented 3 years ago

A : at 914 lines is open the gwas file : open (IE,"$InGWAS") || die "input file can't open $!"; Do you have really have the file named gwas.pvalue ?

B :Genotype miss too much , it can not run the "LD blocks" funtion ; but the LDheatmap/gwas is ok

huw143 commented 3 years ago

I see that, thanks a lot

hewm2008 commented 3 years ago

I provide a script here that can change bi-allelic indel to SNP, which can be reserved for calculation

#!/usr/bin/perl -w
use strict;
#Version 1.0    hewm2008@qq.com
die  "Version 1.0\t2021-04-23;\nUsage: $0  InPut.vcf   OutPut.vcf   \n" unless (@ARGV ==2);

#############Befor  Start  , open the files ####################

my $InPutFile=$ARGV[0];
if ($InPutFile=~s/.gz$/.gz/g)
{
    open (IA,"gzip -cd $InPutFile | ") || die "input file can't open $!";
}
else
{
    open (IA,"$InPutFile") || die "input file can't open $!";
}

open (OA,">$ARGV[1]") || die "output file can't open $!" ;

################ Do what you want to do #######################

while(<   IA   >)        ##  IA 前后空格删除掉 
{ 
    chomp ;
    if ($_=~s/#/#/)
    {
        print OA  $_,"\n";
        next;
    }
    else
    {
        my @inf=split ;
        if ($inf[3]=~/,/  ||  $inf[4]=~/,/ )
        {
        }
        else
        {
            if (length($inf[3])>1)  {$inf[3]=substr($inf[3],0,1);   }
            if (length($inf[4])>1)  {$inf[4]=substr($inf[4],0,1);   }
            $_=join("\t",@inf);         print OA "$_\n";
        }
    }
}
close IA;
close OA ;
######################swimming in the sky and flying in the sea ###########################