BGI-shenzhen / LDBlockShow

LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on VCF files
MIT License

SVG graph too large to convert to another format #4

Closed: Yung-Chien closed this issue 3 years ago

Yung-Chien commented 3 years ago

Hi @hewm2008, I drew an SVG graph of a 1.5 Mb range of a contig containing more than 10k bi-allelic SNPs, and the program reported an error like this: [screenshot]

How should I choose the height value? Also, converting such a large chart may need a lot of memory, so how much memory should I allocate for the Perl script svg2xxx.pl? Thanks in advance.

Best Regards, Yung-Chien

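Hello @hewm2008, I am running an LD block analysis of a region about 1.5 Mb long that contains more than 10k bi-allelic SNPs. The analysis first produced this error: [screenshot: 2021-02-22_200047]

How should I choose the Height parameter? Also, since the SVG-to-PNG step consumes a lot of memory, would setting a low ppi shorten the running time and save memory? Thank you. Best regards, Yung-Chien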

Yung-Chien commented 3 years ago

Hi @hewm2008, I found another issue just now. When the range is very large, even though I used the "-InGFF" and "-NoGeneName" parameters, the output graph did not display the annotation information, like this: [screenshot] What I expected is this: [screenshot] How can I fix this problem? And what do the colors (pink, blue and grey) in the SNP color bar mean? I guess pink means an SNP located in a coding region, but I am not sure what the other colors refer to. Thanks in advance. Best Regards, Yung-Chien

hewm2008 commented 3 years ago

For -InGFF: when there are too many genes in the input region (more than the -GeneLimtNum value, 30 by default), the gene structures are not displayed. You can set -GeneLimtNum larger, such as -GeneLimtNum 2000. However, because your input region is so large, the gene structures may be drawn as what looks like a solid line.
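To pick a -GeneLimtNum value, it may help to count how many genes fall in the plotted region first. A minimal Perl sketch, assuming a tab-delimited GFF3 file whose genes are features of type `gene`; the script and its `count_genes` sub are hypothetical helpers, and LDBlockShow's own counting rule may differ:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of LDBlockShow): count 'gene' features in
# GFF3 lines that overlap chrom:start-end.
# Assumption: LDBlockShow counts genes roughly this way; its exact rule may differ.
sub count_genes {
    my ($lines, $chrom, $start, $end) = @_;
    my $n = 0;
    for my $line (@$lines) {
        next if $line =~ /^#/;                     # skip GFF comment/pragma lines
        my @f = split /\t/, $line;                 # GFF3 columns are tab-separated
        next unless @f >= 5 && $f[0] eq $chrom && $f[2] eq 'gene';
        $n++ if $f[3] <= $end && $f[4] >= $start;  # gene interval overlaps region
    }
    return $n;
}

# Usage: perl count_genes.pl in.gff chr1 1000000 2500000
if (@ARGV == 4) {
    my ($gff, $chrom, $start, $end) = @ARGV;
    open(my $fh, '<', $gff) or die "can't open $gff: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    print count_genes(\@lines, $chrom, $start, $end), "\n";
}
```

If it prints, for example, 400, then a -GeneLimtNum of 400 or more would presumably let the gene structures be drawn, subject to the solid-line caveat for very large regions.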

hewm2008 commented 3 years ago

For converting a big SVG file to PNG:

1) The `convert` command (ImageMagick) is recommended to be pre-installed, although it is not required. If your system does not have `convert`, svg2xxx.pl will be called instead. You can pre-install `convert` with `sudo apt-get install imagemagick` or `sudo yum install ImageMagick`.

2) You can manually shrink the PNG canvas (lower ppi) with the following command; changing the canvas size changes the ppi:

`perl bin/svg_kit/svg2xxx.pl xxxx.svg -t png -height 50`

Run `perl bin/svg_kit/svg2xxx.pl -h` to see more parameters, including modifying the ppi (**-height 50**). You can also use **-resize 4096**:

`convert -resize 4096 xxxx.svg xxxx.png`

However, this does not seem to reduce memory significantly. When I have time, I will make the program initialize the drawing canvas at a fixed size from the start, instead of letting the number of SNPs determine the canvas size; only that will really reduce memory.

3) Another way to reduce memory: since a large number of sites requires a lot of memory, you can thin the sites by selecting one SNP every xx bp.

hewm2008 commented 3 years ago

@Yung-Chien From your screenshot, it seems that -h cannot be a floating-point number. I took another look at the latest code (ShowLDSVG, line 1666), and the value is already rounded to an integer there. Are you perhaps not using the latest version?

I recommend using the latest version, or running the conversion manually with -height 50:

`perl bin/svg_kit/svg2xxx.pl xxxx.svg -t png -height 50`

Yung-Chien commented 3 years ago

Hi @hewm2008, thanks for your reply; I solved the -GeneLimtNum problem. The height issue does not seem to affect the plotting, so I ignored it. But I still wonder what the colors in the SNP color bar (navy blue, grey and pink) mean, as shown below: [screenshot] Best Regards, Yung-Chien

hewm2008 commented 3 years ago

@Yung-Chien

1. Using the latest version (>= 1.37) will solve your problem. You can also convert manually with `perl bin/svg_kit/svg2xxx.pl xxxx.svg -t png -height 50`; run `perl bin/svg_kit/svg2xxx.pl -h` to see more parameters, including modifying the ppi (-height 50), or use -resize 4096: `convert -resize 4096 xxxx.svg xxxx.png`

2. The meaning of the colors is given in the help output of `LDBlockShow-*/bin/ShowLDSVG -h`:

-crGene       <s>  : InColor for Gene Stuct [CDS:Intron:UTR:Intergenic] 
                                default ['#e7298a:lightblue:#7570b3:#a6cee3']

By the way, the color of the special sites (SpeSite) can be changed with: -crTagSNP <s> : Color for TagSNP [31,120,180]
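Reading the -crGene value comes down to pairing its colon-separated colors, in order, with [CDS:Intron:UTR:Intergenic] from the help text. A small Perl sketch of that pairing; the `parse_cr_gene` sub is hypothetical, not part of LDBlockShow:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of LDBlockShow): pair each color in a
# -crGene value with the category at the same position in
# [CDS:Intron:UTR:Intergenic], as listed in the ShowLDSVG help.
sub parse_cr_gene {
    my ($value) = @_;
    my @categories = qw(CDS Intron UTR Intergenic);
    my @colors     = split /:/, $value;
    die "expected 4 colors, got " . scalar(@colors) . "\n"
        unless @colors == @categories;
    my %color_of;
    @color_of{@categories} = @colors;   # hash slice: category => color
    return \%color_of;
}

my $default = '#e7298a:lightblue:#7570b3:#a6cee3';
my $map     = parse_cr_gene($default);
printf "%-10s %s\n", $_, $map->{$_} for qw(CDS Intron UTR Intergenic);
```

Run as-is, it lists the default colors per category: CDS is #e7298a (pink), Intron is lightblue, UTR is #7570b3 and Intergenic is #a6cee3. Passing your own -crGene string shows which color would go where.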

Yung-Chien commented 3 years ago

@hewm2008,

Oh yes, I found the parameter in the tutorial PDF; that was my own oversight. Here is a final question: after finishing the analysis, I got a block like this: [screenshot] Can I quickly get its start and end positions from the output files? Oddly, I cannot find the block length, although the documentation says you can get block_length from the "out.blocks.gz" file, as below: [screenshot] And here is my output block file: [screenshot] Thanks for your patience, Best Regards, Yung-Chien

Yung-Chien commented 3 years ago

@hewm2008, oh, I think I know why the block length information disappeared. It may be caused by the -BlockType parameter: I selected type 2, and this column was not shown.

hewm2008 commented 3 years ago

The file contains the start and end positions; subtract them to get the block length. For example, to view only blocks longer than 1000 bp: `zcat chrXXX.blocks.gz | awk '$3-$2>1000' | less -S`
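The awk filter above can also be written as a small Perl script that reports each block's length. This is a sketch under an assumption taken from the awk command: whitespace-separated columns with the start in column 2 and the end in column 3; lines whose start/end are not integers (such as a header) are skipped. The script and its `long_blocks` sub are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring: zcat chrXXX.blocks.gz | awk '$3-$2>1000'
# Assumption: column 2 = block start, column 3 = block end (as in the awk command).
sub long_blocks {
    my ($lines, $min_len) = @_;
    my @kept;
    for my $line (@$lines) {
        my @f = split ' ', $line;                  # awk-style whitespace split
        next unless @f >= 3 && $f[1] =~ /^\d+$/ && $f[2] =~ /^\d+$/;
        my $len = $f[2] - $f[1];                   # same length as awk's $3-$2
        push @kept, [$line, $len] if $len > $min_len;
    }
    return @kept;
}

# Usage: perl long_blocks.pl chrXXX.blocks.gz [min_len, default 1000]
if (@ARGV) {
    my $file    = $ARGV[0];
    my $min_len = defined $ARGV[1] ? $ARGV[1] : 1000;
    my $fh;
    if ($file =~ /\.gz$/) {
        open($fh, '-|', 'gzip', '-cd', $file) or die "can't run gzip on $file: $!";
    } else {
        open($fh, '<', $file) or die "can't open $file: $!";
    }
    chomp(my @lines = <$fh>);
    close $fh;
    printf "%s\tlength=%d\n", $_->[0], $_->[1] for long_blocks(\@lines, $min_len);
}
```

The length here is end minus start, exactly as in the awk command; add 1 if you prefer to count both end positions.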

hewm2008 commented 3 years ago

@Yung-Chien About version 1.38:

A. Version 1.38, released today, already limits the largest canvas to around the -ResizeH value.

B. Run `perl ShowLDSVG -h` and `perl ShowLDSVG -MoreHelp` to see more help and parameters.

Yung-Chien commented 3 years ago

@hewm2008, oh, thank you very much for releasing a new version.

hewm2008 commented 3 years ago

It is recommended that the number of SNPs not exceed 10k, and ideally be kept at around 5k sites. Here is a script that thins the SNPs by keeping one SNP per XXX-bp window:


#!/usr/bin/perl -w
use strict;

die "Version 1.0\t2020-11-05;\nUsage: $0 <in.vcf[.gz]> <out.vcf>\n" unless (@ARGV == 2);

my $InFile = $ARGV[0];
# Read gzipped input through gzip, plain text otherwise.
# "or die" (not "|| die") so that a failed open is actually caught.
if ($InFile =~ /\.gz$/)
{
    open(IA, "gzip -cd $InFile |") or die "input file can't open $!";
}
else
{
    open(IA, "<", $InFile) or die "input file can't open $!";
}
open(OA, ">", $ARGV[1]) or die "output file can't open $!";

my %hash   = ();
my $bin    = 5000;   # window size in bp; adjust so that about 10k (or fewer) SNPs remain
my $column = 1;      # 0-based column holding the position (POS in a VCF)
my $Num    = 1;      # number of SNPs to keep per window

while (<IA>)
{
    chomp;
    if (/^#/)        # keep all VCF header lines
    {
        print OA $_, "\n";
        next;
    }
    my @inf  = split;
    my $site = int($inf[$column] / $bin);   # window index of this SNP
    my $key  = $inf[0] . "_" . $site;       # chromosome + window
    $hash{$key}++;
    # keep only the first $Num SNPs seen in each window
    print OA $_, "\n" if ($hash{$key} <= $Num);
}
close IA;
close OA;

######################swimming in the sky and flying in the sea ##########################
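The window size decides how many SNPs survive. With the default `$bin=5000`, a 1.5 Mb region has roughly 1,500,000 / 5,000 = 300 windows, so at most about 300 SNPs are kept; aiming for about 5k SNPs would instead need windows of about 1,500,000 / 5,000 = 300 bp. A Perl sketch of that arithmetic, plus an exact count of what the script would keep for a given list of sites. Both subs are hypothetical helpers and assume the script's own keying (chromosome + int(POS / $bin), first $Num per window):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil);

# Rough starting point: one window per target SNP over the region,
# e.g. a 1.5 Mb region with a target of ~5k SNPs -> 300 bp windows.
sub bin_for_target {
    my ($region_bp, $target_snps) = @_;
    return ceil($region_bp / $target_snps);
}

# Exact number of SNPs the thinning script would keep: it keys each SNP by
# chromosome + int(pos / $bin) and keeps the first $num SNPs per key.
# $sites is a reference to a list of [chrom, pos] pairs.
sub kept_count {
    my ($sites, $bin, $num) = @_;
    my %per_window;
    $per_window{ $_->[0] . "_" . int($_->[1] / $bin) }++ for @$sites;
    my $kept = 0;
    for my $n (values %per_window) {
        $kept += $n < $num ? $n : $num;   # a window contributes at most $num SNPs
    }
    return $kept;
}

my $bin = bin_for_target(1_500_000, 5_000);
print "suggested \$bin: $bin bp\n";
```

Because some windows may contain no SNP at all, the number kept can come out below the target; feeding the real positions to `kept_count` shows the exact figure before re-running the plot.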