BGI-shenzhen / LDBlockShow

LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on VCF files
MIT License

multithread option? #15

Closed: wbsimey closed this issue 3 years ago

wbsimey commented 3 years ago

Hello, I have a large VCF (27 chromosomes totaling 2.5 Gb, 96 diploid samples, and >18 million SNPs), and I have done a fair amount of pre-filtering. I am testing LDBlockShow on our smallest chromosome (18 Mb, >500,000 SNPs).

I ran:

LDBlockShow -InVCF Chr26_2pops_HW.vcf.gz -OutPut out -Region Chr26_nm_RagTag:1:18055847 -OutPng -SeleVar 2
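For context, this sketch assumes that -SeleVar 2 selects r² as the LD statistic; check that against LDBlockShow's help output. It shows one common way to estimate r² from unphased data: the squared correlation of 0/1/2 alt-allele dosages at two sites. LDBlockShow's own estimator may differ. The function name genotype_r2 is hypothetical. It is not part of LDBlockShow.

```python
from typing import Optional, Sequence


def genotype_r2(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> Optional[float]:
    """Squared Pearson correlation of 0/1/2 alt-allele dosages at two sites.

    Hypothetical helper for illustration. Samples missing (None) at either
    site are dropped. Returns None if fewer than two samples remain or if
    either site is monomorphic among the kept samples.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) * (x - mean_x) for x, _ in pairs)
    var_y = sum((y - mean_y) * (y - mean_y) for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # r^2 is undefined for a monomorphic site
    return cov * cov / (var_x * var_y)


# Identical genotypes are in perfect LD: prints 1.0
print(genotype_r2([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]))
# Perfectly opposite dosages also give r^2 close to 1.0 (the sign is squared away)
print(genotype_r2([0, 1, 2, 1, 0], [2, 1, 0, 1, 2]))
```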

I get the following warnings, and after 3 hours the command still has not returned to the prompt.

#Warning skip non bi-allelic(Singleton/ThreeMulti allelic) site, and total skip allelic sites number is :28792
#Warning skip high missing site, and total skip allelic sites number is :43138
#Warning skip low Minor Allele Frequency site, and total skip allelic sites number is :212107
##Start Region Cal...    :Chr26_nm_RagTag 1 18055847; In This Region TotalSNP Number is 239684
Warning: LDBlocks Region SNP Number too much, you may use the small region or more stringent conditions to filter the SNP
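This sketch gives a rough sense of why this region runs so long. It assumes that LD is computed for every pair of SNPs in the region, which is what a full triangular heatmap implies. Under that assumption the work grows with the square of the SNP count. The helper name ld_pair_count is hypothetical.

```python
def ld_pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs in a full pairwise LD triangle (hypothetical helper)."""
    return n_snps * (n_snps - 1) // 2


# Compare the region reported above with the ~10,000-SNP ceiling the maintainer suggests
for n in (239_684, 10_000):
    print(f"{n} SNPs -> {ld_pair_count(n):,} pairs")
```

239,684 SNPs give 28,724,090,086 pairs, and 10,000 SNPs give 49,995,000 pairs. That is roughly 575 times as many pairs, before any drawing cost is counted.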

I can see from running 'top' that LDBlockShow is running and only using a single thread. I have two questions:

  1. Should the process abort on these warnings, or will it complete its run and produce output files?
  2. Is there a way to use multiple threads? I have 256 threads and 2 TB RAM.
wbsimey commented 3 years ago

I selected a subset of the SNPs (from 500k down to 50k), and LDBlockShow was able to complete and generate plots, but I get the following message:

LDBlockShow -InVCF Neotoma_Chr26_2pops_HW_FLT_QUAL20_DP20-e_dblPipe.vcf.gz -OutPut Neo96_Chr26_bry_LD -Region Chr26_nm_RagTag:1:18055847 --SubPop bry_pop_list.txt -OutPng -SeleVar 2

the Number of subPop samples[found in VCF] is 69
#Warning skip non bi-allelic(Singleton/ThreeMulti allelic) site, and total skip allelic sites number is :28413
#Warning skip high missing site, and total skip allelic sites number is :4407
#Warning skip low Minor Allele Frequency site, and total skip allelic sites number is :10801
##Start Region Cal...    :Chr26_nm_RagTag 1 18055847; In This Region TotalSNP Number is 300
find blocks...
In Big SNP Number :300 ,Para -NumGradien suggest be maxValue : 32 ,auto be it
Start draw... SVG info: SNPNumber :300 , SVG (width,height) = (6900,5100)
        Tip: The region you inputed is greater than the Para [-NoShowLDist], and this will call LDheatmap to be  a not-complete triangle. You can modify the parameter [-NoShowLDist] according to your needs.
convert   SVG ---> PNG ...

I assume this message comes from the ShowLDSVG command that LDBlockShow runs automatically. What can I do to correct this? Your error messages are informative, and the program makes some smart fallback attempts when a command is unavailable, for example: Can't find the [ convert ] bin in your $PATH, I try to convert svg by /home/bsimison/Projects/Neotoma/96_LC-genomes/LDBlockShow/bin/svg_kit/svg2xxx.pl

thank you!

[Image: Neo96_Chr26_bry_LD]

hewm2008 commented 3 years ago

Dear @wbsimey

1. Using more than 10,000 SNP sites is not recommended. If you run an entire chromosome, you need to limit the number of SNPs. We recommend randomly selecting one site every 100 kb. I have already described the procedure for randomly picking sites here
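Here is a minimal sketch of the "one random site per 100 kb" thinning. It assumes non-overlapping windows on each chromosome and reads CHROM/POS from uncompressed VCF text lines. The function names vcf_sites and thin_one_per_window are hypothetical. This is not the maintainer's procedure; that procedure is in the link above.

```python
import random
from typing import Dict, Iterable, Iterator, List, Tuple

Site = Tuple[str, int]  # (CHROM, 1-based POS) as read from a VCF record


def vcf_sites(lines: Iterable[str]) -> Iterator[Site]:
    """Yield (CHROM, POS) from VCF text lines, skipping headers and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos)


def thin_one_per_window(sites: Iterable[Site], window_bp: int = 100_000,
                        seed: int = 1) -> List[Site]:
    """Keep one randomly chosen site per non-overlapping window.

    Window k on a chromosome covers positions k*window_bp + 1 .. (k+1)*window_bp.
    A fixed seed makes the selection reproducible between runs.
    """
    rng = random.Random(seed)
    buckets: Dict[Tuple[str, int], List[Site]] = {}
    for chrom, pos in sites:
        buckets.setdefault((chrom, (pos - 1) // window_bp), []).append((chrom, pos))
    # Sorted keys give output ordered by chromosome name, then by window
    return [rng.choice(buckets[key]) for key in sorted(buckets)]


records = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID",
           "Chr26\t1200\t.", "Chr26\t88000\t.", "Chr26\t150000\t."]
print(thin_one_per_window(vcf_sites(records)))
```

For Chr26 (18,055,847 bp), 100 kb windows keep at most 181 sites, far below the 10,000-site guideline. You can then subset the VCF to the kept positions with whatever VCF tooling you already use.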

2. For converting a big SVG file to PNG, the convert command (from ImageMagick) is recommended but not required. If your system does not have convert, svg2xxx.pl is called instead. You can pre-install convert with sudo apt-get install imagemagick (Debian/Ubuntu) or sudo yum install ImageMagick (CentOS/RHEL).
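If you want to confirm which converter will be picked up, this sketch mirrors the fallback behavior described above. It looks for convert on $PATH and otherwise falls back to the svg2xxx.pl script. It is an illustration of that behavior, not LDBlockShow's actual code. The function name pick_svg_converter is hypothetical.

```python
import shutil
from typing import Callable, Optional


def pick_svg_converter(fallback_script: str,
                       which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the path of ImageMagick's `convert` if it is on $PATH, else the fallback script.

    `which` is injectable so the lookup can be faked in tests.
    """
    found = which("convert")
    return found if found is not None else fallback_script


print(pick_svg_converter("bin/svg_kit/svg2xxx.pl"))
```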

3. The figure you drew is an incomplete inverted triangle, which the program already warned you about: "Tip: The region you inputed is greater than the Para [-NoShowLDist], and this will call LDheatmap to be a not-complete triangle. You can modify the parameter [-NoShowLDist] according to your needs." So you just need to replot the figure, adding the parameter -NoShowLDist set to the chromosome length:

./LDBlockShow-1.40/bin/ShowLDSVG -InPreFix Result.Prefix -NoShowLDist XXXX -CrGrid black -OutPut LDheatmap
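This sketch builds the replot command from a -Region string, so that -NoShowLDist is at least the region span. It makes two assumptions. First, -NoShowLDist is a distance in bp, as the maintainer's "chrlengthNumber" suggests. Second, -InPreFix takes the -OutPut prefix of the earlier LDBlockShow run. The flags and binary path come from the command above. The helper names parse_region and showldsvg_cmd are hypothetical.

```python
import shlex
from typing import List, Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a CHROM:START:END region string, as passed to -Region."""
    chrom, start, end = region.rsplit(":", 2)
    return chrom, int(start), int(end)


def showldsvg_cmd(prefix: str, region: str, out: str = "LDheatmap",
                  exe: str = "./LDBlockShow-1.40/bin/ShowLDSVG") -> List[str]:
    """Build a ShowLDSVG argv whose -NoShowLDist covers the whole region."""
    _, start, end = parse_region(region)
    span = end - start + 1  # assumed to be in bp; large enough for a complete triangle
    return [exe, "-InPreFix", prefix, "-NoShowLDist", str(span),
            "-CrGrid", "black", "-OutPut", out]


cmd = showldsvg_cmd("Neo96_Chr26_bry_LD", "Chr26_nm_RagTag:1:18055847")
# prints: ./LDBlockShow-1.40/bin/ShowLDSVG -InPreFix Neo96_Chr26_bry_LD -NoShowLDist 18055847 -CrGrid black -OutPut LDheatmap
print(shlex.join(cmd))
```

Returning an argv list rather than a single string means it can be passed directly to subprocess.run without shell quoting issues.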

wbsimey commented 3 years ago

Very helpful, thank you.