BGI-shenzhen / LDBlockShow

LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on VCF files
MIT License

Parameter -Region (format chr:start:end) always incurs a warning: can't be found in the SNP dataset #19

Closed pwu4 closed 2 years ago

pwu4 commented 2 years ago

LDBlockShow is an excellent tool, but I would suggest a minor correction to the tutorial instructions (3.1.1 Main parameter):

The -Region format is given as chr:start:end, but the "chr" prefix before the chromosome integer is not always needed. Whether to include it depends on how chromosomes are named in the VCF dataset (chr1 or 1).

Where it says:

3.1.1 Main parameter ./bin/LDBlockShow

    Usage: LDBlockShow  -InVCF  <in.vcf.gz>  -OutPut <outPrefix>  -Region  chr1:10000-20000

Details for above parameters:

 -Region         The defined region to show the LD heatmap (format: chr:start:end)

Reason: "-Region chr11:24100000:24200000" indeed works fine with the example data, but in GWAS summary statistics the chromosome column often holds a bare integer (e.g. 11) rather than a name like chr11. This is particularly true for the 1000 Genomes reference data, where "-Region 11:24100000:24200000" should be used to avoid the warning.

See also: https://www.internationalgenome.org/category/vcf/

"Please note that all our VCF files are using straight integers and X/Y for their chromosome names in the Ensembl style rather than using chr1 in the UCSC style. If you request a subsection of a vcf file using a chromosome name in the style chrN as shown below it will not work."

Otherwise, you get a warning that looks like this:

Detected VCF File is phased file with '|', Read VCF in Phase mode

    InPut Para -Region  chromosome [chr16]  can't be found in the SNP dataset

I ran into this problem and spent a whole day checking my query and the "SNP dataset" trying to fix it, only to find that the real problem was following the format instruction literally, as in "-Region chr16:169708:1169708". Simply using "-Region 16:169708:1169708" works well.
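
For anyone hitting the same warning, the check can be scripted before running LDBlockShow. The following is a minimal Python sketch, not part of LDBlockShow: the helper names `vcf_chromosomes` and `match_region` are made up for illustration, and it assumes a standard tab-delimited VCF (plain or gzip-compressed) whose first column is the chromosome name. It reads the chromosome names actually present in the VCF and adds or strips the "chr" prefix in the region string so the two agree.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def vcf_chromosomes(path):
    """Collect the distinct CHROM values (first column) from the VCF data lines."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            # Skip header/meta lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def match_region(region, chromosomes):
    """Return the region with its chromosome spelled the way the VCF spells it.

    Hypothetical helper: tries the name as given, then with the "chr"
    prefix removed or added. Raises ValueError if neither form is present.
    """
    chrom, sep, rest = region.partition(":")
    if chrom in chromosomes:
        return region
    alternative = chrom[3:] if chrom.startswith("chr") else "chr" + chrom
    if alternative in chromosomes:
        return alternative + sep + rest
    raise ValueError(
        "chromosome %r not found in VCF (names seen: %s)"
        % (chrom, ", ".join(sorted(chromosomes)))
    )
```

With a 1000 Genomes style VCF, `match_region("chr16:169708:1169708", vcf_chromosomes("in.vcf.gz"))` would return the region without the prefix, which can then be passed to -Region. Here `in.vcf.gz` is only a placeholder file name.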

hewm2008 commented 2 years ago

Dear @pwu4, thank you for your suggestion. I will update the README description in a later version. The chromosome name given to -Region should be the same as in the VCF file, and the chromosome names in the GWAS file should be the same as well.
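
To check that the two files agree on chromosome names, something like the rough Python sketch below could be used. It is not part of LDBlockShow, and the function names are hypothetical. It assumes the GWAS file is whitespace-delimited with the chromosome name in the first column and that any header lines start with '#'; please adjust it if your file is laid out differently.

```python
import gzip


def first_column_names(path):
    """Distinct first-column values of a text file, skipping '#' lines and blanks."""
    opener = gzip.open if str(path).endswith(".gz") else open
    names = set()
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if fields and not line.startswith("#"):
                names.add(fields[0])
    return names


def unmatched_chromosomes(vcf_path, gwas_path):
    """Chromosome names used in the GWAS file that never occur in the VCF."""
    return sorted(first_column_names(gwas_path) - first_column_names(vcf_path))
```

An empty list means every chromosome name in the GWAS file also appears in the VCF. A header row that does not start with '#' would show up in the result as if it were a chromosome name.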

pwu4 commented 2 years ago

Appreciate it.