Open jinghuazhao opened 2 months ago
Thank you for the heads up. I need to fix the documentation but to generate the sites only file, this command will do (but it will take a long time):
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr{{1..22}.filtered.SNV_INDEL_SV_phased_panel,X.filtered.SNV_INDEL_SV_phased_panel.v2}.vcf.gz
for chr in {1..22} X; do
if [ $chr == "X" ]; then sfx=".v2"; else sfx=""; fi
bcftools view --no-version -Ou -c 2 1kGP_high_coverage_Illumina.chr$chr.filtered.SNV_INDEL_SV_phased_panel$sfx.vcf.gz | \
bcftools annotate --no-version -Ou -x ID,QUAL,FILTER,^INFO/AC,^INFO/AN,^INFO/END,^FMT/GT | \
bcftools sort -o 1kGP_high_coverage_Illumina.chr$chr.bcf -Ob -T ./bcftools. --write-index
done
bcftools concat --no-version -Ou 1kGP_high_coverage_Illumina.chr{{1..22},X}.bcf | \
bcftools view --no-version -G -Ob -o 1kGP_high_coverage_Illumina.sites.bcf --write-index
You don't need to store files in the $HOME/GCRh38
directory as any directory of your choice will do. This directory is in the documentation but it is not in the plugins code
As for the *.so files, say you want to run score.so located in the directory /path/to/bcftools/plugins
, you can run it as follows (four different options):
export BCFTOOLS_PLUGINS=/path/to/bcftools/plugins && bcftools +score
export BCFTOOLS_PLUGINS=/path/to/bcftools/plugins && bcftools plugin score
bcftools +$BCFTOOLS_PLUGINS/score.so
bcftools plugin $BCFTOOLS_PLUGINS/score.so
You can also compile BCFtools with a default plugin directory which will be hard coded in the binary, but this is not required
Thanks for the feedback -- indeed it appears possible to redinfe HOME to make install there. I did this in a rush without much thinking and I will try your script possibly in SLURM.
Also for information, I have my module file listed below -- it does appear to work.
> #%Module -*- tcl -*-
> ##
> ## modulefile
> ##
> proc ModulesHelp { } {
>
> puts stderr "\tbcftools - utilities for variant calling and manipulating VCFs and BCFs\n"
> puts stderr "\tInstalled under: /usr/local/Cluster-Apps/ceuadmin/bcftools/1.20"
> puts stderr "\tHomepage: https://www.htslib.org/download/"
>
> }
>
> module-whatis "utilities for variant calling and manipulating VCFs and BCFs."
>
> set root /usr/local/Cluster-Apps/ceuadmin/bcftools/1.20
> prepend-path PATH $root/bin
> prepend-path INCLUDE $root/include
> prepend-path MANPATH $root/share/man
> prepend-path LD_LIBRARY_PATH $root/lib:$root/libexec/bcftools
> prepend-path LIBRARY_PATH $root/lib:$root/libexec/bcftools
>
> setenv BCFTOOLS_PLUGINS $root/score
>
Actually setenv in module definition is equivalent to your export under Bash, so it is as indicated. The script for 1kGP_high_coverage_Illumina.sites.bcf generation took ~6hr on our system., which is presumbly (?) used in
cftools norm --no-version -Ou -m+ 1kGP_high_coverage_Illumina.sites.bcf | \ bcftools +liftover --no-version -Ou -- \ -s $public_databases/dbsnp/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \ -f $public_databases/GRCh37_reference_fasta/hs1.fa \ -c $public_databases/dbsnp/hg38ToHs1.over.chain.gz | \ bcftools sort -o 1kGP_high_coverage_Illumina.sites.hs1.bcf -Ob --write-index
but I got error message: on AC and again warning if with (I reckon bcftools norm not applied properly)
bcftools norm --no-version -Ou -m+ 1kGP_high_coverage_Illumina.sites.bcf | \ bcftools annotate -x INFO/AC -Ou | \ bcftools +fill-tags -Ou -- -t AC | \ bcftools +liftover --no-version -Ou - -- \ -s $public_databases/dbsnp/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \ -f $public_databases/GRCh37_reference_fasta/hs1.fa \ -c $public_databases/dbsnp/hg38ToHs1.over.chain.gz | \ bcftools sort -o 1kGP_high_coverage_Illumina.sites.hs1.bcf -Ob -T ./bcftools --write-index
and for the module file I need to use setenv BCFTOOLS_PLUGINS $root/score:$root/libexec/bcftools
If you used the command above to generate the 1kGP_high_coverage_Illumina.sites.bcf
file, then this should be a file without genotypes. You cannot fill the AC tags without genotypes, but if you followed the instructions I sent you should not have to as you should already have that tag. Also notice that bcftools +fill-tags -- -t AC
is terribly slow and bcftools +fill-AN-AC
is recommended instead
Thanks very much for the comments. I attempted to walk through the last chunk of your "Liftover VCFs" section, which is
bcftools norm --no-version -Ou -m+ 1kGP_high_coverage_Illumina.sites.vcf.gz | \ bcftools +liftover --no-version -Ou -- \ -s $HOME/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \ -f $HOME/hs1/hs1.fa \ -c $HOME/hs1/hg38ToHs1.over.chain.gz \ bcftools sort -o 1kGP_high_coverage_Illumina.sites.hs1.bcf -Ob --write-index
but went astray. As you can see that I am far from knowing bcftools enough, and therefor would try your notes above as well. BTW, the code you posted yesterday (wrapped in here, https://cambridge-ceu.github.io/csd3/applications/files/site.sb) did go through smoothly. I am not sure this route is valid at all as I got a warning in both cases,
bcftools norm --no-version -Ou -m+ ../1000G/bcftools/1kGP_high_coverage_Illumina.sites.bcf | \
> bcftools annotate -x INFO/AC -Ou | \
> bcftools +fill-AN-AC -Ou -- | \
> bcftools +liftover --no-version -Ou - -- \
> -s $public_databases/dbsnp/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
> -f $public_databases/GRCh37_reference_fasta/hs1.fa \
> -c $public_databases/dbsnp/hg38ToHs1.over.chain.gz | \
> bcftools sort -o 1kGP_high_coverage_Illumina.sites.hs1.bcf -Ob -T ./bcftools --write-index
Writing to ./bcftoolsxUm4P1
INFO/AC is handled by AC rule
Warning: input VCF includes symbolic alleles that might not properly lift over
Again, you don't need to use bcftools +fill-AN-AC
as it will not do anything on a file without genotypes. You are getting the INFO/AC is handled by AC rule
message only because the field is in the header, but it will never be present if you remove it first from a VCF without genotypes. As for the warning, that is due to DEL/DUP variants in the VCF which will not always liftover properly as BCFtools/liftover is designed for SNVs and indels. CNVs cannot be easily lifted over in all scenarios
It looks I need to slow down and digest what bcftools does in these cases. Thanks very much for your time..
Hi Giulio / All,
I tried to get +liftover going on our systems but found location of a couple of files needs to be detailed -- could anybody point to me?
They are 1kGP_high_coverage_Illumina.sites.vcf.gz,, hs1.fa -- I made attempt to replace them as in here, https://cambridge-ceu.github.io/csd3/applications/bcftools.html#120
As documented files are stored in $HOME/GCRh38 where our system has a limit on $HOME I tried to get them elswhere. In particular, the *.so files are with bin which appears to be available through LD_LIBRARY_PATH as well?
Your comments are greatly appreicated,
JIng Hua