freeseek / gtc2vcf

Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
MIT License

vcf files not being saved #47

Closed mahaaamir closed 1 year ago

mahaaamir commented 1 year ago

I have approximately 4000 GTC files that I am trying to convert to VCF using the gtc2vcf plugin. Even though the script reads the GTC files correctly and reports writing the VCF file, no output is produced. I tried reducing the number of GTC files to 8 and got the same result. I get this output:

Writing to ./bcftools-sort.XXXXXXMMTHoa
gtc2vcf 2022-01-12 https://github.com/freeseek/gtc2vcf
Reading BPM file /bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D2.bpm
Reading EGT file /bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D1_ClusterFile.egt
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R02C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R07C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R06C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R01C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R08C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R03C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R05C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R04C01.gtc
Writing VCF file
Lines total/missing-reference/skipped: 1748250/23814/14885
Merging 2 temporary files
Cleaning
Lines total/split/realigned/skipped: 1733365/0/0/23817

But no bcftools-sort.XXXXXXMMTHoa subdirectory is present in my working directory once the programme has stopped running.

Below is the command I am using:

ref="/home5/maamir/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
bcftools +gtc2vcf --no-version -Ou \
  --bpm /bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D2.bpm \
  --egt /bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D1_ClusterFile.egt \
  --gtcs /home5/maamir/mfgi \
  --fasta-ref $ref \
  --extra $out_prefix.tsv | \
  bcftools sort -Ou -T ./bcftools-sort.XXXXXX | \
  bcftools norm --no-version -Ob -c x -f $ref && \
  bcftools index --f $out_prefix.bcf

freeseek commented 1 year ago

The ./bcftools-sort.XXXXXX directory is just a temporary directory needed to sort the output VCF, and it is removed automatically at the end of the sorting. Looking at your command, it seems that bcftools norm is writing its output to standard output, so nothing is saved to a file. You should either use the -o $out_prefix.bcf option in bcftools norm or pipe the standard output to the tee command, as explained in the documentation, so that writing the output file and indexing it can run concurrently.
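For illustration only, here is a minimal sketch of what tee does in such a pipeline. It does not call bcftools: printf stands in for bcftools norm writing to standard output, wc -c stands in for bcftools index reading the stream, and the workdir and out_prefix values are hypothetical names made up for the demo.

```shell
#!/bin/sh
# Stand-in demo, not the real pipeline:
#   printf plays the role of `bcftools norm` (writes records to stdout)
#   wc -c  plays the role of `bcftools index` (consumes the same stream)
workdir=$(mktemp -d)            # scratch directory for the demo
out_prefix="$workdir/demo"      # hypothetical output prefix

# tee saves a copy of the stream to a file AND forwards the same bytes
# to the next command, so both steps run concurrently.
printf 'record1\nrecord2\n' | tee "$out_prefix.bcf" | wc -c > "$out_prefix.count"

# Without tee (or an explicit output option) the stream would only go to
# stdout and no file would be left behind.
ls "$workdir"                   # remove "$workdir" when done
```

In the real pipeline the same effect can be had either with tee as above or by giving bcftools norm an explicit output file with -o.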

mahaaamir commented 1 year ago

Changing the command to the one provided in the documentation gives me a similar result: 341 BCF files are produced temporarily and the ./bcftools-sort.XXXXXX directory is removed. In the log I can see that the VCF is being written, but I cannot locate any VCF output in my home directory. I used this command, as provided in the documentation:

#!/bin/bash
export PATH="$HOME/bin:$PATH"
export BCFTOOLS_PLUGINS="$HOME/bin"
bpm_manifest_file="/bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D2.bpm"
egt_cluster_file="/bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D1_ClusterFile.egt"
path_to_gtc_folder="/home5/maamir/mfgi"
ref="/home5/maamir/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
out_prefix="..."
bcftools +gtc2vcf \
  --no-version -Ou \
  --bpm $bpm_manifest_file \
  --egt $egt_cluster_file \
  --gtcs $path_to_gtc_folder \
  --fasta-ref $ref \
  --extra $out_prefix.tsv | \
  bcftools sort -Ou -T ./bcftools-sort.XXXXXX | \
  bcftools norm --no-version -Ob -c x -f $ref | \
  tee $out_prefix.bcf | \
  bcftools index --force --output $out_prefix.bcf.csi

freeseek commented 1 year ago

You need to give a value to the out_prefix variable. As you wrote it, with out_prefix="...", the pipeline generates files named ....bcf and ....bcf.csi. These are likely in your directory now, but you cannot see them unless you use ls -a, because files whose names start with the . character are hidden by default.
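The effect is easy to reproduce without bcftools. The sketch below creates empty files with the same names in a scratch directory and compares what ls and ls -A report; the cohort prefix and the workdir variable are hypothetical, chosen only for the demo.

```shell
#!/bin/sh
# Reproduce the hidden-file effect in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

out_prefix="..."                # the literal placeholder, left unchanged
: > "$out_prefix.bcf"           # creates a file named "....bcf"
: > "$out_prefix.bcf.csi"       # creates a file named "....bcf.csi"

out_prefix="cohort"             # hypothetical real prefix
: > "$out_prefix.bcf"           # creates "cohort.bcf"

visible=$(ls | wc -l)           # plain ls skips names that start with "."
all=$(ls -A | wc -l)            # -A includes dotfiles but omits . and ..
echo "ls lists $visible file(s), ls -A lists $all"
```

Only the file written with the real prefix shows up in a plain ls; the two files written with the placeholder appear only with ls -a or ls -A.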