Closed rashindrie closed 2 years ago
If the command bcftools query -f '%CHROM\t%POS\t%REF\n' $dir/$pfx.bdev.bcf --samples 205058610011_R03C02
does not output anything to terminal, it seems like your VCF is empty, despite having an header.
Oh interesting.. I will look into why that's happening.
Thanks :)
Hi @freeseek ,
So I think I am doing something wrong when creating the $dir/$pfx.bdev.bcf
.
For the input-stats
file I modified the tsv
file generated from gtc2vcf
. The original file didn't have a sample_id column so I added one.
sample_id gtc number_snps ploidy ploidy_type sample_name sample_plate sample_well cluster_file snp_manifest scan_date autocall_date autocall_version number_normalization_transforms number_x_controls number_y_controls scanner_name pmt_green pmt_red scanner_software_version scanner_username call_rate computed_gender logr_deviation gencall_score_10_percentile dx gencall_score_50_percentile number_valid_calls number_invalid_calls number_intensity_only_or_zeroed_loci p05_x p50_x p95_x p05_y p50_y p95_y sentrix_barcode
205058610011_R07C01 205058610011_R07C01.gtc 654027 1 0 GSA-24v3-0_A1_ClusterFile.egt GSA-24v3-0_A2.bpm 3/24/2021 5:25:03 PM 09/05/2021 4:02 AM 3.0.0 54 92 92 N371 0 0 4.0.0.147 Illumina 0.999456 M 0.186969 0.477224 0 0.812637 649827 354 3846 819 4725 19336 182 2241 4919 205058610011
205058610011_R03C02 205058610011_R03C02.gtc 654027 1 0 GSA-24v3-0_A1_ClusterFile.egt GSA-24v3-0_A2.bpm 3/24/2021 5:31:45 PM 09/05/2021 4:02 AM 3.0.0 54 92 92 N371 0 0 4.0.0.147 Illumina 0.999339 M 0.193263 0.476451 0 0.812404 649751 430 3846 784 4156 17618 178 1816 3948 205058610011
205058610011_R05C02 205058610011_R05C02.gtc 654027 1 0 GSA-24v3-0_A1_ClusterFile.egt GSA-24v3-0_A2.bpm 3/24/2021 5:33:23 PM 09/05/2021 4:01 AM 3.0.0 54 92 92 N371 0 0 4.0.0.147 Illumina 0.999494 M 0.188625 0.477046 0 0.812617 649852 329 3846 796 4470 18486 174 2059 4466 205058610011
....
Command to generate $dir/$pfx.bdev.bcf
.
assembly="GRCh38"
mhc_reg="chr6:27518932-33480487"
kir_reg="chr19:54071493-54992731"
cnp="$HOME/GRCh38/cnps.bed"
dup="$HOME/GRCh38/segdups.bed.gz"
cyto="$HOME/GRCh38/cytoBand.txt.gz"
bcftools +mocha --genome $assembly --input-stats $tsv --no-version --output-type b --output $dir/$pfx.bdev.bcf --variants ^$dir/$pfx.xcl.bcf --calls $dir/$pfx.calls.tsv --stats $dir/$pfx.stats.tsv --ucsc-bed $dir/$pfx.ucsc.bed --cnp $cnp --mhc $mhc_reg --kir $kir_reg $dir/$pfx.bcf && bcftools index -f $dir/$pfx.bdev.bcf |& tee -a output_alt.txt
Output:
MoChA 2021-06-01 https://github.com/freeseek/mocha
Genome reference: GRCh38
Variants to exclude: /home/ubuntu/snp_analysis/mocha//mocha_.xcl.bcf
Regions to genotype: /root/GRCh38/cnps.bed
Genome-wide statistics: /home/ubuntu/snp_analysis/bcf/ILGSA_2.tsv
MHC region to exclude from analysis: chr6:27518932-33480487
KIR region to exclude from analysis: chr19:54071493-54992731
BAF deviations for LRR+BAF model: -2.0,-4.0,-6.0,10.0,6.0,4.0
BAF deviations for BAF+phase model: 6.0,8.0,10.0,15.0,20.0,30.0,50.0,80.0,100.0,150.0,200.0
Minimum number of genotypes for a cluster to median adjust BAF and LRR: 5
Minimum number of genotypes for a cluster to regress BAF against LRR: 15
Order of polynomial in local GC content to be used to regress LRR against GC: 2
Major transition phred-scaled likelihood: 65.00
Minor transition phred-scaled likelihood: 35.00
Autosomal telomeres phred-scaled likelihood: 20.00
Chromosome X telomeres phred-scaled likelihood: 8.00
Chromosome Y telomeres phred-scaled likelihood: 6.00
Uniform error phred-scaled likelihood: 15.00
Phase flip phred-scaled likelihood: 20.00
List of short arms: 13,14,15,21,22,chr13,chr14,chr15,chr21,chr22
Use variants in short arms: FALSE
Use variants in centromeres: FALSE
Use chromosomes without centromere rules: FALSE
Relative contribution from LRR for LRR+BAF model: 0.20
Difference between LRR for haploid and diploid: 0.45
Using genome assembly from GRCh38
Loading 24 sample(s) from the VCF file
Warning: --regress-BAF-LRR 15 regression will not be effective with only 24 sample(s)
Read 47739 variants from contig chr1
Read 51269 variants from contig chr2
Read 41484 variants from contig chr3
Read 37373 variants from contig chr4
Read 35551 variants from contig chr5
Read 34290 variants from contig chr6
Read 33195 variants from contig chr7
Read 30493 variants from contig chr8
Read 25670 variants from contig chr9
Read 29653 variants from contig chr10
Read 30435 variants from contig chr11
Read 28797 variants from contig chr12
Read 22011 variants from contig chr13
Read 19349 variants from contig chr14
Read 19351 variants from contig chr15
Read 20008 variants from contig chr16
Read 19065 variants from contig chr17
Read 17016 variants from contig chr18
Read 14448 variants from contig chr19
Read 14385 variants from contig chr20
Read 8008 variants from contig chr21
Read 7869 variants from contig chr22
Read 26956 variants from contig chrX
Read 4126 variants from contig chrY
Read 1137 variants from contig chrM
Estimated 0 sample(s) of unknown gender, 23 male(s) and 1 female(s)
Read 47739 variants from contig chr1
Written 0 variants for contig chr1
Read 51269 variants from contig chr2
Written 0 variants for contig chr2
Read 41484 variants from contig chr3
Written 0 variants for contig chr3
Read 37373 variants from contig chr4
Written 0 variants for contig chr4
Read 35551 variants from contig chr5
Written 0 variants for contig chr5
Read 34290 variants from contig chr6
Written 0 variants for contig chr6
Read 33195 variants from contig chr7
Written 0 variants for contig chr7
Read 30493 variants from contig chr8
Written 0 variants for contig chr8
Read 25670 variants from contig chr9
Written 0 variants for contig chr9
Read 29653 variants from contig chr10
Written 0 variants for contig chr1
There are two troublesome lines:
1. Estimated 0 sample(s) of unknown gender, 23 male(s) and 1 female(s)
2. Written 0 variants for contig chr1
This should not happen ideally right?
It is very weird that no variants are written out in the output. That should not happen. I will have to check, I am not sure what could be causing this. Can you try the following:
bcftools index --stats $dir/$pfx.bdev.bcf
bcftools index --stats $dir/$pfx.xcl.bcf
Hi,
first command does not output anything..
root@ip-172:/home/ubuntu/analysis# bcftools index --stats $dir/$pfx.bdev.bcf
root@ip-172:/home/ubuntu/analysis# bcftools index --stats $dir/$pfx.xcl.bcf
chr1 248956422 1328
chr2 242193529 890
chr3 198295559 687
chr4 190214555 610
chr5 181538259 581
chr6 170805979 603
chr7 159345973 794
chr8 145138636 522
chr9 138394717 508
chr10 133797422 991
chr11 135086622 690
chr12 133275309 527
chr13 114364328 478
chr14 107043718 313
chr15 101991189 552
chr16 90338345 779
chr17 83257441 479
chr18 80373285 343
chr19 58617616 1145
chr20 64444167 170
chr21 46709983 405
chr22 50818468 1806
chrX 156040895 599
Q1: Getting Execution halted in
mocha_plot.R
scriptCommand:
Error
Troubleshooting: Viewing contents in
$dir/$pfx.bdev.bcf
Output
$dir/$pfx.bdev.bcf
is not empty, running below command does not output anything to the terminal.. I think this is why in the first command I get an error. Can you please have a look at this?Q2: From where can I get the
age_plot.R
script?Thanks, Rashindrie