Closed by hannanw 2 years ago
If you run the ascat.pl script, the BAF values for the analysis are output in *.ascat_ngs.cn.tsv.gz. Please note this package is intended for Illumina-style paired-end, whole-genome sequencing with tumour/normal pairs only. If you want more generic use, please see the ASCAT R library.
Hi,
Thanks for the speedy reply, however the BAF values are only for the tumor sample. I am interested in the BAF values for the normal sample. Do you know how I can obtain them? Thanks!
Warmest regards, Hannan
You would need to go back to working with the underlying R library as far as I am aware. We only really support the wrapper and counting code.
However I think you are looking for the ascat/SnpGcCorrections.tsv file, which is part of the CNV_SV_ref_GRCh38_hla_decoy_ebv_brass6+.tar.gz bundle.
wget ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/CNV_SV_ref_GRCh38_hla_decoy_ebv_brass6+.tar.gz
Yup, I'm interested in the counting code because, to get the BAF value for a particular probe, I need to know which allele is the reference and which is the variant. For example, here is the output of alleleCounter for my normal BAM file:

#CHR | POS | Count_A | Count_C | Count_G | Count_T | Good_depth
-- | -- | -- | -- | -- | -- | --
chr1 | 95440 | 0 | 0 | 0 | 0 | 0
chr1 | 104186 | 0 | 1 | 0 | 9 | 10
chr1 | 122872 | 0 | 0 | 2 | 5 | 7
chr1 | 125271 | 0 | 0 | 0 | 0 | 0
chr1 | 135982 | 0 | 0 | 0 | 0 | 0

For chr1 positions 104186 and 122872, I need to know which allele is the reference and which is the variant in order to calculate the BAF for the corresponding probe. I am looking for a file similar to qcGenotype_GRCh38_hla_decoy_ebv/verifyBamID_snps.vcf.gz, which looks like this:
0 chr1 629241 rs10458597 C T . PASS AF=0.01572
1 chr1 629393 rs9629043 C T . PASS AF=0.05000
2 chr1 632373 rs11510103 A G . PASS AF=0.05000
3 chr1 785910 rs12565286 G C . PASS AF=0.05573
4 chr1 805477 rs12082473 G A . PASS AF=0.07838
The ascat/SnpGcCorrections.tsv file only contains the probe name, chromosome and position, but I need the allele information. I guess what I am asking for is the file that is used to calculate the BAF values in your wrapper code before passing them on to the base ASCAT code in R. Hope this clarifies things, thanks!
Warmest regards, Hannan
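For illustration, this is roughly the calculation being described: once a reference and a variant allele are known for a position, a BAF can be taken as the variant count divided by the sum of the reference and variant counts. The Python sketch below assumes the alleleCounter output is tab-separated with the header shown above. The reference/variant pairs in `HYPOTHETICAL_ALLELES` are invented purely for the example and are not taken from any reference file, and this is not necessarily how the ascat.pl wrapper or the ASCAT R library derives BAF.

```python
import io

BASES = ("A", "C", "G", "T")

# First rows of the alleleCounter output quoted above (tab-separated here).
ALLELE_COUNTS = """#CHR\tPOS\tCount_A\tCount_C\tCount_G\tCount_T\tGood_depth
chr1\t95440\t0\t0\t0\t0\t0
chr1\t104186\t0\t1\t0\t9\t10
chr1\t122872\t0\t0\t2\t5\t7
"""

# ASSUMPTION: made-up (reference, variant) pairs, for illustration only.
HYPOTHETICAL_ALLELES = {
    ("chr1", 95440): ("A", "G"),
    ("chr1", 104186): ("T", "C"),
    ("chr1", 122872): ("T", "G"),
}


def parse_allele_counts(handle):
    """Yield (chrom, pos, {base: count}) for each data row."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip the header and blank lines
        fields = line.split()
        counts = dict(zip(BASES, map(int, fields[2:6])))
        yield fields[0], int(fields[1]), counts


def baf(counts, ref, alt):
    """Variant count over (reference + variant); None if neither was seen."""
    total = counts[ref] + counts[alt]
    return counts[alt] / total if total else None


bafs = {}
for chrom, pos, counts in parse_allele_counts(io.StringIO(ALLELE_COUNTS)):
    key = (chrom, pos)
    if key in HYPOTHETICAL_ALLELES:
        ref, alt = HYPOTHETICAL_ALLELES[key]
        bafs[key] = baf(counts, ref, alt)

for (chrom, pos), value in bafs.items():
    shown = "NA" if value is None else f"{value:.3f}"
    print(f"{chrom}\t{pos}\t{shown}")
```

Positions with no reads for either allele are reported as NA rather than 0, so that missing coverage is not mistaken for a homozygous reference call.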
The ASCAT R function is never informed of the reference base; the genome.fa is only used in a few places unrelated to ASCAT's interpretation. This hack will pull the data into a file, but further processing would be required:
samtools faidx genome.fa -r <(tail -n +2 ascat/SnpGcCorrections.tsv | perl -ane 'printf qq{%s:%d-%d\n}, $F[1],$F[2],$F[2]') > loci.txt
Gives:
>chr1:13116-13116
T
>chr1:15274-15274
A
...
As indicated above, please contact the authors of the R library if you require further information.
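As one possible form of the further processing mentioned above, the FASTA-style output in loci.txt could be flattened into chromosome, position and reference-base rows. This is only a sketch in Python, assuming every record is a single base laid out exactly as in the sample output; the function name `parse_faidx_regions` is made up for the example.

```python
import io


def parse_faidx_regions(handle):
    """Yield (chrom, pos, base) from single-base `samtools faidx` output."""
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]  # e.g. "chr1:13116-13116"
        elif header is not None:
            # Split on the LAST colon so contig names that themselves
            # contain colons are kept intact.
            chrom, _, span = header.rpartition(":")
            pos = int(span.split("-")[0])
            yield chrom, pos, line.upper()
            header = None


# The two records shown in the sample output above.
example = io.StringIO(">chr1:13116-13116\nT\n>chr1:15274-15274\nA\n")
rows = list(parse_faidx_regions(example))

for chrom, pos, base in rows:
    print(f"{chrom}\t{pos}\t{base}")
```

To run it on the real file, replace the `io.StringIO` example with `open("loci.txt")`. Note that this only recovers the reference base; it says nothing about which variant allele is expected at each position.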
Ah okay, that looks like something I can work with. Thanks for the help!
Hi,
I am trying to obtain the BAF values from my normal BAM file after running it through alleleCounter.pl. Which reference file should I use to calculate the BAF values? I tried using the file qcGenotype_GRCh38_hla_decoy_ebv/verifyBamID_snps.vcf.gz provided in the GRCh38 reference file bundle, but quite a few probes that are present in the alleleCounter output are missing from that file. Could you point me to the appropriate reference file to calculate the BAF values for all the probes present in my normal BAM file? Thanks!
Warmest regards, Hannan