brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License
254 stars 35 forks source link

Issue with reading mutect2 output on somalier (fileformat=VCFv4.2) #71

Closed brain-discourse closed 3 years ago

brain-discourse commented 3 years ago

I am having an issue with reading the mutect2 output (both, paired and tumor only) on somalier that follows VCFv4.2

I am using the following syntax: ./somalier extract -d extracted --sites /proj/humandb/sites.hg38.vcf -f /proj/ref/GRCh38.d1.vd1.fa /proj/test/4472_paired_filtered.vcf

and the output looks as follows:

somalier version: 0.2.12
[somalier] found 0 sites

My vcf files have all the relevant headers but I can't seem to extract sites for any of my files

chr1    17398654    .   GCC G,GC,GCCC   .   germline;multiallelic;normal_artifact;panel_of_normals  AS_FilterStatus=weak_evidence|SITE|weak_evidence;AS_SB_TABLE=0,20|0,3|0,22|0,3;DP=60;ECNT=1;GERMQ=1;MBQ=40,40,40,40;MFRL=382,357,350,373;MMQ=60,60,60,60;MPOS=46,50,57;NALOD=0.676,-3.010e+01,0.836;NLOD=1.89,-3.138e+01,6.42;PON;POPAF=4.51,4.51,4.51;ROQ=93;RPA=11,9,10,12;RU=C;STR;STRQ=93;TLOD=3.12,16.16,4.01  GT:AD:AF:DP:F1R2:F2R1:SB    0/0:8,1,14,0:0.040,0.572,0.030:23:0,0,2,0:7,1,10,0:0,8,0,15 0/1/2/3:12,2,8,3:0.097,0.312,0.122:25:0,2,0,0:12,0,8,3:0,12,0,13
chr1    18841984    .   GA  G,GAA,GAAA  .   germline;multiallelic;normal_artifact;panel_of_normals;slippage AS_FilterStatus=weak_evidence|base_qual|base_qual;AS_SB_TABLE=1,70|0,12|1,55|1,23;DP=220;ECNT=1;GERMQ=1;MBQ=13,28,13,13;MFRL=355,381,336,370;MMQ=60,60,60,60;MPOS=43,40,26;NALOD=-2.318e+00,-4.064e+01,-6.058e+00;NLOD=13.80,-5.352e+01,-1.757e+01;PON;POPAF=4.51,4.51,4.51;ROQ=93;RPA=15,14,16,17;RU=A;STR;STRQ=1;TLOD=7.64,34.95,12.63    GT:AD:AF:DP:F1R2:F2R1:SB    0/0:41,4,28,11:0.048,0.340,0.127:84:3,1,3,0:5,0,4,3:0,41,1,42   0/1/2/3:30,8,28,13:0.099,0.353,0.163:79:2,3,4,2:3,3,5,3:1,29,1,48

Any idea what might be causing this?

Thanks in advance!

brain-discourse commented 3 years ago

I also tried using the vcf file from the GATK Haplotype caller that has the following format but am encountering the same issue:

CHROM   POS    ID     REF    ALT    QUAL   FILTER INFO   FORMAT fcde191bl
chr1   1      .      N      <NON_REF>     .      .      END=12972     GT:DP:GQ:MIN_DP:PL       0/0:0:0:0:0,0,0
chr1   12973  .      A      <NON_REF>     .      .      END=13001     GT:DP:GQ:MIN_DP:PL       0/0:1:3:1:0,3,35
chr1   13002  .      G      <NON_REF>     .      .      END=13014     GT:DP:GQ:MIN_DP:PL       0/0:2:6:2:0,6,66
chr1   13015  .      G      <NON_REF>     .      .      END=13043     GT:DP:GQ:MIN_DP:PL       0/0:3:9:3:0,9,98
chr1   13044  .      A      <NON_REF>     .      .      END=13057     GT:DP:GQ:MIN_DP:PL       0/0:4:12:4:0,12,133
chr1   13058  .      C      <NON_REF>     .      .      END=13083     GT:DP:GQ:MIN_DP:PL       0/0:5:15:5:0,15,148
chr1   13084  .      G      <NON_REF>     .      .      END=13085     GT:DP:GQ:MIN_DP:PL       0/0:6:18:6:0,18,192
chr1   13086  .      C      <NON_REF>     .      .      END=13088     GT:DP:GQ:MIN_DP:PL       0/0:7:21:7:0,21,210
brentp commented 3 years ago

hi, somalier will work best when you extract from the bam files. extracting from a VCF with somatic calls is unlikely to work because few of those sites (none in your case) would be common variants. if you are unable to extract from bam then we can debug the gvcf issue.

brain-discourse commented 3 years ago

Thank you for your response. I tried extracting from the bam file but its generating an empty file with no error messages.

./somalier extract -d extracted/ --sites /proj/humandb/sites.hg38.vcf -f /proj/ref/GRCh38.d1.vd1.fa /proj/bams/3828.bwamem.sorted.marked.bam

this is there message I get when I run the code:

somalier version: 0.2.12

somalier version: 0.2.5 didn't work either.

Thank you in advance!

brentp commented 3 years ago

are you using these bams with other tools? most likely scenario I can think of is that something is wrong with them. can you re-index? are they hg38? did you change the sites file? it does not come unzipped, maybe try re-downloading from https://github.com/brentp/somalier/files/3412456/sites.hg38.vcf.gz

brain-discourse commented 3 years ago

Thank you so much! There was an issue with the sites file. This one worked perfectly with the bam.