brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

relatedness = -nan and no HETs found #5

Closed lconde-ucl closed 5 years ago

lconde-ucl commented 5 years ago

Hi, I have some WGS, RNAseq and targeted RNAseq normal (no tumour) samples and I wanted to run somalier on them to find out if any of them are from the same individual (i.e., if for example I have WGS and RNAseq data from the same sample).

I had no problems running somalier, but the results are very confusing because somalier does not find any HETs in the data and the relatedness metric that I get is '-nan' for all the pairs of samples. These are the first few lines of the resulting somalier.pairs.tsv file:

#sample_a	sample_b	relatedness	hom_concordance	hets_a	hets_b	shared_hets	hom_alts_a	hom_alts_b	shared_homalts	ibs0	ibs2	n
ERR2322353	ERR2322354	-nan	0.827	0	0	0	4208	3456	2858	0	2858	2858
ERR2322353	ERR2322355	-nan	0.795	0	0	0	4208	3811	3030	0	3030	3030
ERR2322353	ERR2322356	-nan	0.847	0	0	0	4208	4875	3563	0	3563	3563
ERR2322353	ERR2322357	-nan	0.811	0	0	0	4208	3555	2884	0	2884	2884
ERR2322353	ERR2322358	-nan	0.805	0	0	0	4208	3710	2987	0	2987	2987
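For context on the `-nan`: assuming somalier uses the peddy-style estimator relatedness = (shared_hets - 2 * ibs0) / min(hets_a, hets_b), any pair in which either sample has zero HETs divides by zero, so every row above is undefined. A minimal Python sketch of that arithmetic (the function name is illustrative, not somalier's API):

```python
import math


def relatedness(hets_a: int, hets_b: int, shared_hets: int, ibs0: int) -> float:
    """Assumed peddy/somalier-style estimate:
    (shared_hets - 2 * ibs0) / min(hets_a, hets_b)
    """
    denom = min(hets_a, hets_b)
    if denom == 0:
        # No HETs in at least one sample: the ratio is undefined. Python would
        # raise ZeroDivisionError here, so return NaN explicitly, which is what
        # floating-point 0/0 produces in languages that don't raise.
        return math.nan
    return (shared_hets - 2 * ibs0) / denom


# Values from the first row above: every HET count is zero.
print(relatedness(hets_a=0, hets_b=0, shared_hets=0, ibs0=0))  # nan
```

C-style number formatting typically prints a NaN with its sign bit set as "-nan", which would explain the exact string in the TSV.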

I had tried to match these samples previously just by calling variants on all of them and checking the genotypes. While doing this, I was able to call both HOMs and REFs in all the samples (I'm using the HaplotypeCaller from GATK). So I was wondering if you could please let me know what the problem might be? I'm using the sites.vcf file provided by somalier.
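The direct approach described here (call variants in each sample, then compare genotypes at the same sites) can be sketched as a simple concordance rate. This is a minimal sketch, assuming the GT strings for a shared list of biallelic sites have already been pulled out of each VCF; `alt_count` and `genotype_concordance` are hypothetical helpers, not part of somalier or GATK.

```python
from typing import List, Optional


def alt_count(gt: str) -> Optional[int]:
    """'0/0' -> 0, '0/1' or '0|1' -> 1, '1/1' -> 2, anything with '.' -> None.

    Assumes biallelic sites, so any non-zero allele index counts as ALT.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")


def genotype_concordance(gts_a: List[str], gts_b: List[str]) -> float:
    """Fraction of sites called in both samples whose genotypes agree."""
    n = same = 0
    for ga, gb in zip(gts_a, gts_b):
        ca, cb = alt_count(ga), alt_count(gb)
        if ca is None or cb is None:
            continue  # skip sites with a missing call in either sample
        n += 1
        if ca == cb:
            same += 1
    return same / n if n else float("nan")
```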

Many thanks

brentp commented 5 years ago

could you share chromosome 1 from 2 of the BAMs?

lconde-ucl commented 5 years ago

Sure, thanks, should I email them to you?

brentp commented 5 years ago

emailing a link to dropbox or somewhere would be great.

lconde-ucl commented 5 years ago

Brilliant. I've uploaded chr1 of two BAMs, as well as the sites.vcf file that I used (it's the same one you provide but I added 'chr' to the chromosome names to match my BAMs and reference fasta)

https://www.dropbox.com/sh/1u7ev9190ib9bxu/AAB8AhZcRrMzoJOzVV9zbS7Ta?dl=0

brentp commented 5 years ago

looks like you lower-cased the REF and ALT columns too. Can you try with those upper-cased and let me know if the problem persists? Meanwhile, I am looking into why only the het stuff is empty for your example. Thanks very much for providing the data.
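If the sites file has to be adapted to a 'chr'-prefixed reference, this can be done without touching the allele case. Below is a minimal, hypothetical script, assuming a plain or gzipped VCF with tab-separated columns; it also upper-cases REF and ALT in case they were already lower-cased. It maps `1` to `chr1` but would also map `MT` to `chrMT`, so non-standard contig names may need separate handling.

```python
import gzip


def fix_sites_line(line: str) -> str:
    """Rewrite one sites-VCF line for a 'chr'-prefixed reference."""
    # Contig header lines: ##contig=<ID=1,...> -> ##contig=<ID=chr1,...>
    if line.startswith("##contig=<ID=") and not line.startswith("##contig=<ID=chr"):
        return line.replace("##contig=<ID=", "##contig=<ID=chr", 1)
    # Every other header line (## meta lines and the #CHROM line) is kept as-is.
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if not fields[0].startswith("chr"):
        fields[0] = "chr" + fields[0]  # CHROM
    fields[3] = fields[3].upper()  # REF must stay upper-case
    fields[4] = fields[4].upper()  # ALT must stay upper-case
    return "\t".join(fields) + "\n"


def rewrite_sites(in_path: str, out_path: str) -> None:
    """Apply fix_sites_line to every line; reads .vcf or .vcf.gz, writes plain text."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_sites_line(line))
```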

lconde-ucl commented 5 years ago

Hi, yes I had to lowercase the alleles too because my reference fasta is soft-masked, and somalier would throw a "mismatch" error on these bases even though they are the same base (i.e., somalier would crash with error: "reference base from sites file:G does not match that from reference: g"). So I lower-cased all the sites as well as the fasta. I will try to upper-case everything and see if that makes any difference. Many thanks!

brentp commented 5 years ago

ok. I'll update the check to upper-case the allele pulled from fasta. Meanwhile, you might have to wait for next release or upper-case your entire fasta file.
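The updated check described here amounts to upper-casing only the base read from the FASTA before comparing it with the (already upper-case) REF from the sites file; the workaround is to upper-case every sequence line of the FASTA. Both are sketched below as hypothetical Python helpers under those assumptions, not somalier's actual code.

```python
def ref_base_matches(site_ref: str, fasta_bases: str) -> bool:
    """Check the sites-file REF against the reference sequence at that position.

    Only the FASTA side is upper-cased, so soft-masked ('g') bases match an
    upper-case REF ('G'), but the sites file itself must still be upper-case.
    """
    return site_ref == fasta_bases.upper()


def uppercase_fasta(lines):
    """Yield FASTA lines with sequence lines upper-cased and '>' headers kept as-is."""
    for line in lines:
        yield line if line.startswith(">") else line.upper()


def uppercase_fasta_file(in_path: str, out_path: str) -> None:
    """Write an upper-cased copy of a plain-text FASTA file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(uppercase_fasta(src))
```

Upper-casing the FASTA discards the soft-masking, so it's worth keeping the original if other tools rely on the repeat annotation.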

brentp commented 5 years ago

Here is a new binary you can try. It will still require the VCF to be upper-cased, but it will not error if the fasta is lower-cased. somalier.gz

Please let me know if this works for you.

lconde-ucl commented 5 years ago

Thanks I will try it

lconde-ucl commented 5 years ago

Thanks, the new version is running without problems using the original sites.VCF (with 'chr's) and the original soft-masked FASTA. I will let you know if, when it finishes, I still find the issue with the lack of HETs and -nan relatedness.

lconde-ucl commented 5 years ago

Just to update you, the new version of somalier worked well; at least the result files now include HETs and relatedness metrics. The samples that should appear as related are not shown as related though, but that's also what I got with the other methods I tried, so it might be that my samples were indeed all swapped :/

[screenshot 2019-01-30 at 16.39.29]

brentp commented 5 years ago

thanks for updating. can you show this plot for IBS0 vs IBS2? And/or share the html via email? I should probably set the minimum relatedness value to -1 so there is better resolution as anything < 0 is unrelated.
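For reading an IBS0 vs IBS2 plot: IBS0 counts sites where the two samples share no allele (one hom-ref, the other hom-alt), and IBS2 counts sites where their genotypes are identical, so two samples from the same individual should show IBS0 near zero and IBS2 close to n. A minimal sketch of those counts, assuming genotypes coded as ALT-allele counts; this is the generic definition, and somalier's own counting (for example which sites go into n) may differ in detail.

```python
from typing import Optional, Sequence, Tuple


def ibs_counts(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> Tuple[int, int, int]:
    """Return (ibs0, ibs2, n) for two samples.

    Genotypes are ALT-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing.
    """
    ibs0 = ibs2 = n = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # only sites called in both samples count towards n
        n += 1
        if abs(ga - gb) == 2:
            ibs0 += 1  # hom-ref vs hom-alt: no shared allele
        elif ga == gb:
            ibs2 += 1  # identical genotypes
    return ibs0, ibs2, n
```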

lconde-ucl commented 5 years ago

Sure, I'll send the html by email, attaching here the plot

[screenshot 2019-01-30 at 16.50.04: IBS0 vs IBS2 plot]

brentp commented 5 years ago

I am closing this as resolved. please re-open if you have further problems.