WGLab / NanoCaller

Variant calling tool for long-read sequencing data
MIT License

multisample SNP calling? #21

Closed by colindaven 2 years ago

colindaven commented 2 years ago

Hi,

I'm looking for a nanopore variant caller that can handle multisample SNP calling. To my knowledge, Longshot and Clair3 cannot do this (yet). Does your new tool do this, or is this feature planned?

Thanks, Colin

umahsn commented 2 years ago

Hi Colin,

Thank you for this suggestion. Currently, NanoCaller only does single-sample variant calling. We have started working on extending NanoCaller to call variants across multiple samples, and hopefully we will have something within the next month or so.

colindaven commented 2 years ago

Ok, thanks for this, it's good to hear.

tuannguyen8390 commented 1 year ago

A bit late to the party, but I'm also interested in this. In our current approach, we call each sample individually with Clair3 (with GVCF output), then merge the results and force a recall with GATK. I wonder whether the same approach could be used with NanoCaller.

umahsn commented 1 year ago

Hi, yes, you can try a similar approach with NanoCaller: run NanoCaller on individual samples and merge the results with an external tool such as GATK. We have not tested this strategy ourselves, but in theory it should work. You can set the sample name in NanoCaller VCF files with the --sample parameter so that the VCF files can be merged.
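To illustrate why distinct sample names matter when merging, here is a minimal Python sketch (not part of NanoCaller or GATK) that merges several single-sample VCFs into one genotype table. It assumes uncompressed VCF text in which the sample name (as set with --sample) is the tenth column of the `#CHROM` header line; the helpers `read_single_sample_vcf` and `merge_single_sample_vcfs` are hypothetical. Note that a site missing from one sample's VCF can only be filled with `./.`, because a plain VCF does not say whether that sample was homozygous reference or simply not covered there.

```python
def read_single_sample_vcf(text):
    """Parse a single-sample VCF string; return (sample_name, {site: genotype})."""
    sample = None
    calls = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample = fields[9]  # sample name written by the caller (e.g. via --sample)
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        fmt = dict(zip(fields[8].split(":"), fields[9].split(":")))
        calls[(chrom, int(pos), ref, alt)] = fmt.get("GT", "./.")
    if sample is None:
        raise ValueError("missing #CHROM header line")
    return sample, calls


def merge_single_sample_vcfs(texts):
    """Merge single-sample VCFs into {site: {sample: genotype}}.

    Sites absent from a sample's VCF become './.' (unknown), because a
    plain VCF cannot distinguish homozygous reference from no coverage.
    """
    per_sample = [read_single_sample_vcf(t) for t in texts]
    names = [name for name, _ in per_sample]
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate sample names: {names}")  # merging needs unique names
    all_sites = sorted({site for _, calls in per_sample for site in calls})
    return {
        site: {name: calls.get(site, "./.") for name, calls in per_sample}
        for site in all_sites
    }
```

The duplicate-name check mirrors why --sample matters: if every per-sample VCF carried the same default name, a merge tool could not tell the columns apart.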

tuannguyen8390 commented 1 year ago

Hi there, thanks for the response,

If that is the case, I think it would be beneficial for NanoCaller to output GVCF instead of plain VCF, so that all sites can be recalled properly.

That would be a neat enhancement in the future.

Many thanks,

Tuan
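Why GVCF helps with recalling all sites can be sketched as follows. In the GATK GVCF convention, non-variant stretches are written as reference blocks whose ALT is `<NON_REF>` and whose extent is given by the `END` INFO key. This minimal Python sketch assumes uncompressed single-sample GVCF text in that convention; the helpers `parse_gvcf_blocks` and `genotype_at` are hypothetical and use a linear scan for clarity rather than speed. It shows how reference blocks distinguish a covered homozygous-reference site from a site with no data.

```python
def parse_gvcf_blocks(text):
    """Return (chrom, start, end, gt) intervals from a single-sample GVCF string.

    Reference blocks (ALT exactly '<NON_REF>') span POS..END from INFO;
    variant records span their REF allele.
    """
    blocks = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip header lines
        f = line.split("\t")
        chrom, pos, ref, alt, info = f[0], int(f[1]), f[3], f[4], f[7]
        gt = dict(zip(f[8].split(":"), f[9].split(":"))).get("GT", "./.")
        info_kv = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        if alt == "<NON_REF>" and "END" in info_kv:
            end = int(info_kv["END"])  # reference block: every base in POS..END was seen
        else:
            end = pos + len(ref) - 1  # variant record covers its REF allele
        blocks.append((chrom, pos, end, gt))
    return blocks


def genotype_at(blocks, chrom, pos):
    """Genotype at chrom:pos; './.' means the GVCF has no record covering it."""
    for b_chrom, start, end, gt in blocks:
        if b_chrom == chrom and start <= pos <= end:
            return gt
    return "./."
```

With a plain VCF, a position inside a reference block would simply be absent; the block is what lets a joint genotyper such as GATK's GenotypeGVCFs treat it as a covered homozygous-reference site instead of missing data.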

colindaven commented 1 year ago

Does GVCF ever work in practice, though? My efforts several years ago failed miserably with many undocumented bugs; others here have had the same experience completely independently. That's why I prefer multisample SNP calling with a tool such as freebayes for short reads.

tuannguyen8390 commented 1 year ago

Hi there,

Thanks for the opinion. I'm currently calling ONT reads with Clair3 with GVCF output, then merging and recalling with GATK. It worked fine, without any bugs, on the first run with around 40 individuals.

With my limited knowledge, I'm not sure whether GVCF improves the chance of detecting SNPs, but I'm aware that the GATK HaplotypeCaller workflow claims that having a call record at every site boosts confidence. In our pipeline we also use Sniffles2 to detect structural variants, and it supports joint calling too, so I believe there is merit in the method 😃. Just my 2 cents.

Tuan
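For reference, the merge-and-recall step described in this thread corresponds, in GATK4, to combining per-sample GVCFs with `CombineGVCFs` (or `GenomicsDBImport` for larger cohorts) and then joint-genotyping them with `GenotypeGVCFs`. The Python sketch below only assembles the command lines and does not run anything; the helper `joint_genotyping_commands` and all file names are hypothetical, and the options should be checked against the GATK documentation for the version in use.

```python
import shlex


def joint_genotyping_commands(reference, gvcfs,
                              combined="cohort.g.vcf.gz", output="cohort.vcf.gz"):
    """Build GATK4 CombineGVCFs and GenotypeGVCFs argument lists for a cohort."""
    if not gvcfs:
        raise ValueError("need at least one per-sample GVCF")
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for path in gvcfs:
        combine += ["-V", path]  # one -V per per-sample GVCF
    combine += ["-O", combined]
    # Joint genotyping over the combined GVCF produces the multisample VCF.
    genotype = ["gatk", "GenotypeGVCFs", "-R", reference, "-V", combined, "-O", output]
    return [combine, genotype]


for cmd in joint_genotyping_commands("ref.fa", ["s1.g.vcf.gz", "s2.g.vcf.gz"]):
    print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps them safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` (Python 3.8+) is used only for display.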