bioinformatics-centre / BayesTyper

A method for variant graph genotyping based on exact alignment of k-mers

Questions on data bundle and SV genotyping #46

Open Han-Cao opened 1 year ago

Han-Cao commented 1 year ago

Hi @jonassibbesen ,

Thanks for providing this great tool.

I am trying to download the data bundle from the link you provide. However, the download always fails after ~1 GB of data has been downloaded, no matter which tool I use. For example, wget keeps raising the error "Connection closed at byte 1073725440. Retrying."

Would you consider providing an alternative download link? Or could you confirm whether it is OK to generate the reference data for GRCh38 in the following way:

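  1. Reference genome: put chr1-22, chrX, chrY and the random contigs (chrrandom) into canon.fa; put the unplaced (chrUn) and decoy (chrdecoy) contigs into decoy.fa; skip the alt contigs (chralt) and the HLA sequences.
  2. Variant prior VCF: a sequence-resolved, sites-only VCF.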
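For concreteness, the two preparation steps above could be sketched in Python as follows. This is a hypothetical helper, not part of BayesTyper: the contig-naming rules (chr*_random, chrUn_*, *_decoy, *_alt, HLA-*) are assumptions based on common GRCh38 analysis-set naming, and "sequence-resolved" is taken to mean that REF and ALT are literal base strings rather than symbolic alleles such as <DEL>.

```python
import re

# Hypothetical naming rules, assuming GRCh38 analysis-set style contig names.
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y)")
BASES = re.compile(r"[ACGTNacgtn]+")


def classify_contig(name: str) -> str:
    """Return 'canon', 'decoy', 'skip' or 'unclassified' for a FASTA sequence name."""
    if name.startswith("HLA-") or name.endswith("_alt"):
        return "skip"  # alt contigs and HLA sequences are left out
    if name.startswith("chrUn") or name.endswith("_decoy"):
        return "decoy"  # unplaced and decoy contigs go to decoy.fa
    if PRIMARY.fullmatch(name) or name.endswith("_random"):
        return "canon"  # chr1-22, X, Y and *_random contigs go to canon.fa
    return "unclassified"  # e.g. chrM or chrEBV, which the plan above does not cover


def split_fasta(lines):
    """Group FASTA lines (headers and sequence) by the category of their record."""
    groups = {"canon": [], "decoy": [], "skip": [], "unclassified": []}
    category = None
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            category = classify_contig(fields[0] if fields else "")
        if category is None:
            raise ValueError("sequence data before the first FASTA header")
        groups[category].append(line)
    return groups


def to_sites_only(vcf_lines):
    """Drop FORMAT and sample columns, keeping the 8 fixed VCF columns."""
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            out.append(line)  # meta-information lines are kept unchanged
        else:
            out.append("\t".join(line.split("\t")[:8]))  # #CHROM header and records
    return out


def is_sequence_resolved(record: str) -> bool:
    """True if REF and every ALT allele are literal bases (no <DEL>, breakends or '*')."""
    fields = record.split("\t")
    alts = fields[4].split(",")
    return bool(BASES.fullmatch(fields[3])) and all(BASES.fullmatch(a) for a in alts)
```

Writing groups["canon"] and groups["decoy"] to canon.fa and decoy.fa, and keeping only records that pass is_sequence_resolved in the prior, would complete the sketch; whether this split matches the official bundle is exactly the question asked here.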

By the way, if I only want to genotype large SVs from a callset detected with long-read sequencing, can I skip the variant calling step and genotype those SVs in short-read sequencing samples using the SV callset plus the SNV/INDEL prior file?

Thanks, Han

jonassibbesen commented 1 year ago

Hi Han,

Thanks for writing. I just tried to download the GRCh38 data bundle and was able to do so without a problem. Could you try again now? In case you still have problems, I have also put the GRCh38 bundle on Google Drive: https://drive.google.com/file/d/1ioTjLFkfmvOMsXubJS5_rwpfajPv5G1Q/view?usp=sharing

Regarding SVs called from long reads combined with the prior: this is not something I have tried myself. From other studies, I have seen that BayesTyper generally performs better on SVs predicted from short reads than on SVs predicted from long reads. This is likely because breakpoints called from short reads are more accurate, and accurate breakpoints are important for BayesTyper.
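The breakpoint argument can be illustrated with a toy model of exact k-mer matching, the idea named in the tool description. This is a hypothetical sketch, not BayesTyper's actual model: it only checks whether the k-mers specific to a called deletion allele (the ones spanning its junction) occur in the true haplotype.

```python
# Toy model only: scores a called deletion by exact matches of its allele-specific
# k-mers. This is NOT BayesTyper's implementation; it illustrates why breakpoints matter.


def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def deletion_allele(ref: str, start: int, end: int) -> str:
    """Haplotype with ref[start:end] deleted (0-based, half-open)."""
    return ref[:start] + ref[end:]


def alt_specific_kmers(ref: str, start: int, end: int, k: int) -> set:
    """k-mers that occur in the deletion allele but nowhere in the reference."""
    return kmers(deletion_allele(ref, start, end), k) - kmers(ref, k)


def breakpoint_support(ref, true_start, true_end, called_start, called_end, k):
    """Fraction of the called allele's specific k-mers that occur in the true haplotype."""
    called = alt_specific_kmers(ref, called_start, called_end, k)
    truth = kmers(deletion_allele(ref, true_start, true_end), k)
    return len(called & truth) / len(called) if called else 0.0


if __name__ == "__main__":
    # Distinct letters stand in for non-repetitive sequence, so every k-mer is unique.
    ref = "ABCDEFGHIJKLMNOPQRST"
    print(breakpoint_support(ref, 8, 12, 8, 12, k=4))   # exact breakpoints: 1.0
    print(breakpoint_support(ref, 8, 12, 10, 14, k=4))  # shifted by 2 positions: 0.0
```

In this toy model, a 2 bp breakpoint error leaves none of the called allele's junction k-mers in the true haplotype, so exact matching finds no support for it. In repetitive sequence a shifted call can still produce identical k-mers, so the effect on real data depends on the local context.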

Best,

Jonas