HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Clair3 GVCF output ends 1 bp short of the end of the sequence, so ValidateVariants fails #317

Open klausyboi opened 4 weeks ago

klausyboi commented 4 weeks ago

Hi,

I'm trying to combine my GVCFs, first using GATK's ValidateVariants, but I get the error:

A GVCF must cover the entire region. Found 16 loci with no VariantContext covering it. The first uncovered segment is:Pf3D7_01_v3:640851

Compared with an Illumina sample that worked, the last covered position is always 1 bp short for each chromosome, and 640851 is the end of the chromosome in this case, so I think that's the problem.
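One way to confirm which contigs stop short is to compare the last covered position of each contig against the length declared in the GVCF header. The following is a minimal sketch, not part of Clair3 or GATK: the function name `uncovered_contig_ends` is hypothetical, the input lines are synthetic, and it assumes an uncompressed, tab-delimited GVCF whose header carries `##contig=<ID=...,length=...>` lines. It only checks the end of each contig, not gaps in the middle.

```python
import re


def uncovered_contig_ends(gvcf_lines):
    """Return {contig: (last_covered_pos, contig_length)} for every contig
    whose records stop before the length declared in the header."""
    lengths = {}   # contig -> length from the ##contig header lines
    last_end = {}  # contig -> highest position covered by any record
    for line in gvcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##contig="):
            # e.g. ##contig=<ID=Pf3D7_01_v3,length=640851>
            cid = re.search(r"ID=([^,>]+)", line)
            clen = re.search(r"length=(\d+)", line)
            if cid and clen:
                lengths[cid.group(1)] = int(clen.group(1))
            continue
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
        # Reference blocks carry END=...; other records span len(REF) bases.
        m = re.search(r"(?:^|;)END=(\d+)", info)
        end = int(m.group(1)) if m else pos + len(ref) - 1
        last_end[chrom] = max(last_end.get(chrom, 0), end)
    return {
        c: (last_end.get(c, 0), n)
        for c, n in lengths.items()
        if last_end.get(c, 0) < n
    }


# Synthetic example: one reference block that stops 1 bp before the contig end.
example = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=Pf3D7_01_v3,length=640851>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "Pf3D7_01_v3\t1\t.\tT\t<NON_REF>\t.\t.\tEND=640850\tGT\t0/0",
]
print(uncovered_contig_ends(example))
```

For a real file, the lines could be read with `open(path)` (or `gzip.open(path, "rt")` for a compressed GVCF) and passed to the same function; an empty result would mean every contig is covered through its declared end.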

I used Clair3 to get the GVCF using:

barcode=\$(basename "\$barcode_dir")
run_clair3.sh \
  --bam_fn=barcodes/\$barcode/calls"\$barcode"_sorted.bam \
  --ref_fn=${ref_seq} \
  --threads=${threads} \
  --platform="ont" \
  --model_path=/mnt/storageG1/data/software/rerio/clair3_models/r1041e82${bps}_${type}_v500 \
  --output=vcf/\$barcode/ \
  --include_all_ctgs \
  --gvcf

I then ran GATK's SortVcf because the variants were all in the wrong order and I was getting this error, hoping it would fix it, but alas it did not.

Any help with this is greatly appreciated.

Thanks

aquaskyline commented 3 weeks ago

Could you please send over the reads and the reference sequence of Pf3D7_01_v3? We will try to reproduce the problem on our side.

klausyboi commented 3 weeks ago

ref genome, my bam, and my fastq (which was created from the bam anyway) if you wanted to start from there; otherwise I'd be sending over lots of pod5s.

zhengzhenxian commented 3 days ago

@klausyboi Thanks for sharing the data; we will fix the gvcf range issue in the next release.