HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Same input BAM, FASTA and models, but different output sizes of merge_output.vcf.gz #303

Closed: Jerry-is-a-mouse closed this issue 5 months ago

Jerry-is-a-mouse commented 7 months ago

Hi, when I used Clair3 (v1.0.5) to call variants on HG002 PacBio HiFi 15-20 kb chemistry 2 reads, I ran the run_clair3.sh command twice from the command line with the same inputs, but the resulting merge_output.vcf.gz files had different sizes. Is this expected? In other words, is this result caused by the principle or algorithm that Clair3 uses?

aquaskyline commented 7 months ago

Could you please look into the two VCFs and see what the differences are?
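One way to see what the differences are is to compare the variant sites themselves rather than file sizes. The sketch below is a minimal, hypothetical helper (not part of Clair3): it collects the (CHROM, POS, REF, ALT) key of every record in two VCFs and reports the keys found in only one of them. It ignores genotypes, QUAL and FILTER, and treats duplicate records as one, so it is a first look rather than a full comparison.

```python
import gzip
import sys


def open_vcf(path):
    """Open a VCF as text, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def variant_keys(lines):
    """Collect (CHROM, POS, REF, ALT) for every non-header record."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the column header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def diff_vcfs(lines_a, lines_b):
    """Return (keys only in A, keys only in B)."""
    a, b = variant_keys(lines_a), variant_keys(lines_b)
    return a - b, b - a


if __name__ == "__main__":
    # Hypothetical usage: python diff_vcfs.py run1.vcf.gz run2.vcf.gz
    with open_vcf(sys.argv[1]) as fa, open_vcf(sys.argv[2]) as fb:
        only_a, only_b = diff_vcfs(fa, fb)
    print(f"only in {sys.argv[1]}: {len(only_a)}")
    print(f"only in {sys.argv[2]}: {len(only_b)}")
    # Show a few examples from each side to see where the differences lie
    for chrom, pos, ref, alt in sorted(only_a)[:10]:
        print("A", chrom, pos, ref, alt)
    for chrom, pos, ref, alt in sorted(only_b)[:10]:
        print("B", chrom, pos, ref, alt)
```

If the differing sites cluster on particular chromosomes or regions, that usually points at a difference in inputs or settings rather than random variation.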

Jerry-is-a-mouse commented 7 months ago

@aquaskyline I counted how many variants were called using the wc command, as follows:
(1) The vcf.gz I got yesterday: less merge_output.vcf.gz | grep -v "^#" | wc -l returned 4443956
(2) The vcf.gz I got about 2 months ago: less HG002_Nanopore.vcf.gz | grep -v "^#" | wc -l returned 4527382
I am sorry that the files are too big to upload.
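The grep -v "^#" | wc -l pipeline counts every record line, whatever its FILTER value; here the two totals differ by 83,426 records. A per-FILTER breakdown can show whether that gap lies in passing calls or in filtered ones. The sketch below is a hypothetical helper, assuming a gzip- or bgzip-compressed VCF (bgzip output is gzip-compatible, so Python's gzip module can read it).

```python
import gzip
import sys
from collections import Counter


def filter_counts(lines):
    """Count non-header VCF records by their FILTER column (column 7)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        counts[line.rstrip("\n").split("\t")[6]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_vcf.py merge_output.vcf.gz
    with gzip.open(sys.argv[1], "rt") as fh:
        counts = filter_counts(fh)
    print("total records:", sum(counts.values()))
    for value, n in counts.most_common():
        print(value, n)
```

Running it on both files and comparing the per-FILTER counts side by side narrows down which class of calls changed between the runs.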

aquaskyline commented 6 months ago

One of your files is named Nanopore, but you said you were using the same PacBio HiFi input for both runs?

Jerry-is-a-mouse commented 6 months ago

Sorry, what I used was Nanopore sequencing data. I had re-run both types of data, and found that the PacBio HiFi results were the same but the Nanopore results were different.

aquaskyline commented 6 months ago

Outputs of Clair3 are deterministic. You might want to try again using the same version, model, and parameters.
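Because the outputs are deterministic, two runs with the same version, model and parameters should produce identical records. A direct way to check is to hash the decompressed record lines of both files instead of comparing compressed file sizes. The sketch below is a hypothetical helper; it skips header lines on the assumption that they may record run-specific details such as paths or command lines, and it is sensitive to record order.

```python
import gzip
import hashlib
import sys


def record_digest(lines):
    """SHA-256 over the non-header record lines, in file order."""
    h = hashlib.sha256()
    for line in lines:
        if line.startswith("#"):
            continue  # header excluded (assumption: may hold run-specific details)
        h.update(line.encode("utf-8"))
    return h.hexdigest()


def same_calls(lines_a, lines_b):
    """True if both inputs contain exactly the same records in the same order."""
    return record_digest(lines_a) == record_digest(lines_b)


if __name__ == "__main__":
    # Hypothetical usage: python same_calls.py run1.vcf.gz run2.vcf.gz
    with gzip.open(sys.argv[1], "rt") as fa, gzip.open(sys.argv[2], "rt") as fb:
        print("identical records" if same_calls(fa, fb) else "records differ")
```

If the digests differ for runs that were meant to be identical, the next step is to compare the version, model path and parameters recorded for each run.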