HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Pileup Models for every dorado basecalling model #343

Open Rohit-Satyam opened 4 weeks ago

Rohit-Satyam commented 4 weeks ago

Dear Developers,

In the README, you mention that ONT-provided models can be used with Clair3:

> ONT-provided Models
> ONT provides models for some of the latest or specific chemistries and basecallers (including both Guppy and Dorado) through [Rerio](https://github.com/nanoporetech/rerio). These models are tested and supported by the ONT developers.

I downloaded the Dorado model dna_r10.4.1_e8.2_400bps_fast@v4.3.0 and passed it to Clair3 via --model_path, but I get the following error:

No pileup model found in provided model path and model prefix /home/satyamr/dorado-0.8.2-linux-x64/models/dna_r10.4.1_e8.2_400bps_fast@v4.3.0/pileup

Should I use r1041_e82_400bps_sup_v420.tar.gz instead, even though the basecaller version is slightly different and the basecalling was run in fast mode? Is there a way to convert the Dorado models into Clair3 pileup models?

## Viral ONT samples
run_clair3.sh --bam_fn=p41530_barcode59_denv2.bam --ref_fn=reference.fasta \
--threads=20 --model_path=/home/satyamr/dorado-0.8.2-linux-x64/models/dna_r10.4.1_e8.2_400bps_fast@v4.3.0 \
--output=p41530_barcode59_clair3 --platform="ont" --haploid_precise --include_all_ctgs --no_phasing_for_fa --enable_long_indel
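
For readers looking for a concrete fix, here is a minimal sketch of pointing --model_path at a Clair3-format model instead of the Dorado basecalling model, using the r1041_e82_400bps_sup_v420 tarball mentioned above. Where to obtain the tarball (Rerio or the model links in the Clair3 README) and the name of the extracted directory are assumptions to verify.

```bash
# Hedged sketch: swap the Dorado basecalling model for a Clair3 model tarball.
# The tarball name comes from the question above; obtain it from Rerio
# (https://github.com/nanoporetech/rerio) or the model links in the Clair3 README.
MODEL_TAR=r1041_e82_400bps_sup_v420.tar.gz
MODEL_DIR=$PWD/clair3_models/r1041_e82_400bps_sup_v420   # extracted directory name is an assumption

mkdir -p clair3_models
tar -xzf "$MODEL_TAR" -C clair3_models/

# Same invocation as above, only --model_path changes.
run_clair3.sh --bam_fn=p41530_barcode59_denv2.bam --ref_fn=reference.fasta \
  --threads=20 --model_path="$MODEL_DIR" \
  --output=p41530_barcode59_clair3 --platform="ont" --haploid_precise \
  --include_all_ctgs --no_phasing_for_fa --enable_long_indel
```
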
Rohit-Satyam commented 4 weeks ago

Also, I think you should add SISPA to the README as well, since it also gives extremely high coverage, and therefore people should use --var_pct_full=1 and --ref_pct_full=1, right?
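
As a hedged illustration of that suggestion, the viral command above would gain the two proportion flags. The exact flag spelling (--var_pct_full=1 / --ref_pct_full=1) is assumed from Clair3's option list, and the model path is a placeholder.

```bash
# Hedged sketch: same invocation as the viral ONT command above, with the two
# proportion flags set to 1 so that (as I understand the options) the full set
# of pileup candidates is passed to full-alignment calling; intended for very
# high-coverage amplicon/SISPA data.
run_clair3.sh --bam_fn=p41530_barcode59_denv2.bam --ref_fn=reference.fasta \
  --threads=20 --model_path=/path/to/clair3_model \
  --output=p41530_barcode59_clair3 --platform="ont" --haploid_precise \
  --include_all_ctgs --no_phasing_for_fa --enable_long_indel \
  --var_pct_full=1 --ref_pct_full=1
```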

aquaskyline commented 4 weeks ago

The command looks correct. Could you please run `ls /home/satyamr/dorado-0.8.2-linux-x64/models/dna_r10.4.1_e8.2_400bps_fast@v4.3.0` and share what it shows?

Rohit-Satyam commented 1 week ago

@aquaskyline

Apologies, I was away on medical leave. Here are the contents of the directory, as requested:

ls /home/satyamr/dorado-0.8.2-linux-x64/models/dna_r10.4.1_e8.2_400bps_fast@v4.3.0
0.conv.bias.tensor    1.conv.weight.tensor  4.rnn.bias_hh_l0.tensor    4.rnn.weight_ih_l0.tensor  5.rnn.weight_hh_l0.tensor  6.rnn.bias_ih_l0.tensor    7.rnn.bias_hh_l0.tensor    7.rnn.weight_ih_l0.tensor  8.rnn.weight_hh_l0.tensor  config.toml
0.conv.weight.tensor  2.conv.bias.tensor    4.rnn.bias_ih_l0.tensor    5.rnn.bias_hh_l0.tensor    5.rnn.weight_ih_l0.tensor  6.rnn.weight_hh_l0.tensor  7.rnn.bias_ih_l0.tensor    8.rnn.bias_hh_l0.tensor    8.rnn.weight_ih_l0.tensor
1.conv.bias.tensor    2.conv.weight.tensor  4.rnn.weight_hh_l0.tensor  5.rnn.bias_ih_l0.tensor    6.rnn.bias_hh_l0.tensor    6.rnn.weight_ih_l0.tensor  7.rnn.weight_hh_l0.tensor  8.rnn.bias_ih_l0.tensor    9.linear.weight.tensor
aquaskyline commented 1 week ago

The folder contains a Dorado basecalling model, not a Clair3 model.
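
For anyone hitting the same error: the listing above (config.toml plus *.tensor weight files) is the layout of a Dorado basecalling model, whereas the error message indicates Clair3 looks for checkpoint files under a pileup prefix inside --model_path. A minimal shell sketch to tell the two apart; beyond the pileup prefix taken from the error, the exact Clair3 file names are an assumption.

```bash
# Hedged sketch: classify a model directory as Dorado-style or Clair3-style.
# "pileup*" is inferred from the error message earlier in this thread;
# config.toml + *.tensor matches the ls output above.
check_model_dir() {
  local dir=$1
  if ls "$dir"/pileup* >/dev/null 2>&1; then
    echo "$dir looks like a Clair3 model (pileup checkpoint files found)"
  elif [ -f "$dir/config.toml" ] && ls "$dir"/*.tensor >/dev/null 2>&1; then
    echo "$dir looks like a Dorado basecalling model (config.toml + *.tensor files)"
  else
    echo "$dir does not look like either model type"
  fi
}

check_model_dir "/home/satyamr/dorado-0.8.2-linux-x64/models/dna_r10.4.1_e8.2_400bps_fast@v4.3.0"
```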

Rohit-Satyam commented 1 week ago

Yeah, that was the question. Are Clair3 models available for the "hac" and "fast" basecalling modes too, or should we use the "sup" model on reads that were basecalled in "fast" mode?