Closed Psy-Fer closed 1 week ago
The dorado-server config file structure has changed from 7.3.9 onwards.
It now looks more like this:
dna_r10.4.1_e8.2_400bps_modbases_5hmc_5mc_cg_hac.cfg
# Basic configuration file for ONT basecaller software.
# Basecalling.
dorado_model_path = dna_r10.4.1_e8.2_400bps_hac@v4.1.0
dorado_modbase_models = dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2
# Calibration strand detection
calib_reference = lambda_3.6kb.fasta
calib_min_sequence_length = 3000
calib_max_sequence_length = 3800
calib_min_coverage = 0.6
# Output.
min_qscore = 9.0
dna_r10.4.1_e8.2_400bps_5khz_hac.cfg
# Basic configuration file for ONT basecaller software.
# Compatibility
compatible_flowcells = FLO-MIN114,FLO-FLG114,FLO-PRO114,FLO-PRO114M
compatible_kits = SQK-LSK114,SQK-LSK114-XL,SQK-ULK114,SQK-RAD114,SQK-PCS114
compatible_kits_with_barcoding = SQK-NBD114-24,SQK-NBD114-96,SQK-RBK114-24,SQK-RBK114-96,SQK-RPB114-24,SQK-MLK114-96-XL,SQK-16S114-24,SQK-PCB114-24
# Basecalling.
dorado_model_path = dna_r10.4.1_e8.2_400bps_hac@v4.3.0
# Calibration strand detection
calib_reference = lambda_3.6kb.fasta
calib_min_sequence_length = 3000
calib_max_sequence_length = 3800
calib_min_coverage = 0.6
# Output.
min_qscore = 9.0
Looks like the way they expose the model to the API now includes the model version correctly, so there is no need to read the config file.
For example, in fastq output:
@74c57b9f-6ec7-4cc6-8384-c0df0d5e7f82 parent_read_id=74c57b9f-6ec7-4cc6-8384-c0df0d5e7f82 model_version_id=dna_r10.4.1_e8.2_400bps_fast@v4.3.0 mean_qscore=13
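A minimal sketch of pulling model_version_id back out of a fastq header line like the one above. The function name parse_fastq_header is hypothetical and not part of buttery-eel; it only assumes the header is a read ID followed by space-separated key=value fields.

```python
def parse_fastq_header(header):
    """Split a fastq header line into (read_id, tags).

    Assumes the layout shown in this issue: '@<read_id> key=value key=value ...'.
    Fields without '=' are ignored.
    """
    if header.startswith("@"):
        header = header[1:]
    fields = header.split()
    read_id = fields[0]
    tags = {}
    for field in fields[1:]:
        if "=" in field:
            key, value = field.split("=", 1)
            tags[key] = value
    return read_id, tags


header = (
    "@74c57b9f-6ec7-4cc6-8384-c0df0d5e7f82 "
    "parent_read_id=74c57b9f-6ec7-4cc6-8384-c0df0d5e7f82 "
    "model_version_id=dna_r10.4.1_e8.2_400bps_fast@v4.3.0 mean_qscore=13"
)
read_id, tags = parse_fastq_header(header)
model = tags.get("model_version_id")
```

Using tags.get() means a header without the field yields None rather than raising, which is useful for output from older basecaller versions.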
So now I need to do the same for sam output... in the RG tags?
I can get the modbase ones using this tag in the basecaller output
modbase_model_version_id
Modbase model seems to only be exposed at the read level.
I'm going to add this to the TODO pile because that will require getting the first read, and triggering the header writes before writing the first read, rather than when the writer is spawned.
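A rough sketch of what deferring the header write could look like. This is not the buttery-eel writer: the class name, the build_header callback, and the read dict keys (sam_record, modbase_model_version_id) are all assumptions made for illustration.

```python
import io


class DeferredHeaderWriter:
    """Write the header lazily, just before the first read is written.

    build_header is a hypothetical callback that receives the first read
    and returns the header text, so read-level fields such as
    modbase_model_version_id can be included in it.
    """

    def __init__(self, handle, build_header):
        self.handle = handle
        self.build_header = build_header
        self.header_written = False

    def write_read(self, read):
        # The header is emitted once, triggered by the first read.
        if not self.header_written:
            self.handle.write(self.build_header(read))
            self.header_written = True
        self.handle.write(read["sam_record"] + "\n")


def example_header(read):
    # Hypothetical header builder using a SAM comment line.
    return "@CO\tmodbase_model=" + read["modbase_model_version_id"] + "\n"


out = io.StringIO()
writer = DeferredHeaderWriter(out, example_header)
writer.write_read({"sam_record": "read1", "modbase_model_version_id": "modelA"})
writer.write_read({"sam_record": "read2", "modbase_model_version_id": "modelA"})
```

The trade-off is that an empty input produces no header at all, so a real implementation would probably need a fallback that writes the header on close.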
The new header looks like this, where the DS tag has basecall_model
and the model version dna_r10.4.1_e8.2_400bps_fast@v4.3.0:
@HD VN:1.5 SO:unknown
@PG ID:basecaller PN:ont basecaller VN:7.4.12
@PG ID:wrapper PN:buttery-eel VN:0.4.3 CL:buttery-eel --guppy_bin /home/jamfer/Downloads/ont-dorado-server-7.4.12/bin/ --config dna_r10.4.1_e8.2_400bps_5khz_fast.cfg -x cuda:0 -i small.blow5 -o test-7413.sam --port auto --use_tcp DS:ont basecaller wrapper basecall_model=dna_r10.4.1_e8.2_400bps_fast@v4.3.0
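For downstream matching (e.g. picking a clair3 model), the basecall_model value can be recovered from the header text with a small regular expression. This is a sketch only; the function name is hypothetical and it assumes the value contains no whitespace, as in the header above.

```python
import re


def basecall_model_from_header(header_text):
    """Return the basecall_model value from SAM header text, or None."""
    match = re.search(r"basecall_model=(\S+)", header_text)
    return match.group(1) if match else None


header_text = (
    "@PG ID:wrapper PN:buttery-eel VN:0.4.3 "
    "DS:ont basecaller wrapper "
    "basecall_model=dna_r10.4.1_e8.2_400bps_fast@v4.3.0"
)
model = basecall_model_from_header(header_text)
```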
When
--config dna_r10.4.1_e8.2_400bps_5khz_modbases_5mc_sup_prom.cfg
is given as the model, the output doesn't say what the model version is, which is needed for matching with clair3 variant calling models. To fix this, I should pull the data out of the cfg file used and put it into the fastq or the PG tags in the sam
in the cfg file it is under
dorado_model_path
Probably need to do the modbase one too just in case
Only mod config files have
remora_models
and dorado_modbase_models
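A minimal sketch of reading those keys out of a cfg file, assuming the simple "key = value" layout with "#" comment lines shown in this issue. The function name read_model_keys is hypothetical, and it takes the file contents as text so it does not depend on where the cfg files live.

```python
def read_model_keys(cfg_text):
    """Collect the model-related keys from basecaller .cfg text.

    Keys that are absent (e.g. the modbase keys in a non-mod config)
    are simply left out of the returned dict.
    """
    wanted = ("dorado_model_path", "dorado_modbase_models", "remora_models")
    found = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in wanted:
            found[key] = value.strip()
    return found


cfg_text = """\
# Basecalling.
dorado_model_path = dna_r10.4.1_e8.2_400bps_hac@v4.1.0
dorado_modbase_models = dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2
# Output.
min_qscore = 9.0
"""
models = read_model_keys(cfg_text)
```

The returned values could then be written into the fastq header or the PG line of the sam header.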