epi2me-labs / wf-human-variation

Calling SV/SNP when using a bam file generated with a remora_cfg #152

Closed fidibidi closed 3 months ago

fidibidi commented 3 months ago

Hello!

I've been generating my BAM files using the following commands. (I understand this is now deprecated, apparently as of today, and that I should use wf-basecalling instead.)

First I create the BAMs, which is why I include --cnv (because that saves the files as .bam):

        BASECALL_MODEL="dna_r10.4.1_e8.2_400bps_hac@v4.2.0"
        MOD_MODEL="dorado-models/dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG@v2/"

    ./nextflow run epi2me-labs/wf-human-variation \
        -w ${OUTPUT}/workspace \
        -profile standard \
        --sample_name ${SAMPLE} \
        --mod --cnv \
        --dorado_ext pod5 \
        --fast5_dir ${POD5_DIR}/ \
        --basecaller_cfg ${BASECALL_MODEL}  \
        --remora_cfg 'custom' \
        --remora_model_path ${MOD_MODEL} \
        --ref references/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
        --bam_min_coverage 0 \
        --threads 16 \
        --out_dir ${OUTPUT} \
        -resume

Then I run a similar command:

        BASECALL_MODEL="dna_r10.4.1_e8.2_400bps_hac@v4.2.0"

    ./nextflow run epi2me-labs/wf-human-variation \
        -w ${OUTPUT}/workspace \
        -profile standard \
        --sample_name ${SAMPLE} \
        --snp --sv \
        --basecaller_cfg ${BASECALL_MODEL}  \
        --ref references/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
        --bam_min_coverage 0 \
        --threads 16 \
        --out_dir ${OUTPUT} \
        -resume

I do this in two parts to save cost: a GPU cloud instance for basecalling and a CPU instance for everything else.

My questions are:

  1. Is it correct to pass both a basecaller_cfg and a remora_model_path to the basecaller?
  2. When calling --snp and --sv, given that I used a remora model, how can I confirm that I'm using the best possible model for Clair3?
  3. In the initial nextflow call, I don't actually need --mod to generate the modified base calls in the BAM; that's just to run modkit on the BAM file later, right?

Thank you!

Fidi

RenzoTale88 commented 3 months ago

Hi @fidibidi:

  1. Basecalling is deprecated in wf-human-variation, and it doesn't have all the models and improvements. But yes: if you want to perform modification calling, you need to provide the appropriate remora model; see here for more details.
  2. The correct Clair3 model is determined automatically from the --basecaller_cfg flag.
  3. Correct. The --mod flag simply runs modkit and performs modified-base accumulation.

I'd strongly encourage you to use wf-basecalling for basecalling your data.
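
On question 2, one way to double-check that the value passed to --basecaller_cfg matches the model the BAM was actually basecalled with is to read it back from the BAM header. The sketch below assumes the BAM was written by dorado and that its @RG header lines carry a basecall_model=... entry in the DS field, which is worth verifying on your own file. The function name extract_basecall_model and the file name sample.bam are placeholders, not part of the workflow.

```shell
# Hypothetical helper: print the distinct basecall_model values found in
# the @RG lines of SAM header text read from stdin.
# Assumption: the DS field looks like "DS:basecall_model=<model> runid=<id>"
# (the key=value pairs may appear in either order).
extract_basecall_model() {
    grep '^@RG' \
        | tr '\t ' '\n\n' \
        | sed -n 's/^\(DS:\)\{0,1\}basecall_model=//p' \
        | sort -u
}

# Usage (requires samtools; sample.bam is a placeholder path):
#   samtools view -H sample.bam | extract_basecall_model
```

If this prints a single model name, that is the value to compare against the one given to --basecaller_cfg in the --snp/--sv run. If it prints nothing, the header does not follow the assumed layout and should be inspected by hand.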