epi2me-labs / wf-human-variation

For non-model reference genome? #146

Closed CWYuan08 closed 1 month ago

CWYuan08 commented 4 months ago

Ask away!

Hi, I am running this analysis again for other non-model species:

nextflow run epi2me-labs/wf-human-variation --bam final_merged.bam --basecaller_cfg 'clair3:dna_r10.4.1_e8.2_260bps_sup@v4.0.0' --ref ref.dna.toplevel.fasta --sample_name barcode01 --snp --sv -profile singularity --phased

I am getting this issue again:

Caused by: Process getGenome (1) terminated with an error exit status (65)

Command executed:

samtools idxstats final_merged.bam > final_merged.bam_genome.txt
get_genome.py --chr_counts final_merged.bam_genome.txt -o output.txt
genome_build=`cat output.txt`

what should I change for the current version?

Thank you very much for your help!!

vlshesketh commented 4 months ago

Hi @CWYuan08, for non-human genomes, you will need to set --annotation false, otherwise the workflow will try to determine which human genome build is being used, which leads to the getGenome error you've seen. Switching off annotation will bypass this stage, allowing you to analyse non-human genomes with the workflow.
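For anyone wanting to see which contig names the build check will encounter, here is a minimal sketch that lists the names in `samtools idxstats` output that do not follow the usual human chromosome naming. The function name `non_human_like_contigs` and the naming pattern (1-22, X, Y, M or MT, with an optional `chr` prefix) are assumptions made for illustration; this is not the logic of the workflow's own `get_genome.py`.

```shell
#!/bin/sh
# Print contig names that do NOT look like human chromosome names.
# Input ($1): tab-separated `samtools idxstats` text with the columns
#             name, length, mapped reads, unmapped reads.
# The pattern below is an illustrative assumption, not the workflow's check.
non_human_like_contigs() {
    awk -F'\t' '
        # Skip the "*" row that idxstats uses for unplaced reads.
        $1 != "*" && $1 !~ /^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$/ { print $1 }
    ' "$1"
}

# Usage (assumes samtools is installed and the BAM is indexed):
#   samtools idxstats final_merged.bam > contigs.txt
#   non_human_like_contigs contigs.txt
```

If this prints most or all of your contigs, the reference does not use human-style naming, which is the situation in which the build detection would be expected to fail.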

CWYuan08 commented 4 months ago

Dear @vlshesketh,

Thank you very much for the prompt reply! I added --annotation false:

nextflow run epi2me-labs/wf-human-variation --bam final_merged.bam --ref ref.dna.toplevel.fasta --sample_name barcode01 --snp --sv -profile singularity --annotation false

but I'm still getting this error:

ERROR ~ Error executing process > 'getGenome (1)'

Caused by: Process getGenome (1) terminated with an error exit status (65)

Command executed:

samtools idxstats final_merged.bam > final_merged.bam_genome.txt
get_genome.py --chr_counts final_merged.bam_genome.txt -o output.txt
genome_build=`cat output.txt`

Command exit status: 65

Command output: (empty)

Command error: The genome build detected in the BAM is not compatible with this workflow.

I'm not sure why...

Thank you very much for your help!

Best, CW

vlshesketh commented 4 months ago

Hi @CWYuan08 thank you for the update - I'll see if I can replicate this and get back to you.

vlshesketh commented 4 months ago

Hi @CWYuan08, can you please confirm which version of the workflow you are running? I have just run the most recent version using the SNP demo data with annotation set to false, and this bypassed the getGenome process as expected, so I'm unable to replicate the issue as you've described it. If you can double-check the workflow version, that would be really helpful.

CWYuan08 commented 4 months ago

Dear @vlshesketh, I checked with nextflow info epi2me-labs/wf-human-variation and got:

project name: epi2me-labs/wf-human-variation
repository  : https://github.com/epi2me-labs/wf-human-variation
local path  : /mnt/user/.nextflow/assets/epi2me-labs/wf-human-variation
main script : main.nf
description : Basecalling, SNV calling, SV calling, methylation calling of human samples.
author      : Oxford Nanopore Technologies
revisions   :

Should I switch to v1.11.0?

Thank you very much for your help!

CWYuan08 commented 4 months ago

Dear @vlshesketh,

Thank you for your help; I have managed to run the pipeline. As this is not a model organism, it has a different number of chromosomes from human. I found that the variant VCF file stops at chromosome 22. Is this because the default is human?

Best, CW

vlshesketh commented 3 months ago

Hi @CWYuan08, I'm glad you're now able to run the pipeline! For non-human genomes, you will need to use the parameter --include_all_ctgs, to ensure all contigs in your reference genome are used for variant calling. Please let us know if you are able to run the workflow successfully with this parameter.
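One way to confirm that every reference contig made it into the calls is to compare the contig names in the reference FASTA with those in the VCF records. The sketch below assumes an uncompressed VCF (decompress a gzipped one first, for example with `gzip -dc`); the function name `missing_contigs` is hypothetical. A contig can also be listed simply because no variants were called on it, so treat the output as a prompt for a closer look rather than proof of a problem.

```shell
#!/bin/sh
# Print reference contigs that have no records in a VCF.
# $1 = reference FASTA, $2 = uncompressed VCF (with its usual "#" header lines).
missing_contigs() {
    awk '
        # First file (the VCF): remember the CHROM value of every record line.
        FNR == NR { if ($0 !~ /^#/) seen[$1] = 1; next }
        # Second file (the FASTA): the contig name is the header text after ">"
        # up to the first whitespace.
        /^>/ { name = substr($1, 2); if (!(name in seen)) print name }
    ' "$2" "$1"
}

# Usage (file names are placeholders):
#   missing_contigs ref.dna.toplevel.fasta calls.vcf
```

If the list contains every contig beyond chromosome 22, that is consistent with the run having been restricted to the default human chromosomes.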

Aravind-mss commented 3 months ago

Hi @vlshesketh , I am unable to progress any further with a similar issue. But in my case it's with the Human genome. I tried a couple of genome builds (hg38 & T2Tv2) and it aborts at the same step of getGenome(). I tried with both mapped and unmapped BAM as input, but no luck. Any pointers to get around this issue are much appreciated. Thanks.

Update: Even with --annotation false, I have the same issue. Please see my command below:

nextflow run epi2me-labs/wf-human-variation --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_hac@v4.2.0' --mod --ref $mydir/reference/GRCh38.p14_genomic.fasta --sample_name 'PGXXXF7' --threads 36 --snp --str --phased --annotation false --include_all_ctgs --bam $mydir/PGXXXF7-hg38mapped.sort.bam --out_dir $mydir/PGXXXF7_output_hg38mappedBAM -profile singularity

vlshesketh commented 2 months ago

Hi @Aravind-mss, apologies for the delay in responding. Please can you try with the recommended human genome build - as mentioned in the workflow documentation, we would advise the use of this reference as per this blog post. You shouldn't need to re-align your BAM as the workflow will perform this for you.

vlshesketh commented 1 month ago

Hi @Aravind-mss I'm closing this issue now as there have been no further replies, but please open a new issue if you require further support.