asalhab closed this issue 4 months ago
Hi @asalhab, thank you for reporting this. As you suggest, it is likely that the discrepancy in chromosome naming is the reason you didn't get any STR results, but I'll confirm this and let you know when we have issued a fix.
Thank you @vlshesketh
any update @vlshesketh on this issue?
Hi @asalhab apologies for the delay with this - a fix will be released within the next couple of weeks.
Hi @asalhab, I'm sorry that it has taken a while to respond to this.
We wanted to address the problem you reported, but because human genome versions vary so much, supporting all possible genomes and builds is challenging for us. In our testing with the Ensembl genome and the --str subworkflow, the results were not always consistent, because some supplementary alignments skewed the called STRs and generated false positives. As a result, we will instead add a recommendation on genome selection for human data to the wf-human-variation documentation, following the advice set out in this blog post: https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use. As you have already noticed, the repeats BED file is based on a genome with "chr" prefixes, and to preserve the integrity of the analysis, we feel it's safer not to modify this file.
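For anyone who wants to check for this naming mismatch before starting a run, here is a minimal Python sketch. It is not part of wf-human-variation. The function names (fasta_contigs, bed_contigs, naming_report) and the inline demo data are hypothetical, made up for illustration. It only compares the sequence names in a reference FASTA with the first column of a BED file and reports which BED contigs differ from the reference by a "chr" prefix alone.

```python
import io


def fasta_contigs(handle):
    """Collect sequence names (first word of each '>' header) from a FASTA stream."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def bed_contigs(handle):
    """Collect contig names (first tab-separated column) from a BED stream."""
    names = set()
    for line in handle:
        stripped = line.strip()
        # Skip blank lines, comments, and optional BED header lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        names.add(stripped.split("\t")[0])
    return names


def naming_report(ref_names, bed_names):
    """Compare the two name sets and flag mismatches explained by a 'chr' prefix."""
    missing = bed_names - ref_names
    prefix_only = {n for n in missing if n.startswith("chr") and n[3:] in ref_names}
    return {
        "shared": ref_names & bed_names,
        "missing": missing,
        "prefix_only": prefix_only,
    }


# Hypothetical demo data: Ensembl-style headers (no "chr") vs. a "chr"-prefixed BED.
demo_ref = io.StringIO(">1 dna:chromosome\nACGT\n>X dna:chromosome\nACGT\n")
demo_bed = io.StringIO("chr1\t100\t200\trepeat_a\nchrX\t300\t400\trepeat_b\n")

report = naming_report(fasta_contigs(demo_ref), bed_contigs(demo_bed))
if not report["shared"]:
    print("No contig names in common; BED regions will not match the reference.")
print("Differ only by 'chr' prefix:", sorted(report["prefix_only"]))
```

For real data, the in-memory demo streams would be replaced with open() calls on the reference FASTA and the BED file. For a large reference, reading the names from the first column of a .fai index (as produced by samtools faidx) avoids scanning the whole FASTA.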
Operating System
Other Linux (please specify below)
Other Linux
No response
Workflow Version
v1.8.3
Workflow Execution
Command line
EPI2ME Version
No response
CLI command run
nextflow run epi2me-labs/wf-human-variation -r v1.8.3 -w /gpfs/scratch/ONT/0201382701 -profile singularity -c wf-human-variation-config.cfg --snp --sv --cnv --mod --str --mapula --phase_vcf --phase_mod --GVCF --joint_phasing --bam 0201382701.merged.bam --ref Homo_sapiens.GRCh38.dna.toplevel.110.fa --basecaller_cfg dna_r10.4.1_e8.2_400bps_hac@v4.2.0 --sample_name 0201382701.merged --sex male --out_dir /data/0201382701/2D_PAS66250_9ff9ec6a.0201382701/hg38/wfhv.1.8.3 --threads 8 --ubam_map_threads 16 --merge_threads 8 --ubam_bam2fq_threads 8
Workflow Execution - CLI Execution Profile
singularity
What happened?
The run finished successfully. All expected results were generated except the short tandem repeat (STR) results. My guess is that this is because I used the Ensembl GRCh38 genome (which has no "chr" prefix), while the files variant_catalog_hg38.json and wf_str_repeats.bed have "chr" in their chromosome names. In a different run where I used a FASTA file that I downloaded from UCSC (where chromosomes have the "chr" prefix), the STR results were generated. Is there a way to provide these files as arguments, or to modify the pipeline to deal with the "chr" prefix?
Thanks, Abdulrahman
Relevant log output
Application activity log entry
No response