nextflow run wf-human-variation --bam /in.bam --sample_name name --out_dir /out --ref /data/ref/GRCh38_no_alt_analysis_set_dup_masked.fa --snp --sv --cnv --str --mod --annotation true --phased --tr_bed /human_GRCh38_no_alt_analysis_set.trf.bed --GVCF -profile standard -w /work/ -c increase_memory.config --sex female
Workflow Execution - CLI Execution Profile
standard (default)
What happened?
wf-human-variation uses a different genome build for alignment than the one specified with the --ref option.
I'm interested in methylation and CNV analysis of the H19 locus. However, the alignment files produced by the wf-human-variation workflow do not allow analysis of this region. Why? Because the alignment step uses an hg38/GRCh38 genome build in which falsely duplicated regions are not masked (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02863-7).
The H19 locus is falsely duplicated on the chr11_KI270721v1_random contig, which, if not masked, leads to many reads in this region with mapping quality 0.
The wf-human-variation result shows exactly this behavior.
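The mapping-quality-0 symptom can be quantified rather than judged by eye. The following is a minimal sketch, assuming `samtools` is installed and the BAM is indexed; the helper name `count_mapq0`, the file name `aligned.bam`, and the `REGION` value are placeholders that are not part of the workflow.

```shell
# Reads SAM records on stdin and prints "<mapq0_reads> <total_reads>".
# MAPQ is the 5th tab-separated SAM column; header lines (starting with '@') are skipped.
count_mapq0() {
    awk -F '\t' '!/^@/ { total++; if ($5 == 0) zero++ }
                 END   { printf "%d %d\n", zero, total }'
}

# Hypothetical usage (set REGION to the coordinates of the locus of interest):
#   REGION="chr11:START-END"
#   samtools view aligned.bam "$REGION" | count_mapq0
```

A high share of MAPQ 0 reads in the region for one BAM, but not for a BAM aligned against the masked reference, points to the false duplication described above.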
However, when mapping the same reads with minimap2 (minimap2 -ax lr:hq --MD /GRCh38_no_alt_analysis_set_dup_masked.fa in.fastq.gz) against the genome specified with --ref (which is a false-duplication-masked GRCh38 version), I got this result:
![grafik](https://github.com/epi2me-labs/wf-human-variation/assets/36499388/b01269c6-1197-4946-b0bf-a1fb2d14f16a)
So why did wf-human-variation not use the specified reference genome FASTA file? And why is a version of hg38/GRCh38 without masked false duplications used?
Thanks in advance for clarifying the issue.
Best,
Florian
Relevant log output
n/a
Application activity log entry
No response
Were you able to successfully run the latest version of the workflow with the demo data?
OK, I found the issue. wf-human-variation does not re-align the reads if an aligned BAM is used as input. So it is my fault. Sorry.
The issue can be closed.
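For anyone hitting the same problem, a quick check before running the workflow is to look at the BAM header. This is a rough sketch, assuming `samtools` is available; the helper names are made up for illustration. An aligned BAM carries `@SQ` reference-sequence lines, and aligners such as minimap2 usually record their command line (including the reference path) in a `@PG` line.

```shell
# Reads a SAM header on stdin; succeeds if it contains @SQ lines,
# i.e. the BAM carries a reference dictionary and is most likely already aligned.
has_reference_dictionary() {
    grep -q '^@SQ'
}

# Reads a SAM header on stdin and prints the command line (CL: field)
# of every @PG record that has one, to show which reference the aligner was given.
aligner_commands() {
    grep '^@PG' | tr '\t' '\n' | sed -n 's/^CL://p'
}

# Hypothetical usage:
#   samtools view -H /in.bam | has_reference_dictionary && echo "input is already aligned"
#   samtools view -H /in.bam | aligner_commands
```

If the input turns out to be aligned against an unmasked reference, one option is to realign it yourself against the masked FASTA before passing it to the workflow, for example with `samtools fastq -T MM,ML` piped into `minimap2 -y` so that the modification tags are carried over.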
Operating System
Ubuntu 22.04
Other Linux
No response
Workflow Version
2.1.0
Workflow Execution
Command line (Local)
Other workflow execution
No response
EPI2ME Version
No response
Were you able to successfully run the latest version of the workflow with the demo data?
yes
Other demo data information
No response