epi2me-labs / wf-human-variation

The detected CNV in the results are few. #113

Closed sloth-eat-pudding closed 10 months ago

sloth-eat-pudding commented 10 months ago

Ask away!

Dear Team, I have recently started investigating CNVs (copy number variations) in nanopore HG002 data, using the hg38 reference genome. In my analysis, CNVs are reported only on the X and Y chromosomes. Could you help me determine whether this result is expected, or whether it points to a problem with my dataset or methodology?

Here are the details of my workflow and the results I have obtained:

N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/epi2me-labs/wf-human-variation` [zen_panini] DSL2 - revision: 93a02af3ff [master]

||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-human-variation v1.8.3-g93a02af
--------------------------------------------------------------------------------
Core Nextflow options
  revision       : master
  runName        : zen_panini
  containerEngine: docker
  container      : [withLabel:wf_human_snp:ontresearch/wf-human-variation-snp:sha0d7e7e8e8207d9d23fdf50a34ceb577da364373e, withLabel:wf_human_sv:ontresearch/wf-human-variation-sv:shabc3ac908a14705f248cdf49f218956ec33e93ef9, withLabel:wf_human_mod:ontresearch/wf-human-variation-methyl:shaa6e616571797d97ae2736c7ebdcb4613fe77f263, withLabel:wf_basecalling:nanoporetech/dorado:sha1433bfc3146fd0dc94ad9648452364f2327cf1b0, withLabel:wf_cnv:ontresearch/wf-cnv:sha428cb19e51370020ccf29ec2af4eead44c6a17c2, withLabel:wf_human_str:ontresearch/wf-human-variation-str:sha28799bc3058fa256c01c1f07c87f04e4ade1fcc1, withLabel:snpeff_annotation:ontresearch/snpeff:sha4f289afaf754c7a3e0b9ffb6c0b5be0f89a5cf04, withLabel:wf_common:ontresearch/wf-common:sha0a6dc21fac17291f4acb2e0f67bcdec7bf63e6b7, default:ontresearch/wf-human-variation:sha0800eade05e4cbb75d45421633c78c4f6320b2f6]
  launchDir      : /sloth/nextflow
  workDir        : /sloth/nextflow/work
  projectDir     : /sloth/.nextflow/assets/epi2me-labs/wf-human-variation
  userName       : sloth
  profile        : standard
  configFiles    : /sloth/.nextflow/assets/epi2me-labs/wf-human-variation/nextflow.config, /sloth/nextflow/nextflow.config

Workflow Options
  cnv            : true

Main options
  sample_name    : hg002
  bam            : /sloth/hg002.sup.60x.bam
  ref            : /sloth/reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
  out_dir        : output1109-na-60

!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-human-variation for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

--------------------------------------------------------------------------------
This is epi2me-labs/wf-human-variation v1.8.3-g93a02af.
--------------------------------------------------------------------------------
WARN: CNV calling subworkflow does not support CRAM. You don't need to do anything, but we're just letting you know that:
WARN: - If your input file is CRAM, it will be converted to a temporary BAM inside the workflow automatically.
WARN: - If your input requires alignment or basecalling, the outputs will be saved to your output directory as BAM instead of CRAM.
executor >  local (27)
[47/af7fc7] process > bam_ingress:check_for_alignment (1) [100%] 1 of 1 ✔
[-        ] process > bam_ingress:cram_to_bam             -
[-        ] process > bam_ingress:minimap2_alignment      -
[0e/482c12] process > getGenome (1)                       [100%] 1 of 1 ✔
[f6/93f2f3] process > cram_cache (1)                      [100%] 1 of 1 ✔
[e4/48d328] process > getAllChromosomesBed (1)            [100%] 1 of 1 ✔
[6d/2cd21a] process > mosdepth_input (1)                  [100%] 1 of 1 ✔
[d6/227350] process > readStats (1)                       [100%] 1 of 1 ✔
[bb/e92292] process > getVersions                         [100%] 1 of 1 ✔
[8d/f669c1] process > getParams                           [100%] 1 of 1 ✔
[51/1beca0] process > get_coverage (1)                    [100%] 1 of 1 ✔
[7c/ead193] process > makeAlignmentReport                 [100%] 1 of 1 ✔
[-        ] process > failedQCReport                      -
[5a/1f20d8] process > cnv:callCNV (1)                     [100%] 1 of 1 ✔
[66/381505] process > cnv:getVersions                     [100%] 1 of 1 ✔
[76/9bdbff] process > cnv:getParams                       [100%] 1 of 1 ✔
[97/4bdb87] process > cnv:makeReport (1)                  [100%] 1 of 1 ✔
[d9/8cbea8] process > output_cnv (1)                      [100%] 1 of 1 ✔
[3c/542df5] process > configure_jbrowse (1)               [100%] 1 of 1 ✔
[78/898077] process > publish_artifact (11)               [100%] 11 of 11 ✔
Completed at: 09-Nov-2023 05:28:19
Duration    : 1h 56m 48s
CPU hours   : 5.6
Succeeded   : 27

output1109-na-60/qdna_seq/hg002_segs.vcf

X       3000001 .       <DIP>   <DEL>   1000    PASS    SVTYPE=DEL;END=156000000;SVLEN=153000000;BINS=268;SCORE=-1;LOG2CNT=-1.01              GT      0/1
Y       2500001 .       <DIP>   <DEL>   1000    PASS    SVTYPE=DEL;END=27000000;SVLEN=24500000;BINS=26;SCORE=-1;LOG2CNT=-1.09   GT            0/1

output1109-na-60/qdna_seq/hg002_calls.vcf

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  hg002.sup.60x
X       3000001 .       <DIP>   <DEL>   1000    PASS    SVTYPE=DEL;END=156000000;SVLEN=153000000;BINS=268;SCORE=-1;LOG2CNT=-1.01              GT      0/1
Y       2500001 .       <DIP>   <DEL>   1000    PASS    SVTYPE=DEL;END=27000000;SVLEN=24500000;BINS=26;SCORE=-1;LOG2CNT=-1.09   GT            0/1

Thank you very much for your support and guidance.

vlshesketh commented 10 months ago

Hi @sloth-eat-pudding, the workflow is simply reporting the genetic sex of the sample: it has detected one copy each of the X and Y chromosomes. We follow the best practice recommended by the QDNAseq authors for assessing CNVs on the sex chromosomes (https://bioconductor.org/packages/release/bioc/vignettes/QDNAseq/inst/doc/QDNAseq.pdf). Your results indicate that no CNVs were detected in the sample, and they do not point to any issue with your methodology.
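
For readers who want to check this reading against the two VCF records, here is a minimal Python sketch. It assumes that `LOG2CNT` is a log2 ratio relative to a diploid (two-copy) baseline; that interpretation is an assumption made for illustration and is not confirmed in this thread. The helper names `parse_info` and `estimated_copies` are hypothetical and not part of the workflow.

```python
def parse_info(info):
    """Split a VCF INFO string such as 'SVTYPE=DEL;END=156000000' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def estimated_copies(log2_ratio, baseline_ploidy=2):
    """Convert a log2 ratio into an approximate copy count.

    Assumes the ratio is measured against a diploid baseline (an assumption).
    """
    return baseline_ploidy * 2 ** log2_ratio


# INFO fields copied from the two records reported in this issue.
records = [
    ("X", "SVTYPE=DEL;END=156000000;SVLEN=153000000;BINS=268;SCORE=-1;LOG2CNT=-1.01"),
    ("Y", "SVTYPE=DEL;END=27000000;SVLEN=24500000;BINS=26;SCORE=-1;LOG2CNT=-1.09"),
]

copies = {}
for chrom, info in records:
    fields = parse_info(info)
    copies[chrom] = estimated_copies(float(fields["LOG2CNT"]))
    print(f"{chrom}: approx. {copies[chrom]:.2f} copies")
```

Under that assumption both segments come out close to one copy, which is what a sample with one X and one Y chromosome would show when compared against a two-copy baseline.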

sloth-eat-pudding commented 10 months ago

I recently came across a blog post that indirectly suggests HG002 carries numerous copy number variations (CNVs). This has led me to doubt the results generated by the wf-human-variation pipeline.

The blog post I'm referring to is "Seeking Truth: Solving CNV Evaluation Challenges with T2T Genome Assembly - Broad Clinical Labs."

I am keen to understand your perspective on this matter. Do you have any insights or thoughts about these findings and their implications for the results produced by the wf-human-variation pipeline?

Your expertise on this subject would be greatly appreciated. Thank you for taking the time to consider my query.

TBradley27 commented 6 months ago

Hi @sloth-eat-pudding,

QDNAseq does not consider genomic bins in the blacklist, i.e. regions of the genome with considerable germline variation or otherwise odd mapping statistics or behaviour (e.g. telomeres and centromeres): https://genome.cshlp.org/content/24/12/2022.long

I think this is the most likely reason you are not seeing the expected variation here.
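
To illustrate the idea of blacklisted bins, here is a rough Python sketch of dropping fixed-size bins that overlap excluded regions. It is not QDNAseq's implementation: the function names, the half-open interval convention, and all coordinates are made up for the example.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def usable_bins(bins, blacklist):
    """Keep only the bins that do not overlap any blacklisted region on the same chromosome."""
    kept = []
    for chrom, start, end in bins:
        masked = any(
            chrom == b_chrom and overlaps(start, end, b_start, b_end)
            for b_chrom, b_start, b_end in blacklist
        )
        if not masked:
            kept.append((chrom, start, end))
    return kept


# Toy data: three 500 kb bins and one excluded region (coordinates are invented).
bins = [("1", 0, 500_000), ("1", 500_000, 1_000_000), ("1", 1_000_000, 1_500_000)]
blacklist = [("1", 600_000, 700_000)]

kept = usable_bins(bins, blacklist)
print(kept)
```

A CNV that falls entirely inside excluded bins contributes no signal to the segmentation, so it cannot be called, however real it is.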