aeeckhou / shallowHRD

This method uses shallow Whole Genome Sequencing (sWGS) and the segmentation of a genomic profile to assess the Homologous Recombination Deficiency of a tumor based on the number of Large-scale Genomic Alterations (LGAs).
30 stars 13 forks source link

Error in finding the 1st threshold #22

Closed DolapoA closed 1 month ago

DolapoA commented 1 month ago

Hello Alexandre,

I think that the issue is related to the ratio.txt result from QDNAseq having a many "Inf" entries.

I think the error bears some resemblance to a closed issue, namely: https://github.com/aeeckhou/shallowHRD/issues/15

Snippet of ratio.txt file "chromosome" "start" "ratio" "ratio_median"

1   850001  1023.984    Inf
1   900001  Inf Inf
1   950001  Inf Inf
1   1000001 Inf Inf
1   1050001 Inf Inf
1   1100001 Inf Inf
1   1150001 Inf Inf
1   1200001 Inf Inf
1   1250001 Inf Inf
1   1300001 Inf Inf
1   1450001 Inf Inf
1   1500001 Inf Inf
1   1700001 0.003   Inf
1   1750001 0.003   Inf

Commands I have used: Rscript QDNAseq_from_bam_no_chrX.R sample1.sort.filtered HRD/sample1/shallowHRD 50

Rscript shallowHRD_hg19_1.13_QDNAseq_no_chrX.R sample1.sort.filtered.bam_ratio.txt HRD/sample1/shallowHRD/ cytoband_adapted_hg19.csv

The log and error observed:

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: impute
Loading required package: CGHbase
Loading required package: marray
Loading required package: limma

Attaching package: ‘limma’

The following object is masked from ‘package:BiocGenerics’:

    plotMA

Loading required package: snowfall
Loading required package: snow

Attaching package: ‘snow’

The following objects are masked from ‘package:BiocGenerics’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, clusterSplit, parApply, parCapply,
    parLapply, parRapply, parSapply

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, clusterSplit, makeCluster, parApply,
    parCapply, parLapply, parRapply, parSapply, splitIndices,
    stopCluster

Attaching package: ‘CGHcall’

The following object is masked from ‘package:BiocGenerics’:

    normalize

Warning message:
In normalizePath(path_to_bam_file) :
  path[1]="Alignments/sample1.sort.filtered": No such file or directory
[1] "Alignments"
[1] "sample1.sort.filtered"
[1] "HRD/sample1/shallowHRD"
[1] "50"
Loaded bin annotations for genome ‘hg19’, bin size 50 kbp, and experiment type ‘SR50’ from annotation package QDNAseq.hg19 v1.20.0
[1] "HRD/sample1/shallowHRD/sample1.sort.filtered.bam"
    sample1.sort.filtered (1 of 1): extracting reads ...[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
 binning ...
61,927  total bins
57,633  of which in selected chromosomes
53,911  of which with reference sequence
49,857  final bins
Calculating correction for GC content and mappability
    Calculating fit for sample sample1.sort.filtered (1 of 1) ...
Done.
Applying median normalization ...
These samples cannot be normalized (median=0):
sample1.sort.filtered
Smoothing outliers ...
Performing segmentation:
NA
  chromosome   start Xsample1.sort.filtered Y[, 5]
1          1  850001                         1023.984    Inf
2          1  900001                              Inf    Inf
3          1  950001                              Inf    Inf
4          1 1000001                              Inf    Inf
5          1 1050001                              Inf    Inf
6          1 1100001                              Inf    Inf
[1] TRUE
[1] TRUE
======================================================== 
Sample :  sample1.sort.filtered 
======================================================== 
Fast gathering segments 1... 
Find Threshold 1... 
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'diff': 'x' contains missing values
Calls: localMaxima -> diff -> density -> density -> density.default
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'diff': 'x' contains missing values
Calls: localMaxima -> diff -> density -> density -> density.default
Warning message:
In mean.default(list_THR) :
  argument is not numeric or logical: returning NA
Error in FUN(X[[i]], ...) : object 'Threshold' not found
Calls: ggsave ... ggplot_build.ggplot -> by_layer -> f -> <Anonymous> -> f -> lapply -> FUN
Reading and initialisation 1... 
Error: object 'Threshold' not found
Error: object 'THR' not found
Error: object 'THR' not found
Gather big segments 1... 
Reput small segments 1... 
Error in getSegmentID(THR = THR, tmp = tmp, c_chr = c_chr + 1, c_cn = c_cn,  : 
  object 'THR' not found
Calls: breakSmoothToLGA -> getSegmentID
Call Large Genomic Alterations 1... 
Find new Threshold 2... 
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'diff': 'x' contains missing values
Calls: localMaxima -> diff -> density -> density -> density.default
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'diff': 'x' contains missing values
Calls: localMaxima -> diff -> density -> density -> density.default
Warning message:
In mean.default(list_THR) :
  argument is not numeric or logical: returning NA
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'diff': 'x' contains missing values
Calls: localMaxima -> diff -> density -> density -> density.default
Error in density.default(test) : 'x' contains missing values
Calls: density -> density -> density.default
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'diff': 'x' contains missing values
Calls: localMinima -> diff -> density -> density -> density.default
Error in density.default(test) : 'x' contains missing values
Calls: density -> density -> density.default
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'diff': 'x' contains missing values
Calls: localMaxima -> diff -> density -> density -> density.default
Error in density.default(test) : 'x' contains missing values
Calls: density -> density -> density.default
Error: object 'Second_local_maxima' not found
Error in FUN(X[[i]], ...) : object 'Threshold' not found
Calls: ggsave ... ggplot_build.ggplot -> by_layer -> f -> <Anonymous> -> f -> lapply -> FUN
Find fast gathering 2... 
Reading and initialisation 2... 
Error: object 'Threshold' not found
Error: object 'THR' not found
Error: object 'THR' not found
Gather big segments 2... 
Reput small segments 2... 
Error in getSegmentID(THR = THR, tmp = tmp, c_chr = c_chr + 1, c_cn = c_cn,  : 
  object 'THR' not found
Calls: breakSmoothToLGA -> getSegmentID
Call Large Genomic Alterations 2... 
Creating graphes... 
Quality assessment... 
Amplifications & deletions... 
Error: object 'THR' not found

... x176 lines in total of -> Error: object 'THR' not found

Error: object 'CN_baseline_initial_point_DOCK7' not found
Error: object 'CN_baseline_initial_segment_DOCK7' not found
Error: object 'CN_baseline_final_DOCK7' not found
Error in data.frame(gene, chr, start, ratio_point_initial, CN_to_baseline_point_initial,  : 
  object 'CN_to_baseline_point_initial' not found
Error in is.data.frame(x) : 
  object 'amplification_deletion_table' not found
Calls: write.table -> is.data.frame
Error in fortify(data) : object 'amplification_deletion_table' not found
Calls: geom_point -> layer -> fortify
Creating summary_plot... 

Is there anything you could suggest for me to resolve this? For example would you recommend a different bin size during QDNAseq?

aeeckhou commented 1 month ago

Hello something went wrong in the generation of the ratio.txt aq you suggested with the "Inf" values. In the beginning of the output from the first command I see these two messages, on indicating that the pathway to the BAM file is not recognize properly and the second that the BAM has an EOF marker absent and that the bam is probably truncated.

Warning message:
In normalizePath(path_to_bam_file) :
  path[1]="Alignments/sample1.sort.filtered": No such file or directory
"HRD/sample1/shallowHRD/sample1.sort.filtered.bam"
    sample1.sort.filtered (1 of 1): extracting reads ...[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 byte

Something is wrong with your BAM file and probably how you indicated the filenames and how you generate your BAM. Align on hg19, only on the chromosomes 1-22, name your BAM file with an extension ".bam" at the end, you don't have it on the "sample1.sort.filtered" as it should and this will create an potential error.

Rscript QDNAseq_from_bam_no_chrX.R sample1.sort.filtered HRD/sample1/shallowHRD 50

Good luck keep me posted.

Alexandre

DolapoA commented 1 month ago

Hi Alexandre, Thanks, I will take a further look. I did get some results in my other samples with the same format but I will look at whether I can generate them successfully if I try again and make sure all paths are correct. Dolapo.

DolapoA commented 1 month ago

Thanks Alexandre, I've resolved the issue. I needed to generate my BAMs again! I appreciate your quick response. Looking forward to shallowHRD_v2!

All the best. Dolapo.

aeeckhou commented 1 month ago

Hello Dolapo,

Good to know !!

Good luck on your research! Alexandre Eeckhoutte