caravagnalab / CNAqc

CNAqc - Copy Number Alteration (CNA) Quality Check package
GNU General Public License v3.0

error running Sequenza_CNAqc #23

Closed (bosmont closed this issue 1 year ago)

bosmont commented 1 year ago

Hi, I am getting an error when running Sequenza_CNAqc on a seqz file generated by Sequenza. 'run_1' finished successfully, but the following error occurred at 'run_2':

ℹ Quality control with CNAqc: run_2

── CNAqc - CNA Quality Check ─────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_2
ℹ Using reference genome coordinates for: GRCh38.
! Detected indels mutation; do not forget to rely more on SNVs for data QC.
✔ Fortified calls for 812 somatic mutations: 636 SNVs (78%) and 176 indels.
! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
! Added segments length (in basepairs) to CNA segments.
✔ Fortified CNAs for 112 segments: 112 clonal and 0 subclonal.
✔ 811 mutations mapped to clonal CNAs.

── Peak analysis: simple CNAs ────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_2
! No karyotypes satisfy input data filters.
Joining with `by = join_by(Major, minor)`
Joining with `by = join_by(karyotype)`

── Peak analysis: complex CNAs ───────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_2
ℹ Karyotypes 3:2 and 4:3 with >100 mutation(s). Using epsilon = 0.05.
# A tibble: 2 × 5 with CNAqc: run_2
# Groups:   karyotype, matched [2]
  karyotype n           mismatched matched  prop
  <chr>     <table[1d]>      <int>   <int> <dbl>
1 3:2       198                  2       1 0.333
2 4:3       122                  3       1 0.25 
Adding missing grouping variables: `matched`
Joining with `by = join_by(Major, minor, QC_PASS)`
Adding missing grouping variables: `matched`
Joining with `by = join_by(karyotype, QC_PASS)`

── Peak analysis: subclonal CNAs ─────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_2
ℹ No subclonal CNAs in this sample. 
── [ CNAqc ]  811 mutations in 112 segments (112 clonal, 0 subclonal). Genome reference: GRCh38. ─────────────────────────────

── Clonal CNAs                      
ℹ Quality control with CNAqc: run_2
   3:2  [n = 198, L = 600 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   4:3  [n = 122, L = 253 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■
   5:0  [n =  67, L = 238 Mb] ■■■■■■■■■■■■■■
   2:2  [n =  64, L = 253 Mb] ■■■■■■■■■■■■■■
   4:0  [n =  60, L = 346 Mb] ■■■■■■■■■■■■■
   5:2  [n =  51, L = 162 Mb] ■■■■■■■■■■■
   3:3  [n =  38, L = 147 Mb] ■■■■■■■■
   6:2  [n =  30, L = 114 Mb] ■■■■■■
   4:4  [n =  26, L =  65 Mb] ■■■■■■
   3:0  [n =  20, L = 155 Mb] ■■■■

ℹ Sample Purity: 78.3333333333333% ~ Ploidy: 5.
ℹ Quality control with CNAqc: run_2
Error in UseMethod("group_by") : 
  no applicable method for 'group_by' applied to an object of class "NULL"
In addition: Warning messages:
1: In dir.create(out_dir) : 'run_1' already exists
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘BMix’ 
3: replacing previous import ‘crayon::%+%’ by ‘ggplot2::%+%’ when loading ‘BMix’ 
4: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’ 
5: In dir.create(out_dir) : 'run_2' already exists
✖ Quality control with CNAqc: run_2 ... failed

Please advise how to fix the problem. Thanks so much.

bosmont commented 1 year ago

Here is a different error from another sample (again, 'run_1' was successful but 'run_2' failed):

ℹ Quality control with CNAqc: run_2

── CNAqc - CNA Quality Check ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_2
ℹ Using reference genome coordinates for: GRCh38.
! Detected indels mutation; do not forget to rely more on SNVs for data QC.
✔ Fortified calls for 505 somatic mutations: 367 SNVs (73%) and 138 indels.
! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
! Added segments length (in basepairs) to CNA segments.
✔ Fortified CNAs for 60 segments: 60 clonal and 0 subclonal.
✔ 504 mutations mapped to clonal CNAs.

── Peak analysis: simple CNAs ───────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_2
! No karyotypes satisfy input data filters.
Joining with `by = join_by(Major, minor)`
Joining with `by = join_by(karyotype)`

── Peak analysis: complex CNAs ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_2
ℹ No karyotypes with >100 mutation(s). 

── Peak analysis: subclonal CNAs ────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_2
ℹ No subclonal CNAs in this sample. 
── [ CNAqc ]  504 mutations in 60 segments (60 clonal, 0 subclonal). Genome reference: GRCh38. ──────────────────────────────────────────────

── Clonal CNAs                      
ℹ Quality control with CNAqc: run_2
 3:2  [n = 98, L = 351 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
 2:1  [n = 77, L = 658 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
 4:1  [n = 76, L = 362 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
 2:2  [n = 65, L = 316 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
 3:1  [n = 62, L = 243 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
 3:0  [n = 57, L = 460 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■
 1:1  [n = 19, L = 165 Mb] ■■■■■■■■■
 5:0  [n =  9, L = 18 Mb] ■■■■
 6:1  [n =  9, L = 24 Mb] ■■■■
 5:2  [n =  8, L = 55 Mb] ■■■■

ℹ Sample Purity: 28.5858585858586% ~ Ploidy: 3.
ℹ Quality control with CNAqc: run_2
✔ Quality control with CNAqc: run_2 ... done
→ New proposal
# A tibble: 0 × 2
# ℹ 2 variables: cellularity <dbl>, ploidy <dbl>
→ Cached evaluations
# A tibble: 2 × 5
  run   purity ploidy sequenza         cnaqc  
  <chr>  <dbl>  <dbl> <list>           <list> 
1 run_1  0.223   4.67 <named list [7]> <cnaqc>
2 run_2  0.286   3.63 <named list [7]> <cnaqc>
Error in `filter()`:
ℹ In argument: `QC == "PASS"`.
Caused by error:
! object 'QC' not found
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `filter()`:
ℹ In argument: `QC == "PASS"`.
Caused by error:
! object 'QC' not found

caravagn commented 1 year ago

Thanks, we will look into this with @nicola-calonaci. Can you confirm that all input files are valid and that you managed to obtain good CNA calls with standard Sequenza?

pbousquets commented 1 year ago

Hi, I'm facing the same issue on some WES samples. Depending on the file I'm working with, I find the problem on different runs.

This is the log of the failing run:

ℹ Quality control with CNAqc: run_3

── CNAqc - CNA Quality Check ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_3
ℹ Using reference genome coordinates for: GRCh37.
✔ Fortified calls for 198 somatic mutations: 198 SNVs (100%) and 0 indels.
! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
! Added segments length (in basepairs) to CNA segments.
✔ Fortified CNAs for 47 segments: 47 clonal and 0 subclonal.
✔ 197 mutations mapped to clonal CNAs.

── Peak analysis: simple CNAs ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_3
! No karyotypes satisfy input data filters.
Joining with `by = join_by(Major, minor)`
Joining with `by = join_by(karyotype)`

── Peak analysis: complex CNAs ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_3
ℹ No karyotypes with >100 mutation(s). 

── Peak analysis: subclonal CNAs ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
ℹ Quality control with CNAqc: run_3
ℹ No subclonal CNAs in this sample. 
── [ CNAqc ] MySample 197 mutations in 47 segments (47 clonal, 0 subclonal). Genome reference: GRCh37. ────────────────────────────────────────────────────────────────────────────────────────────────────

── Clonal CNAs                      
ℹ Quality control with CNAqc: run_3
 2:2  [n = 63, L = 807 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
 2:0  [n = 51, L = 616 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
 0:0  [n = 21, L = 509 Mb] ■■■■■■■■■■■■■■■■■■■■■■■
 4:0  [n = 20, L = 103 Mb] ■■■■■■■■■■■■■■■■■■■■■
 5:0  [n = 12, L = 172 Mb] ■■■■■■■■■■■■■
 1:0  [n = 11, L = 303 Mb] ■■■■■■■■■■■■
 6:0  [n =  7, L = 76 Mb] ■■■■■■■■
 1:1  [n =  6, L = 68 Mb] ■■■■■■
 4:2  [n =  6, L = 113 Mb] ■■■■■■

ℹ Sample Purity: 25.7070707070708% ~ Ploidy: 4.
ℹ Quality control with CNAqc: run_3
✔ Quality control with CNAqc: run_3 ... done
Error in `filter()`:
ℹ In argument: `QC == "PASS"`.
Caused by error:
! object 'QC' not found

Also, Sequenza emits the same warning more than 50 times, although I think it is unrelated to the problem here:

43: In density.default(c(Bf, Af), weight = c(good.reads, good.reads)/(2 *  ... :
  Selecting bandwidth *not* using 'weights'

I noticed a fairly recent commit (4168341) that probably led to this issue, in case that helps.

caravagn commented 1 year ago

@nicola-calonaci can you please take care of this?

nicola-calonaci commented 1 year ago

Hi all,

In all reported cases, the QC of simple CNAs cannot be carried out because too few mutations are provided as input (the default threshold is 100 mutations per karyotype). You can see this from the printed message: "! No karyotypes satisfy input data filters." The Arguments section of the pipeline documentation (https://caravagnalab.github.io/CNAqc/reference/Sequenza_CNAqc.html) describes the relevant option:

Optional parameters passed to the analyse_peaks function by CNAqc. Tune these to change error tolerance or karyotypes to use for QC.

You can therefore adjust the min_absolute_karyotype_mutations parameter, which is passed to the internal function analyse_peaks. In commit 8c07c76 I added a check that aborts the pipeline if the QC cannot be carried out for any of the Sequenza solutions (different solutions may imply different numbers of mutations per karyotype).
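To illustrate the threshold described above, here is a minimal base R sketch that applies a minimum-count filter to the per-karyotype mutation counts printed in the second run_2 log of this thread. The helper passing_karyotypes is hypothetical and not part of CNAqc, and the list of simple karyotypes is an assumption made for the example; the real filter inside analyse_peaks may apply additional criteria.

```r
# Mutation counts per karyotype, copied from the second run_2 log in this thread.
counts <- c("3:2" = 98, "2:1" = 77, "4:1" = 76, "2:2" = 65, "3:1" = 62,
            "3:0" = 57, "1:1" = 19, "5:0" = 9, "6:1" = 9, "5:2" = 8)

# Assumption: the karyotypes treated as "simple" by the peak analysis.
simple_karyotypes <- c("1:0", "1:1", "2:0", "2:1", "2:2")

# Hypothetical helper (not a CNAqc function): names of the karyotypes in
# `karyotypes` that carry at least `min_mutations` mutations.
passing_karyotypes <- function(counts, karyotypes, min_mutations) {
  candidates <- counts[names(counts) %in% karyotypes]
  names(candidates)[candidates >= min_mutations]
}

# Default threshold quoted above: 100 mutations per karyotype.
passing_karyotypes(counts, simple_karyotypes, 100)  # character(0)

# A lower, purely illustrative threshold.
passing_karyotypes(counts, simple_karyotypes, 50)   # "2:1" "2:2"
```

With a threshold of 100, no simple karyotype in this sample passes, which is consistent with the "No karyotypes satisfy input data filters" message. Under these assumptions, a lower value such as 50 would let 2:1 and 2:2 through.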

caravagn commented 1 year ago

Cool, will close this now.