ShixiangWang / gcap

GCAP (Gene-level Circular Amplicon Prediction) firstly implements extrachromosomal DNA detection from whole-exome-sequencing (WES) data and absolute copy number profiles. https://shixiangwang.r-universe.dev/gcap
https://shixiangwang.github.io/gcap/

Error on input bam files #43

Closed tingchiafelix closed 4 months ago

tingchiafelix commented 5 months ago

Hi,

I'm testing one of our samples and keep receiving this BAM file input error. Could you please provide some insight into it, or point out anything I may have missed in running the workflow?

Best, TC

Citation:
    Wang, S., Wu, CY., He, MM. et al. Machine learning-based extrachromosomal DNA identification in
    large-scale cohorts reveals its clinical implications in cancer. Nat Commun 15, 1515 (2024). https://doi.org/10.1038/s41467-024-45479-6
<gcap> 2024-05-02 14:46:46.705228 info [gcap.workflow]: =====================
<gcap> 2024-05-02 14:46:46.811139 info [gcap.workflow]:    GCAP WORKFLOW
<gcap> 2024-05-02 14:46:46.842758 info [gcap.workflow]: =====================
<gcap> 2024-05-02 14:46:46.848493 info [gcap.workflow]:
<gcap> 2024-05-02 14:46:46.85215 info [gcap.workflow]: =====================
<gcap> 2024-05-02 14:46:46.856077 info [gcap.workflow]: Step 1: Run ASCAT 3.0
<gcap> 2024-05-02 14:46:46.85973 info [gcap.workflow]: =====================
<gcap> 2024-05-02 14:46:46.902852 info [gcap.runASCAT]: > Run ASCAT on WES data <
<gcap> 2024-05-02 14:46:46.906931 info [gcap.runASCAT]:
<gcap> 2024-05-02 14:46:46.910696 info [gcap.runASCAT]: Configs:
<gcap> 2024-05-02 14:46:46.914346 info [gcap.runASCAT]:   result path set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output
<gcap> 2024-05-02 14:46:46.918119 info [gcap.runASCAT]:   allelecounter_exe set to ~/miniconda3/envs/cancerit/bin/alleleCounter
<gcap> 2024-05-02 14:46:46.921977 info [gcap.runASCAT]:   g1000allelesprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19//1000genomesAlleles2012_chr
<gcap> 2024-05-02 14:46:46.925937 info [gcap.runASCAT]:   g1000lociprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19//1000genomesloci2012_chr
<gcap> 2024-05-02 14:46:46.929676 info [gcap.runASCAT]:   GCcontentfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/GC_correction_hg19.txt
<gcap> 2024-05-02 14:46:46.933743 info [gcap.runASCAT]:   replictimingfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/RT_correction_hg19.txt
<gcap> 2024-05-02 14:46:46.937872 info [gcap.runASCAT]:   nthreads set to 22
<gcap> 2024-05-02 14:46:46.941952 info [gcap.runASCAT]:   minCounts set to 10
<gcap> 2024-05-02 14:46:46.946189 info [gcap.runASCAT]:   BED_file set to NA
<gcap> 2024-05-02 14:46:46.950991 info [gcap.runASCAT]:   probloci_file set to NA
<gcap> 2024-05-02 14:46:46.954729 info [gcap.runASCAT]:   chrom_names set to <1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22>
<gcap> 2024-05-02 14:46:46.958628 info [gcap.runASCAT]:   gender set to <XX>
<gcap> 2024-05-02 14:46:46.96287 info [gcap.runASCAT]:   min_base_qual set to 20
<gcap> 2024-05-02 14:46:46.96679 info [gcap.runASCAT]:   min_map_qual set to 35
<gcap> 2024-05-02 14:46:47.001714 info [gcap.runASCAT]:   penalty set to 70
<gcap> 2024-05-02 14:46:47.005996 info [gcap.runASCAT]:   skip_finished_ASCAT set to TRUE
<gcap> 2024-05-02 14:46:47.01952 info [gcap.runASCAT]: 1 jobs detected
<gcap> 2024-05-02 14:46:47.023649 info [gcap.runASCAT]: No ASCAT job to skip.
<gcap> 2024-05-02 14:46:47.027272 info [FUN]: start submitting job 116655
<gcap> 2024-05-02 14:46:47.031696 info [FUN]:      tumor data file: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655~072-R~AK7A15E12~WES.bwa.final.bam
<gcap> 2024-05-02 14:46:47.038118 info [FUN]:     normal data file: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655_germline~WES.bwa.final.bam
<gcap> 2024-05-02 14:46:47.042614 info [FUN]:    tumor sample name: 116655~072-R~AK7A15E12~WES
<gcap> 2024-05-02 14:46:47.047361 info [FUN]:   normal sample name: 116655_germline~WES
<gcap> 2024-05-02 14:46:47.059514 fatal [value[[3L]]]: job 116655 failed in ASCAT due to following error
<gcap> 2024-05-02 14:46:47.065013 info [value[[3L]]]: unused arguments (g1000allelesprefix = g1000allelesprefix, g1000lociprefix = g1000lociprefix)
<gcap> 2024-05-02 14:46:47.070303 info [value[[3L]]]: =====
<gcap> 2024-05-02 14:46:47.07639 info [value[[3L]]]: Please check your input bam files (if missing bam index? if its alignment quality is lower?)
<gcap> 2024-05-02 14:46:47.087456 info [value[[3L]]]: =====
<gcap> 2024-05-02 14:46:47.094112 info [gcap.runASCAT]: ASCAT analysis done, check /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output for results
<gcap> 2024-05-02 14:46:47.102426 info [gcap.workflow]: checking ASCAT result files
<gcap> 2024-05-02 14:46:47.107178 warn [FUN]: result file /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/116655.ASCAT.rds does not exist, the corresponding ASCAT calling has error occurred
<gcap> 2024-05-02 14:46:47.113135 warn [FUN]: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/116655.ASCAT.rds contains a failed ASCAT job, will discard it before next step
<gcap> 2024-05-02 14:46:47.121662 fatal [gcap.workflow]: no sucessful ASCAT result file to proceed!
<gcap> 2024-05-02 14:46:47.126795 fatal [gcap.workflow]: check your ASCAT setting before make sure this case could not be used!
Error in gcap.workflow(tumourseqfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655~072-R~AK7A15E12~WES.bwa.final.bam",  :
ShixiangWang commented 5 months ago

@tingchiafelix Please install this specific version of ASCAT and re-run:

# This is a forked version of ASCAT
remotes::install_github("ShixiangWang/ascat@v3-for-gcap-v1", subdir = "ASCAT")
# An ASCAT version with a loose SAM flag setting, which is sometimes useful
# remotes::install_github("ShixiangWang/ascat@v3-f1", subdir = "ASCAT")
# See https://github.com/ShixiangWang/gcap/issues/27
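
The `unused arguments (g1000allelesprefix = ..., g1000lociprefix = ...)` message in the first log is what R reports when a function is called with named arguments that are not in its signature, which is consistent with the installed ASCAT being a different version from the one gcap expects. A minimal sketch of how to check this in plain R is below. `unsupported_args` and `old_style` are hypothetical names made up for illustration; neither comes from gcap or ASCAT.

```r
# Return the names in `wanted` that `fun` does not accept as formal arguments.
# A function with `...` in its signature accepts any named argument, so
# nothing is reported as unsupported in that case.
unsupported_args <- function(fun, wanted) {
  accepted <- names(formals(fun))
  if ("..." %in% accepted) {
    return(character(0))
  }
  setdiff(wanted, accepted)
}

# Toy stand-in (hypothetical) for a function whose signature lacks the two
# arguments named in the error message.
old_style <- function(tumourseqfile, normalseqfile, alleles.prefix, loci.prefix) NULL

unsupported_args(old_style, c("g1000allelesprefix", "g1000lociprefix"))
```

With the real packages, the same helper could be pointed at whichever ASCAT function gcap calls; an empty result would suggest the installed version accepts both arguments.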
ShixiangWang commented 5 months ago

Also, having the ~ symbol in a sample name is not recommended, e.g. 116655~072-R~AK7A15E12~WES.bwa.final.bam.
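
A small sketch of one way to follow this advice: replace every `~` in a name with another character before running the workflow. The helper name `sanitize_sample_name` is hypothetical, and `-` as the replacement is only an assumption (it matches the renamed files in the later logs in this thread).

```r
# Replace every "~" in a sample or file name with "-".
# fixed = TRUE treats "~" as a literal character, not a regular expression.
sanitize_sample_name <- function(x) {
  gsub("~", "-", x, fixed = TRUE)
}

sanitize_sample_name("116655~072-R~AK7A15E12~WES.bwa.final.bam")
```

If the BAM file itself is renamed this way, its index file would need the matching new name as well.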

tingchiafelix commented 5 months ago

Hi Shixiang,

Thank you for providing more details. However, I still got the error. Would you please take a look and let me know your suggestion?

[1] Reading Tumor LogR data...
[1] Reading Tumor BAF data...
[1] Reading Germline LogR data...
[1] Reading Germline BAF data...
[1] Registering SNP locations...
[1] Splitting genome in distinct chunks...

2024-05-06 16:09:19.683663 fatal [value[[3L]]]: job 116655 failed in ASCAT due to following error
2024-05-06 16:09:19.697185 info [value[[3L]]]: length(ovl) > nrow(ASCATobj$Tumor_LogR)/10 is not TRUE
2024-05-06 16:09:19.700663 info [value[[3L]]]: =====
2024-05-06 16:09:19.703945 info [value[[3L]]]: Please check your input bam files (if missing bam index? if its alignment quality is lower?)
2024-05-06 16:09:19.707342 info [value[[3L]]]: =====
2024-05-06 16:09:19.710576 info [gcap.runASCAT]: ASCAT analysis done, check /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output for results
2024-05-06 16:09:19.725071 info [gcap.workflow]: checking ASCAT result files
2024-05-06 16:09:19.730401 warn [FUN]: result file /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/116655.ASCAT.rds does not exist, the corresponding ASCAT calling has error occurred
2024-05-06 16:09:19.733579 warn [FUN]: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/116655.ASCAT.rds contains a failed ASCAT job, will discard it before next step
2024-05-06 16:09:19.73753 fatal [gcap.workflow]: no sucessful ASCAT result file to proceed!
2024-05-06 16:09:19.74128 fatal [gcap.workflow]: check your ASCAT setting before make sure this case could not be used!
Error in gcap.workflow(tumourseqfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655-072-R-AK7A15E12-WES.bwa.final.bam", :
In addition: Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Execution halted

Best, TC
ShixiangWang commented 5 months ago

Could you share your BAM data privately? This looks like an error in the ASCAT package. Also, could you provide the complete log, not just the last part?
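
The message `length(ovl) > nrow(ASCATobj$Tumor_LogR)/10 is not TRUE` reads like a failed `stopifnot()`: it requires that `ovl` has more entries than one tenth of the rows in the tumour LogR table. Assuming (this is not confirmed anywhere in the thread) that `ovl` is the set of SNP IDs shared between the LogR table and a correction file, the sketch below shows how such an overlap fraction could be inspected. The helper `overlap_fraction` and all IDs are made up for illustration.

```r
# Fraction of the IDs in `logr_ids` that also occur in `reference_ids`.
overlap_fraction <- function(logr_ids, reference_ids) {
  if (length(logr_ids) == 0) {
    return(NA_real_)
  }
  length(intersect(logr_ids, reference_ids)) / length(logr_ids)
}

# Toy IDs (hypothetical): one reference set uses a "chr" prefix, the other does not.
logr_ids <- c("1_1000", "1_2000", "2_1500", "2_3000")
ref_ids_mismatched <- c("chr1_1000", "chr1_2000", "chr2_1500", "chr2_3000")
ref_ids_matched <- c("1_1000", "1_2000", "2_1500")

overlap_fraction(logr_ids, ref_ids_mismatched) # 0: no shared IDs
overlap_fraction(logr_ids, ref_ids_matched)    # 0.75: three of four shared
```

Under that assumption, a fraction at or below 0.1 would trip the check, so comparing the row names of the two tables is one place to start looking.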

tingchiafelix commented 5 months ago

Hi,

Please see the log messages and bam/bai files I used (these files were downloaded/processed from public resources).

https://www.dropbox.com/scl/fo/9pf9mkereww28q2yda0md/AAK61fn5wJmihoPxa8AW3s4?rlkey=ix2pyr6nyijr9suk04wxpv3ne&st=nyvgvewx&dl=0

Loading required package: ASCAT
Loading required package: RColorBrewer
Loading required package: splines
Loading required package: readr
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:utils’:

findMatches

The following objects are masked from ‘package:base’:

expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: parallel
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: sigminer
sigminer version 2.3.0

Citation: Wang, S., Wu, CY., He, MM. et al. Machine learning-based extrachromosomal DNA identification in large-scale cohorts reveals its clinical implications in cancer. Nat Commun 15, 1515 (2024). https://doi.org/10.1038/s41467-024-45479-6

2024-05-07 08:44:14.807557 info [gcap.workflow]: ===================== 2024-05-07 08:44:14.862886 info [gcap.workflow]: GCAP WORKFLOW 2024-05-07 08:44:14.873267 info [gcap.workflow]: ===================== 2024-05-07 08:44:14.876267 info [gcap.workflow]: 2024-05-07 08:44:14.879241 info [gcap.workflow]: ===================== 2024-05-07 08:44:14.882264 info [gcap.workflow]: Step 1: Run ASCAT 3.0 2024-05-07 08:44:14.88498 info [gcap.workflow]: ===================== 2024-05-07 08:44:14.906953 info [gcap.runASCAT]: > Run ASCAT on WES data < 2024-05-07 08:44:14.910186 info [gcap.runASCAT]: 2024-05-07 08:44:14.913236 info [gcap.runASCAT]: Configs: 2024-05-07 08:44:14.916008 info [gcap.runASCAT]: result path set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output 2024-05-07 08:44:14.918991 info [gcap.runASCAT]: allelecounter_exe set to ~/miniconda3/envs/cancerit/bin/alleleCounter 2024-05-07 08:44:14.922247 info [gcap.runASCAT]: g1000allelesprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19//1000genomesAlleles2012_chr 2024-05-07 08:44:14.924978 info [gcap.runASCAT]: g1000lociprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19//1000genomesloci2012chrstring_chr 2024-05-07 08:44:14.928246 info [gcap.runASCAT]: GCcontentfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/GC_correction_updated_hg19.txt 2024-05-07 08:44:14.930902 info [gcap.runASCAT]: replictimingfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/RT_correction_updated_hg19.txt 2024-05-07 08:44:14.933583 info [gcap.runASCAT]: nthreads set to 22 2024-05-07 08:44:14.936279 info [gcap.runASCAT]: minCounts set to 10 2024-05-07 08:44:14.939393 info [gcap.runASCAT]: BED_file set to NA 2024-05-07 08:44:14.942085 info [gcap.runASCAT]: probloci_file set to NA 2024-05-07 08:44:14.944758 info [gcap.runASCAT]: chrom_names set to <1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22> 2024-05-07 08:44:14.947962 info 
[gcap.runASCAT]: gender set to 2024-05-07 08:44:14.95074 info [gcap.runASCAT]: min_base_qual set to 20 2024-05-07 08:44:14.953413 info [gcap.runASCAT]: min_map_qual set to 35 2024-05-07 08:44:14.956221 info [gcap.runASCAT]: penalty set to 70 2024-05-07 08:44:14.95909 info [gcap.runASCAT]: skip_finished_ASCAT set to FALSE 2024-05-07 08:44:14.973968 info [gcap.runASCAT]: 1 jobs detected 2024-05-07 08:44:14.976949 info [FUN]: start submitting job 116655 2024-05-07 08:44:14.979626 info [FUN]: tumor data file: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655-072-R-AK7A15E12-WES.bwa.final.bam 2024-05-07 08:44:14.982321 info [FUN]: normal data file: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655_germline-WES.bwa.final.bam 2024-05-07 08:44:14.986044 info [FUN]: tumor sample name: 116655-072-R-AK7A15E12-WES 2024-05-07 08:44:14.988823 info [FUN]: normal sample name: 116655_germline-WES Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi 
pos start: Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Reading locis Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: Done reading locis Multi pos start: [1] Reading Tumor LogR data... [1] Reading Tumor BAF data... [1] Reading Germline LogR data... [1] Reading Germline BAF data... [1] Registering SNP locations... [1] Splitting genome in distinct chunks... 2024-05-07 08:56:28.217455 fatal [value[[3L]]]: job 116655 failed in ASCAT due to following error 2024-05-07 08:56:28.232145 info [value[[3L]]]: length(ovl) > nrow(ASCATobj$Tumor_LogR)/10 is not TRUE 2024-05-07 08:56:28.235166 info [value[[3L]]]: ===== 2024-05-07 08:56:28.238075 info [value[[3L]]]: Please check your input bam files (if missing bam index? if its alignment quality is lower?) 
2024-05-07 08:56:28.241363 info [value[[3L]]]: ===== 2024-05-07 08:56:28.244246 info [gcap.runASCAT]: ASCAT analysis done, check /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output for results 2024-05-07 08:56:28.253933 info [gcap.workflow]: checking ASCAT result files 2024-05-07 08:56:28.257859 warn [FUN]: result file /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/116655.ASCAT.rds does not exist, the corresponding ASCAT calling has error occurred 2024-05-07 08:56:28.260933 warn [FUN]: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/116655.ASCAT.rds contains a failed ASCAT job, will discard it before next step 2024-05-07 08:56:28.263658 fatal [gcap.workflow]: no sucessful ASCAT result file to proceed! 2024-05-07 08:56:28.266366 fatal [gcap.workflow]: check your ASCAT setting before make sure this case could not be used! Error in gcap.workflow(tumourseqfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655-072-R-AK7A15E12-WES.bwa.final.bam", : In addition: Warning message: One or more parsing issues, call `problems()` on your data frame for details, e.g.: dat <- vroom(...) problems(dat) Execution halted
ShixiangWang commented 4 months ago

Thanks for sharing (https://www.dropbox.com/scl/fo/9pf9mkereww28q2yda0md/AAK61fn5wJmihoPxa8AW3s4?rlkey=ix2pyr6nyijr9suk04wxpv3ne&e=1&st=nyvgvewx&dl=0). I will download the files and check tomorrow.

tingchiafelix commented 4 months ago

Thank you Shixiang for working on this. Please let me know if you need further information from me.

Best, TC

ShixiangWang commented 4 months ago

@tingchiafelix Hi, I just got the result and cannot reproduce the error. I used the same test environment that I used to debug issue https://github.com/ShixiangWang/gcap/issues/41 (for the workflow, see https://github.com/ShixiangWang/gcap/tree/master/test-workflow/debug ).

Please make sure that your R version is >4.1, that your ASCAT version is ShixiangWang/ascat@51fd695 (check with devtools::session_info()), and that your annotation data is complete (https://github.com/ShixiangWang/gcap/blob/master/test-workflow/debug/2-prepare.sh).
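
As a rough sketch, the R version and ASCAT commit checks can also be scripted. It relies on `utils::packageDescription()`, which exposes the `RemoteSha` field that `remotes`/`devtools` record for GitHub installs. The helper names `r_is_newer_than`, `installed_commit` and `commit_matches` are hypothetical, not part of gcap.

```r
# TRUE if the running R version is strictly greater than `minimum`.
r_is_newer_than <- function(minimum = "4.1") {
  getRversion() > package_version(minimum)
}

# Commit SHA recorded for a package installed from GitHub via remotes/devtools,
# or NA if the package is not installed or has no such record.
installed_commit <- function(pkg) {
  desc <- suppressWarnings(utils::packageDescription(pkg))
  # packageDescription() returns NA (not a list) when the package is missing
  if (!is.list(desc) || is.null(desc$RemoteSha)) {
    return(NA_character_)
  }
  desc$RemoteSha
}

# TRUE if `sha` is known and starts with the abbreviated commit `prefix`.
commit_matches <- function(sha, prefix) {
  !is.na(sha) && startsWith(sha, prefix)
}

r_is_newer_than("4.1")
commit_matches(installed_commit("ASCAT"), "51fd695")
```

Both calls returning TRUE would match the R version and ASCAT commit asked for above; the annotation data still has to be checked separately.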

Session info

─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Asia/Shanghai
 date     2024-05-14
 rstudio  2022.12.0+353 Elsbeth Geranium (server)
 pandoc   NA

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package          * version   date (UTC) lib source
 ASCAT            * 3.0.0     2023-03-31 [1] Github (ShixiangWang/ascat@51fd695)
 Biobase          * 2.58.0    2022-11-01 [2] Bioconductor
 BiocGenerics     * 0.44.0    2022-11-01 [2] Bioconductor
 BiocManager        1.30.19   2022-10-25 [2] CRAN (R 4.2.2)
 bit                4.0.5     2022-11-15 [2] CRAN (R 4.2.2)
 bit64              4.0.5     2020-08-30 [2] CRAN (R 4.2.2)
 bitops             1.0-7     2021-04-24 [2] CRAN (R 4.2.2)
 cachem             1.0.6     2021-08-19 [2] CRAN (R 4.2.2)
 Cairo              1.6-0     2022-07-05 [1] CRAN (R 4.2.2)
 callr              3.7.3     2022-11-02 [2] CRAN (R 4.2.2)
 cli                3.6.0     2023-01-09 [2] CRAN (R 4.2.2)
 cluster            2.1.4     2022-08-22 [2] CRAN (R 4.2.2)
 codetools          0.2-19    2023-02-01 [2] CRAN (R 4.2.2)
 colorspace         2.1-0     2023-01-23 [2] CRAN (R 4.2.2)
 crayon             1.5.2     2022-09-29 [2] CRAN (R 4.2.2)
 data.table         1.14.8    2023-02-17 [2] CRAN (R 4.2.2)
 devtools           2.4.5     2022-10-11 [2] CRAN (R 4.2.2)
 digest             0.6.31    2022-12-11 [2] CRAN (R 4.2.2)
 doParallel       * 1.0.17    2022-02-07 [1] CRAN (R 4.2.2)
 dplyr              1.1.0     2023-01-29 [2] CRAN (R 4.2.2)
 ellipsis           0.3.2     2021-04-29 [2] CRAN (R 4.2.2)
 fansi              1.0.4     2023-01-22 [2] CRAN (R 4.2.2)
 fastmap            1.1.0     2021-01-25 [2] CRAN (R 4.2.2)
 foreach          * 1.5.2     2022-02-02 [2] CRAN (R 4.2.2)
 fs                 1.6.1     2023-02-06 [2] CRAN (R 4.2.2)
 furrr              0.3.1     2022-08-15 [2] CRAN (R 4.2.2)
 future             1.31.0    2023-02-01 [2] CRAN (R 4.2.2)
 gcap             * 1.1.4     2024-04-22 [1] Github (ShixiangWang/gcap@1f1dbd2)
 generics           0.1.3     2022-07-05 [2] CRAN (R 4.2.2)
 GenomeInfoDb     * 1.34.9    2023-02-02 [2] Bioconductor
 GenomeInfoDbData   1.2.9     2023-02-24 [2] Bioconductor
 GenomicRanges    * 1.50.2    2022-12-16 [2] Bioconductor
 GetoptLong         1.0.5     2020-12-15 [1] CRAN (R 4.2.2)
 ggplot2            3.4.1     2023-02-10 [2] CRAN (R 4.2.2)
 GlobalOptions      0.1.2     2020-06-10 [1] CRAN (R 4.2.2)
 globals            0.16.2    2022-11-21 [2] CRAN (R 4.2.2)
 glue               1.6.2     2022-02-24 [2] CRAN (R 4.2.2)
 gridBase           0.4-7     2014-02-24 [1] CRAN (R 4.2.2)
 gtable             0.3.1     2022-09-01 [2] CRAN (R 4.2.2)
 hms                1.1.2     2022-08-19 [2] CRAN (R 4.2.2)
 htmltools          0.5.4     2022-12-07 [2] CRAN (R 4.2.2)
 htmlwidgets        1.6.1     2023-01-07 [2] CRAN (R 4.2.2)
 httpuv             1.6.9     2023-02-14 [2] CRAN (R 4.2.2)
 IRanges          * 2.32.0    2022-11-01 [2] Bioconductor
 iterators        * 1.0.14    2022-02-05 [2] CRAN (R 4.2.2)
 jsonlite           1.8.4     2022-12-06 [2] CRAN (R 4.2.2)
 later              1.3.0     2021-08-18 [2] CRAN (R 4.2.2)
 lattice            0.20-45   2021-09-22 [2] CRAN (R 4.2.2)
 lgr                0.4.4     2022-09-05 [1] CRAN (R 4.2.2)
 lifecycle          1.0.3     2022-10-07 [2] CRAN (R 4.2.2)
 listenv            0.9.0     2022-12-16 [2] CRAN (R 4.2.2)
 magrittr           2.0.3     2022-03-30 [2] CRAN (R 4.2.2)
 Matrix             1.6-5     2024-01-11 [1] CRAN (R 4.2.2)
 memoise            2.0.1     2021-11-26 [2] CRAN (R 4.2.2)
 mime               0.12      2021-09-28 [2] CRAN (R 4.2.2)
 miniUI             0.1.1.1   2018-05-18 [2] CRAN (R 4.2.2)
 munsell            0.5.0     2018-06-12 [2] CRAN (R 4.2.2)
 NMF                0.26      2023-03-20 [1] CRAN (R 4.2.2)
 parallelly         1.34.0    2023-01-13 [2] CRAN (R 4.2.2)
 pillar             1.8.1     2022-08-19 [2] CRAN (R 4.2.2)
 pkgbuild           1.4.0     2022-11-27 [2] CRAN (R 4.2.2)
 pkgconfig          2.0.3     2019-09-22 [2] CRAN (R 4.2.2)
 pkgload            1.3.2     2022-11-16 [2] CRAN (R 4.2.2)
 plyr               1.8.8     2022-11-11 [2] CRAN (R 4.2.2)
 prettyunits        1.1.1     2020-01-24 [2] CRAN (R 4.2.2)
 processx           3.8.0     2022-10-26 [2] CRAN (R 4.2.2)
 profvis            0.3.7     2020-11-02 [2] CRAN (R 4.2.2)
 promises           1.2.0.1   2021-02-11 [2] CRAN (R 4.2.2)
 ps                 1.7.2     2022-10-26 [2] CRAN (R 4.2.2)
 purrr              1.0.1     2023-01-10 [2] CRAN (R 4.2.2)
 quadprog           1.5-8     2019-11-20 [1] CRAN (R 4.2.2)
 R6                 2.5.1     2021-08-19 [2] CRAN (R 4.2.2)
 rappdirs           0.3.3     2021-01-31 [2] CRAN (R 4.2.2)
 RColorBrewer     * 1.1-3     2022-04-03 [2] CRAN (R 4.2.2)
 Rcpp               1.0.10    2023-01-22 [2] CRAN (R 4.2.2)
 RCurl              1.98-1.10 2023-01-27 [2] CRAN (R 4.2.2)
 readr            * 2.1.4     2023-02-10 [2] CRAN (R 4.2.2)
 registry           0.5-1     2019-03-05 [1] CRAN (R 4.2.2)
 remotes            2.4.2     2021-11-30 [2] CRAN (R 4.2.2)
 reshape2           1.4.4     2020-04-09 [2] CRAN (R 4.2.2)
 rjson              0.2.21    2022-01-09 [1] CRAN (R 4.2.2)
 rlang              1.0.6     2022-09-24 [2] CRAN (R 4.2.2)
 rngtools           1.5.2     2021-09-20 [1] CRAN (R 4.2.2)
 rstudioapi         0.14      2022-08-22 [2] CRAN (R 4.2.2)
 S4Vectors        * 0.36.1    2022-12-05 [2] Bioconductor
 scales             1.2.1     2022-08-20 [2] CRAN (R 4.2.2)
 sessioninfo        1.2.2     2021-12-06 [2] CRAN (R 4.2.2)
 shiny              1.7.4     2022-12-15 [2] CRAN (R 4.2.2)
 sigminer         * 2.1.9     2022-11-09 [1] CRAN (R 4.2.2)
 stringi            1.7.12    2023-01-11 [2] CRAN (R 4.2.2)
 stringr            1.5.0     2022-12-02 [2] CRAN (R 4.2.2)
 tibble             3.1.8     2022-07-22 [2] CRAN (R 4.2.2)
 tidyr              1.3.0     2023-01-24 [2] CRAN (R 4.2.2)
 tidyselect         1.2.0     2022-10-10 [2] CRAN (R 4.2.2)
 tzdb               0.3.0     2022-03-28 [2] CRAN (R 4.2.2)
 urlchecker         1.0.1     2021-11-30 [2] CRAN (R 4.2.2)
 usethis            2.1.6     2022-05-25 [2] CRAN (R 4.2.2)
 utf8               1.2.3     2023-01-31 [2] CRAN (R 4.2.2)
 uuid               1.1-0     2022-04-19 [2] CRAN (R 4.2.2)
 vctrs              0.5.2     2023-01-23 [2] CRAN (R 4.2.2)
 vroom              1.6.1     2023-01-22 [2] CRAN (R 4.2.2)
 withr              2.5.0     2022-03-03 [2] CRAN (R 4.2.2)
 xgboost            1.5.2.1   2022-02-21 [1] CRAN (R 4.2.2)
 xtable             1.8-4     2019-04-21 [2] CRAN (R 4.2.2)
 XVector            0.38.0    2022-11-01 [2] Bioconductor
 zlibbioc           1.44.0    2022-11-01 [2] Bioconductor

 [1] /data3/wsx/R/x86_64-pc-linux-gnu-library/4.2
 [2] /opt/R/4.2.2/lib/R/library

Logs

> library(gcap)
Loading required package: ASCAT
Loading required package: RColorBrewer
Loading required package: splines
Loading required package: readr
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq,
    Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit,
    which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: parallel
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: sigminer
sigminer version 2.1.9
- Star me at https://github.com/ShixiangWang/sigminer
- Run hello() to see usage and citation.
gcap version 1.1.4
- Project URL at https://github.com/ShixiangWang/gcap

Citation:
    Wang, S., Wu, CY., He, MM. et al. Machine learning-based extrachromosomal DNA identification in 
    large-scale cohorts reveals its clinical implications in cancer. Nat Commun 15, 1515 (2024). https://doi.org/10.1038/s41467-024-45479-6
> # hg38 ----------------
> gcap.workflow(
+   tumourseqfile = "~/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam", 
+   normalseqfile = "~/gcap_debug/116655_germline-WES.bwa.final.bam",
+   tumourname = "Test_T",
+   normalname = "Test_N",
+   jobname = "S116655",
+   outdir = "~/gcap_debug/gcap_result",
+   allelecounter_exe = "~/miniconda3/envs/cancerit/bin/alleleCounter", 
+   g1000allelesprefix = file.path(
+     "~/share/gcap_reference/1000G_loci_hg38/",
+     "1kg.phase3.v5a_GRCh38nounref_allele_index_chr"
+   ), 
+   g1000lociprefix = file.path("~/share/gcap_reference/1000G_loci_hg38/",
+                               "1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr"
+   ),
+   GCcontentfile = "~/share/gcap_reference/GC_correction_hg38.txt",
+   replictimingfile = "~/share/gcap_reference/RT_correction_hg38.txt",
+   skip_finished_ASCAT = TRUE,
+   skip_ascat_call = FALSE,
+   result_file_prefix = "S116655",
+   genome_build = "hg38",
+   model = "XGB11"
+ )
<gcap> 2024-05-14 09:47:17 info [gcap.workflow]: =====================
<gcap> 2024-05-14 09:47:17 info [gcap.workflow]:    GCAP WORKFLOW
<gcap> 2024-05-14 09:47:17 info [gcap.workflow]: =====================
<gcap> 2024-05-14 09:47:17 info [gcap.workflow]: 
<gcap> 2024-05-14 09:47:17 info [gcap.workflow]: =====================
<gcap> 2024-05-14 09:47:17 info [gcap.workflow]: Step 1: Run ASCAT 3.0
<gcap> 2024-05-14 09:47:17 info [gcap.workflow]: =====================
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]: > Run ASCAT on WES data <
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]: 
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]: Configs:
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   result path set to /data3/wsx/gcap_debug/gcap_result
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   allelecounter_exe set to ~/miniconda3/envs/cancerit/bin/alleleCounter
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   g1000allelesprefix set to ~/share/gcap_reference/1000G_loci_hg38//1kg.phase3.v5a_GRCh38nounref_allele_index_chr
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   g1000lociprefix set to ~/share/gcap_reference/1000G_loci_hg38//1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   GCcontentfile set to ~/share/gcap_reference/GC_correction_hg38.txt
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   replictimingfile set to ~/share/gcap_reference/RT_correction_hg38.txt
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   nthreads set to 22
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   minCounts set to 10
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   BED_file set to NA
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   probloci_file set to NA
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   chrom_names set to <1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22>
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   gender set to <XX>
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   min_base_qual set to 20
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   min_map_qual set to 35
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   penalty set to 70
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]:   skip_finished_ASCAT set to TRUE
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]: 1 jobs detected
<gcap> 2024-05-14 09:47:17 info [gcap.runASCAT]: No ASCAT job to skip.
<gcap> 2024-05-14 09:47:17 info [FUN]: start submitting job S116655
<gcap> 2024-05-14 09:47:17 info [FUN]:      tumor data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam
<gcap> 2024-05-14 09:47:17 info [FUN]:     normal data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam
<gcap> 2024-05-14 09:47:17 info [FUN]:    tumor sample name: Test_T
<gcap> 2024-05-14 09:47:17 info [FUN]:   normal sample name: Test_N
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam.bai
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
[W::hts_idx_load2] The index file is older than the data file: /data3/wsx/gcap_debug/116655_germline-WES.bwa.final.bam.bai
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
[1] Reading Tumor LogR data...
[1] Reading Tumor BAF data...
[1] Reading Germline LogR data...
[1] Reading Germline BAF data...
[1] Registering SNP locations...
[1] Splitting genome in distinct chunks...
[1] Sample Test_T (1/1)                                                                                                                                                       
GC correlation:  25bp 0.049 ; 50bp 0.057 ; 100bp 0.064 ; 200bp 0.073 ; 500bp 0.087 ; 1kb 0.099 ; 2kb 0.109 ; 5kb 0.119 ; 10kb 0.125 ; 20kb 0.129 ; 50kb 0.135 ; 100kb 0.140 ; 200kb 0.144 ; 500kb 0.148 ; 1Mb 0.148 ; 2Mb 0.139 ; 5Mb 0.102 ; 10Mb 0.056 ; 
Short window size:  1kb 
Long window size:  100kb 
Replication timing correlation:  Bg02es 0.11 ; Bj 0.12 ; Gm06990 0.13 ; Gm12801 0.14 ; Gm12812 0.13 ; Gm12813 0.13 ; Gm12878 0.13 ; Helas3 0.12 ; Hepg2 0.14 ; Huvec 0.12 ; Imr90 0.12 ; K562 0.14 ; Mcf7 0.13 ; Nhek 0.13 ; Sknsh 0.14 ; 
Replication dataset:  Hepg2 
[1] Plotting tumor data
[1] Plotting germline data
[1] Sample Test_T (1/1)
[1] Sample Test_T (1/1)
<gcap> 2024-05-14 10:22:57 info [doTryCatch]: job S116655 done
<gcap> 2024-05-14 10:22:57 info [gcap.runASCAT]: ASCAT analysis done, check /data3/wsx/gcap_debug/gcap_result for results
<gcap> 2024-05-14 10:22:57 info [gcap.workflow]: checking ASCAT result files
<gcap> 2024-05-14 10:22:57 info [gcap.workflow]: ============================================================
<gcap> 2024-05-14 10:22:57 info [gcap.workflow]: Step 2: Extract features and collapse features to gene level
<gcap> 2024-05-14 10:22:57 info [gcap.workflow]: ============================================================
<gcap> 2024-05-14 10:22:57 info [gcap.runBuildflow]: extracting sample-level and region-level features
<gcap> 2024-05-14 10:22:57 info [gcap.extractFeatures]: > Extract features from ASCAT results <
<gcap> 2024-05-14 10:22:57 info [gcap.extractFeatures]: 
<gcap> 2024-05-14 10:22:57 info [gcap.extractFeatures]: reading ASCAT file list
reading ~/gcap_debug/gcap_result/S116655.ASCAT.rds
<gcap> 2024-05-14 10:22:59 info [gcap.extractFeatures]: using unique IDs from file names for avoid the sample name repetition
<gcap> 2024-05-14 10:22:59 info [gcap.extractFeatures]: back up default sample column to old_sample
<gcap> 2024-05-14 10:22:59 info [gcap.extractFeatures]: combining purity and ploidy info as data.frame
<gcap> 2024-05-14 10:22:59 info [gcap.extractFeatures]: generating CopyNumber object in sigminer package
ℹ [2024-05-14 10:22:59]: Started.
ℹ [2024-05-14 10:22:59]: Genome build  : hg38.
ℹ [2024-05-14 10:22:59]: Genome measure: called.
ℹ [2024-05-14 10:22:59]: When add_loh is TRUE, use_all is forced to TRUE.
Please drop columns you don't want to keep before reading.
✔ [2024-05-14 10:22:59]: Chromosome size database for build obtained.
ℹ [2024-05-14 10:23:00]: Reading input.
✔ [2024-05-14 10:23:00]: A data frame as input detected.
✔ [2024-05-14 10:23:00]: Column names checked.
✔ [2024-05-14 10:23:00]: Column order set.
✔ [2024-05-14 10:23:00]: Chromosomes unified.
✔ [2024-05-14 10:23:00]: Data imported.
ℹ [2024-05-14 10:23:00]: Segments info:
ℹ [2024-05-14 10:23:00]:     Keep - 569
ℹ [2024-05-14 10:23:00]:     Drop - 0
✔ [2024-05-14 10:23:00]: Segments sorted.
ℹ [2024-05-14 10:23:00]: Adding LOH labels...
ℹ [2024-05-14 10:23:00]: Skipped joining adjacent segments with same copy number value.
✔ [2024-05-14 10:23:00]: Segmental table cleaned.
ℹ [2024-05-14 10:23:00]: Annotating.
✔ [2024-05-14 10:23:00]: Annotation done.
ℹ [2024-05-14 10:23:00]: Summarizing per sample.
✔ [2024-05-14 10:23:00]: Summarized.
ℹ [2024-05-14 10:23:00]: Generating CopyNumber object.
✔ [2024-05-14 10:23:00]: Generated.
ℹ [2024-05-14 10:23:00]: Validating object.
✔ [2024-05-14 10:23:00]: Done.
ℹ [2024-05-14 10:23:00]: 0.603 secs elapsed.
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: estimating ploidy from copy number data
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: checking if input data contains ploidy and if there are NAs should be overwritten
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: getting Aneuploidy score
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: getting pLOH score
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: getting CNA burden
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: generating copy number catalog matrix for fitting signature activity
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: fitting copy number signature activity
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: merging data
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: feature extraction done
<gcap> 2024-05-14 10:23:00 info [gcap.extractFeatures]: now you can modify the result and append 'age' and 'gender' columns to the 'fts_sample' element of result list
<gcap> 2024-05-14 10:23:00 info [gcap.runBuildflow]: collapsing all data into gene-level prediction input
<gcap> 2024-05-14 10:23:00 info [gcap.collapse2Genes]: please make sure the first 3 columns of `fts$fts_region` are for chr, start, end.
<gcap> 2024-05-14 10:23:00 info [gcap.collapse2Genes]: collapsing region-level features to gene-level
<gcap> 2024-05-14 10:23:00 info [collapse_to_genes]: checking input chromosome names
<gcap> 2024-05-14 10:23:00 info [collapse_to_genes]: reading reference file /data3/wsx/R/x86_64-pc-linux-gnu-library/4.2/gcap/extdata/hg38_target_genes.rds
<gcap> 2024-05-14 10:23:01 info [collapse_to_genes]: finding overlaps
<gcap> 2024-05-14 10:23:01 info [collapse_to_genes]: calculating intersect size
<gcap> 2024-05-14 10:23:01 info [collapse_to_genes]: keeping records with >= 100% overlap ratio with a gene
<gcap> 2024-05-14 10:23:01 info [gcap.collapse2Genes]: merging gene-level and sample-level data
<gcap> 2024-05-14 10:23:01 info [gcap.collapse2Genes]: merging data and prior amplicon frequency data
<gcap> 2024-05-14 10:23:01 info [gcap.collapse2Genes]: done
<gcap> 2024-05-14 10:23:01 info [gcap.workflow]: =======================
<gcap> 2024-05-14 10:23:01 info [gcap.workflow]: Step 3: Run prediction
<gcap> 2024-05-14 10:23:01 info [gcap.workflow]: =======================
<gcap> 2024-05-14 10:23:01 info [gcap.runPrediction]: using model file XGB_NF11.rds
<gcap> 2024-05-14 10:23:01 info [gcap.runPrediction]: selecting necessary features from input data
<gcap> 2024-05-14 10:23:01 info [gcap.runPrediction]: running prediction
[10:23:01] WARNING: amalgamation/../src/c_api/c_api.cc:718: `ntree_limit` is deprecated, use `iteration_range` instead.
<gcap> 2024-05-14 10:23:01 info [gcap.workflow]: ====================================
<gcap> 2024-05-14 10:23:01 info [gcap.workflow]: Step 4: Run scoring and summarizing
<gcap> 2024-05-14 10:23:01 info [gcap.workflow]: ====================================
<gcap> 2024-05-14 10:23:01 info [gcap.runScoring]: checking input data type
<gcap> 2024-05-14 10:23:01 info [gcap.runScoring]: checking columns
<gcap> 2024-05-14 10:23:01 info [gcap.runScoring]: filtering out records without prob result
<gcap> 2024-05-14 10:23:01 info [gcap.runScoring]: joining extra annotation data
<gcap> 2024-05-14 10:23:04 info [gcap.runScoring]: only keep genes labeled as amplicons in result fCNA object
<gcap> 2024-05-14 10:23:04 info [gcap.runScoring]: No fCNA records detected
summarizing sample...
  classifying samples with min_prob=0.6
done
======================
A <fCNA> object
  record: 0
    case: 1
     |__ (0) 0 noncircular
     |__ (0) 0 circular
======================
<gcap> 2024-05-14 10:23:04 info [gcap.runScoring]: done
<gcap> 2024-05-14 10:23:07 info [gcap.workflow]: Saving raw prediction result to ~/gcap_debug/gcap_result/S116655_prediction_result.rds
<gcap> 2024-05-14 10:23:07 info [gcap.workflow]: Saving fCNA records and sample info to ~/gcap_debug/gcap_result/S116655_fCNA_records.csv, ~/gcap_debug/gcap_result/S116655_sample_info.csv
<gcap> 2024-05-14 10:23:07 info [gcap.workflow]: =======================================
<gcap> 2024-05-14 10:23:07 info [gcap.workflow]:  Done! Thanks for using GCAP workflow
<gcap> 2024-05-14 10:23:07 info [gcap.workflow]: =======================================
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In !is.null(homsegs) && !is.na(homsegs) :
  'length(x) = 30 > 1' in coercion to 'logical(1)'
2: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 3974 > 1' in coercion to 'logical(1)'
3: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 8925 > 1' in coercion to 'logical(1)'
4: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 8005 > 1' in coercion to 'logical(1)'
5: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 7126 > 1' in coercion to 'logical(1)'
6: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 6874 > 1' in coercion to 'logical(1)'
7: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 5988 > 1' in coercion to 'logical(1)'
8: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 6012 > 1' in coercion to 'logical(1)'
9: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 6662 > 1' in coercion to 'logical(1)'
10: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 6022 > 1' in coercion to 'logical(1)'
11: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 6309 > 1' in coercion to 'logical(1)'
12: In !is.null(homsegs) && !is.na(homsegs) :
  'length(x) = 21 > 1' in coercion to 'logical(1)'
13: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4753 > 1' in coercion to 'logical(1)'
14: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 6714 > 1' in coercion to 'logical(1)'
15: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 5472 > 1' in coercion to 'logical(1)'
16: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 5830 > 1' in coercion to 'logical(1)'
17: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 9601 > 1' in coercion to 'logical(1)'
18: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4096 > 1' in coercion to 'logical(1)'
19: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 5412 > 1' in coercion to 'logical(1)'
20: In !is.null(homsegs) && !is.na(homsegs) :
  'length(x) = 27 > 1' in coercion to 'logical(1)'
21: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 7386 > 1' in coercion to 'logical(1)'
22: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 9412 > 1' in coercion to 'logical(1)'
23: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4836 > 1' in coercion to 'logical(1)'
24: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 3798 > 1' in coercion to 'logical(1)'
25: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 7405 > 1' in coercion to 'logical(1)'
26: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 8696 > 1' in coercion to 'logical(1)'
27: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 5948 > 1' in coercion to 'logical(1)'
28: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4103 > 1' in coercion to 'logical(1)'
29: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4321 > 1' in coercion to 'logical(1)'
30: In !is.null(homsegs) && !is.na(homsegs) :
  'length(x) = 36 > 1' in coercion to 'logical(1)'
31: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 10087 > 1' in coercion to 'logical(1)'
32: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4624 > 1' in coercion to 'logical(1)'
33: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 3991 > 1' in coercion to 'logical(1)'
34: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4482 > 1' in coercion to 'logical(1)'
35: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 8284 > 1' in coercion to 'logical(1)'
36: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 6745 > 1' in coercion to 'logical(1)'
37: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 11550 > 1' in coercion to 'logical(1)'
38: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 3906 > 1' in coercion to 'logical(1)'
39: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4429 > 1' in coercion to 'logical(1)'
40: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 5561 > 1' in coercion to 'logical(1)'
41: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 5334 > 1' in coercion to 'logical(1)'
42: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4022 > 1' in coercion to 'logical(1)'
43: In !is.null(homsegs) && !is.na(homsegs) :
  'length(x) = 15 > 1' in coercion to 'logical(1)'
44: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 7122 > 1' in coercion to 'logical(1)'
45: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4058 > 1' in coercion to 'logical(1)'
46: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 5002 > 1' in coercion to 'logical(1)'
47: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 4824 > 1' in coercion to 'logical(1)'
48: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 9124 > 1' in coercion to 'logical(1)'
49: In !is.null(homsegs) && !is.na(homsegs) :
  'length(x) = 21 > 1' in coercion to 'logical(1)'
50: In !is.na(dif) && sum(dif > 0.3) > 5 :
  'length(x) = 6755 > 1' in coercion to 'logical(1)'
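
The `[W::hts_idx_load2]` warnings in the log above mean that each `.bam.bai` index has an older modification time than its BAM file, which commonly happens after files are copied. The run still completed, so the warnings look harmless here. A minimal sketch for silencing them follows, assuming the index sits next to the BAM as `<file>.bam.bai` (as in the log) and that the BAM content has not changed since indexing. The function name `refresh_index_mtime` is hypothetical.

```shell
# Refresh the timestamp of a BAM index when it is older than its BAM file.
# Assumption: the index is stored as <file>.bam.bai next to the BAM.
refresh_index_mtime() {
  bam="$1"
  bai="${bam}.bai"
  if [ ! -f "$bai" ]; then
    echo "missing index: $bai" >&2
    return 1
  fi
  # -ot is true when the left file was modified earlier than the right file.
  if [ "$bai" -ot "$bam" ]; then
    touch "$bai"
    echo "refreshed: $bai"
  else
    echo "ok: $bai"
  fi
}

# Process every BAM path given on the command line (none means no-op).
for bam in "$@"; do
  refresh_index_mtime "$bam"
done
```

If the BAM really was modified after indexing, updating the timestamp is not enough and the index should be rebuilt instead, for example with `samtools index <file>.bam`.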
tingchiafelix commented 4 months ago

@ShixiangWang thanks for running my files. I followed your instructions and reran the workflow with the hg38 genome, but I got an error again (see the log below).

One thing I noticed is that our installed versions of the "gcap" package differ.

Yours is ShixiangWang/gcap@1f1dbd2; mine is ShixiangWang/gcap@cdfb1c7.

Could this difference cause the error? I'm including my session info and log messages below. Any suggestions would be much appreciated.

Session info


> devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.3 (2024-02-29)
 os       Oracle Linux Server 8.9
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-05-14
 pandoc   2.0.6 @ /usr/bin/pandoc

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package          * version   date (UTC) lib source
 ASCAT            * 3.0.0     2024-05-06 [2] Github (ShixiangWang/ascat@51fd695)
 Biobase          * 2.62.0    2023-10-24 [2] Bioconductor
 BiocGenerics     * 0.48.1    2023-11-01 [2] Bioconductor
 bitops             1.0-7     2021-04-24 [2] CRAN (R 4.3.3)
 cachem             1.0.8     2023-05-01 [2] CRAN (R 4.3.3)
 cli                3.6.2     2023-12-11 [2] CRAN (R 4.3.3)
 cluster            2.1.6     2023-12-01 [3] CRAN (R 4.3.3)
 codetools          0.2-19    2023-02-01 [3] CRAN (R 4.3.3)
 colorspace         2.1-0     2023-01-23 [2] CRAN (R 4.3.3)
 crayon             1.5.2     2022-09-29 [2] CRAN (R 4.3.3)
 data.table         1.15.4    2024-03-30 [2] CRAN (R 4.3.3)
 devtools           2.4.5     2022-10-11 [2] CRAN (R 4.3.3)
 digest             0.6.35    2024-03-11 [2] CRAN (R 4.3.3)
 doParallel       * 1.0.17    2022-02-07 [2] CRAN (R 4.3.3)
 dplyr              1.1.4     2023-11-17 [2] CRAN (R 4.3.3)
 ellipsis           0.3.2     2021-04-29 [2] CRAN (R 4.3.3)
 fansi              1.0.6     2023-12-08 [2] CRAN (R 4.3.3)
 fastmap            1.1.1     2023-02-24 [2] CRAN (R 4.3.3)
 foreach          * 1.5.2     2022-02-02 [2] CRAN (R 4.3.3)
 fs                 1.6.4     2024-04-25 [2] CRAN (R 4.3.3)
 furrr              0.3.1     2022-08-15 [2] CRAN (R 4.3.3)
 future             1.33.2    2024-03-26 [2] CRAN (R 4.3.3)
 gcap             * 1.1.4     2024-05-01 [2] Github (ShixiangWang/gcap@cdfb1c7)
 generics           0.1.3     2022-07-05 [2] CRAN (R 4.3.3)
 GenomeInfoDb     * 1.38.8    2024-03-15 [2] Bioconductor 3.18 (R 4.3.3)
 GenomeInfoDbData   1.2.11    2024-04-30 [2] Bioconductor
 GenomicRanges    * 1.54.1    2023-10-29 [2] Bioconductor
 GetoptLong         1.0.5     2020-12-15 [2] CRAN (R 4.3.3)
 ggplot2            3.5.1     2024-04-23 [2] CRAN (R 4.3.3)
 GlobalOptions      0.1.2     2020-06-10 [2] CRAN (R 4.3.3)
 globals            0.16.3    2024-03-08 [2] CRAN (R 4.3.3)
 glue               1.7.0     2024-01-09 [2] CRAN (R 4.3.3)
 gridBase           0.4-7     2014-02-24 [2] CRAN (R 4.3.3)
 gtable             0.3.5     2024-04-22 [2] CRAN (R 4.3.3)
 hms                1.1.3     2023-03-21 [2] CRAN (R 4.3.3)
 htmltools          0.5.8.1   2024-04-04 [2] CRAN (R 4.3.3)
 htmlwidgets        1.6.4     2023-12-06 [2] CRAN (R 4.3.3)
 httpuv             1.6.15    2024-03-26 [2] CRAN (R 4.3.3)
 IRanges          * 2.36.0    2023-10-24 [2] Bioconductor
 iterators        * 1.0.14    2022-02-05 [2] CRAN (R 4.3.3)
 jsonlite           1.8.8     2023-12-04 [2] CRAN (R 4.3.3)
 later              1.3.2     2023-12-06 [2] CRAN (R 4.3.3)
 lattice            0.22-5    2023-10-24 [3] CRAN (R 4.3.3)
 lgr                0.4.4     2022-09-05 [2] CRAN (R 4.3.3)
 lifecycle          1.0.4     2023-11-07 [2] CRAN (R 4.3.3)
 listenv            0.9.1     2024-01-29 [2] CRAN (R 4.3.3)
 magrittr           2.0.3     2022-03-30 [2] CRAN (R 4.3.3)
 Matrix             1.6-5     2024-01-11 [3] CRAN (R 4.3.3)
 memoise            2.0.1     2021-11-26 [2] CRAN (R 4.3.3)
 mime               0.12      2021-09-28 [2] CRAN (R 4.3.3)
 miniUI             0.1.1.1   2018-05-18 [2] CRAN (R 4.3.3)
 munsell            0.5.1     2024-04-01 [2] CRAN (R 4.3.3)
 NMF                0.27      2024-02-08 [2] CRAN (R 4.3.3)
 parallelly         1.37.1    2024-02-29 [2] CRAN (R 4.3.3)
 pillar             1.9.0     2023-03-22 [2] CRAN (R 4.3.3)
 pkgbuild           1.4.4     2024-03-17 [2] CRAN (R 4.3.3)
 pkgconfig          2.0.3     2019-09-22 [2] CRAN (R 4.3.3)
 pkgload            1.3.4     2024-01-16 [2] CRAN (R 4.3.3)
 plyr               1.8.9     2023-10-02 [2] CRAN (R 4.3.3)
 profvis            0.3.8     2023-05-02 [2] CRAN (R 4.3.3)
 promises           1.3.0     2024-04-05 [2] CRAN (R 4.3.3)
 purrr              1.0.2     2023-08-10 [2] CRAN (R 4.3.3)
 quadprog           1.5-8     2019-11-20 [2] CRAN (R 4.3.3)
 R6                 2.5.1     2021-08-19 [2] CRAN (R 4.3.3)
 rappdirs           0.3.3     2021-01-31 [2] CRAN (R 4.3.3)
 RColorBrewer     * 1.1-3     2022-04-03 [2] CRAN (R 4.3.3)
 Rcpp               1.0.12    2024-01-09 [2] CRAN (R 4.3.3)
 RCurl              1.98-1.14 2024-01-09 [2] CRAN (R 4.3.3)
 readr            * 2.1.5     2024-01-10 [2] CRAN (R 4.3.3)
 registry           0.5-1     2019-03-05 [2] CRAN (R 4.3.3)
 remotes            2.5.0     2024-03-17 [2] CRAN (R 4.3.3)
 reshape2           1.4.4     2020-04-09 [2] CRAN (R 4.3.3)
 rjson              0.2.21    2022-01-09 [2] CRAN (R 4.3.3)
 rlang              1.1.3     2024-01-10 [2] CRAN (R 4.3.3)
 rngtools           1.5.2     2021-09-20 [2] CRAN (R 4.3.3)
 S4Vectors        * 0.40.2    2023-11-23 [2] Bioconductor 3.18 (R 4.3.3)
 scales             1.3.0     2023-11-28 [2] CRAN (R 4.3.3)
 sessioninfo        1.2.2     2021-12-06 [2] CRAN (R 4.3.3)
 shiny              1.8.1.1   2024-04-02 [2] CRAN (R 4.3.3)
 sigminer         * 2.3.0     2023-12-12 [2] CRAN (R 4.3.3)
 stringi            1.8.3     2023-12-11 [2] CRAN (R 4.3.3)
 stringr            1.5.1     2023-11-14 [2] CRAN (R 4.3.3)
 tibble             3.2.1     2023-03-20 [2] CRAN (R 4.3.3)
 tidyselect         1.2.1     2024-03-11 [2] CRAN (R 4.3.3)
 tzdb               0.4.0     2023-05-12 [2] CRAN (R 4.3.3)
 urlchecker         1.0.1     2021-11-30 [2] CRAN (R 4.3.3)
 usethis            2.2.3     2024-02-19 [2] CRAN (R 4.3.3)
 utf8               1.2.4     2023-10-22 [2] CRAN (R 4.3.3)
 uuid               1.2-0     2024-01-14 [2] CRAN (R 4.3.3)
 vctrs              0.6.5     2023-12-01 [2] CRAN (R 4.3.3)
 xgboost            1.5.2.1   2022-02-21 [2] CRAN (R 4.3.3)
 xtable             1.8-4     2019-04-21 [2] CRAN (R 4.3.3)
 XVector            0.42.0    2023-10-24 [2] Bioconductor
 zlibbioc           1.48.2    2024-03-13 [2] Bioconductor 3.18 (R 4.3.3)

 [1] /mnt/nasapps/production/R/4.3.2
 [2] /home/changtn/R/x86_64-redhat-linux-gnu-library/4.3
 [3] /usr/lib64/R/library
 [4] /usr/share/R/library

Log

Loading required package: ASCAT
Loading required package: RColorBrewer
Loading required package: splines
Loading required package: readr
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:utils’:

    findMatches

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: parallel
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: sigminer
sigminer version 2.3.0
- Star me at https://github.com/ShixiangWang/sigminer
- Run hello() to see usage and citation.
gcap version 1.1.4
- Project URL at https://github.com/ShixiangWang/gcap

Citation:
    Wang, S., Wu, CY., He, MM. et al. Machine learning-based extrachromosomal DNA identification in
    large-scale cohorts reveals its clinical implications in cancer. Nat Commun 15, 1515 (2024). https://doi.org/10.1038/s41467-024-45479-6
<gcap> 2024-05-14 14:23:33.91281 info [gcap.workflow]: =====================
<gcap> 2024-05-14 14:23:33.994768 info [gcap.workflow]:    GCAP WORKFLOW
<gcap> 2024-05-14 14:23:34.010181 info [gcap.workflow]: =====================
<gcap> 2024-05-14 14:23:34.013962 info [gcap.workflow]:
<gcap> 2024-05-14 14:23:34.017747 info [gcap.workflow]: =====================
<gcap> 2024-05-14 14:23:34.021299 info [gcap.workflow]: Step 1: Run ASCAT 3.0
<gcap> 2024-05-14 14:23:34.025109 info [gcap.workflow]: =====================
<gcap> 2024-05-14 14:23:34.053504 info [gcap.runASCAT]: > Run ASCAT on WES data <
<gcap> 2024-05-14 14:23:34.05755 info [gcap.runASCAT]:
<gcap> 2024-05-14 14:23:34.061144 info [gcap.runASCAT]: Configs:
<gcap> 2024-05-14 14:23:34.064778 info [gcap.runASCAT]:   result path set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output
<gcap> 2024-05-14 14:23:34.068447 info [gcap.runASCAT]:   allelecounter_exe set to ~/miniconda3/envs/cancerit/bin/alleleCounter
<gcap> 2024-05-14 14:23:34.072628 info [gcap.runASCAT]:   g1000allelesprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg38//1kg.phase3.v5a_GRCh38nounref_allele_index_chr
<gcap> 2024-05-14 14:23:34.076255 info [gcap.runASCAT]:   g1000lociprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg38//1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr
<gcap> 2024-05-14 14:23:34.079994 info [gcap.runASCAT]:   GCcontentfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/GC_correction_hg38.txt
<gcap> 2024-05-14 14:23:34.083808 info [gcap.runASCAT]:   replictimingfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/RT_correction_hg38.txt
<gcap> 2024-05-14 14:23:34.087556 info [gcap.runASCAT]:   nthreads set to 22
<gcap> 2024-05-14 14:23:34.09125 info [gcap.runASCAT]:   minCounts set to 10
<gcap> 2024-05-14 14:23:34.095211 info [gcap.runASCAT]:   BED_file set to NA
<gcap> 2024-05-14 14:23:34.099086 info [gcap.runASCAT]:   probloci_file set to NA
<gcap> 2024-05-14 14:23:34.103024 info [gcap.runASCAT]:   chrom_names set to <1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22>
<gcap> 2024-05-14 14:23:34.106693 info [gcap.runASCAT]:   gender set to <XX>
<gcap> 2024-05-14 14:23:34.110381 info [gcap.runASCAT]:   min_base_qual set to 20
<gcap> 2024-05-14 14:23:34.114615 info [gcap.runASCAT]:   min_map_qual set to 35
<gcap> 2024-05-14 14:23:34.11824 info [gcap.runASCAT]:   penalty set to 70
<gcap> 2024-05-14 14:23:34.122078 info [gcap.runASCAT]:   skip_finished_ASCAT set to TRUE
<gcap> 2024-05-14 14:23:34.130545 info [gcap.runASCAT]: 1 jobs detected
<gcap> 2024-05-14 14:23:34.134146 info [gcap.runASCAT]: No ASCAT job to skip.
<gcap> 2024-05-14 14:23:34.137871 info [FUN]: start submitting job S116655
<gcap> 2024-05-14 14:23:34.142185 info [FUN]:      tumor data file: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655-072-R-AK7A15E12-WES.bwa.final.bam
<gcap> 2024-05-14 14:23:34.146052 info [FUN]:     normal data file: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655_germline-WES.bwa.final.bam
<gcap> 2024-05-14 14:23:34.149747 info [FUN]:    tumor sample name: Test_T
<gcap> 2024-05-14 14:23:34.153461 info [FUN]:   normal sample name: Test_N
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Reading locis
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
Done reading locis
Multi pos start:
[1] Reading Tumor LogR data...
[1] Reading Tumor BAF data...
[1] Reading Germline LogR data...
[1] Reading Germline BAF data...
[1] Registering SNP locations...
[1] Splitting genome in distinct chunks...
[1] Sample Test_T (1/1)
GC correlation:  25bp 0.049 ; 50bp 0.057 ; 100bp 0.064 ; 200bp 0.073 ; 500bp 0.087 ; 1kb 0.099 ; 2kb 0.109 ; 5kb 0.119 ; 10kb 0.125 ; 20kb 0.129 ; 50kb 0.135 ; 100kb 0.140 ; 200kb 0.144 ; 500kb 0.148 ; 1Mb 0.148 ; 2Mb 0.139 ; 5Mb 0.102 ; 10Mb 0.056 ;
Short window size:  1kb
Long window size:  100kb
Replication timing correlation:  Bg02es 0.11 ; Bj 0.12 ; Gm06990 0.13 ; Gm12801 0.14 ; Gm12812 0.13 ; Gm12813 0.13 ; Gm12878 0.13 ; Helas3 0.12 ; Hepg2 0.14 ; Huvec 0.12 ; Imr90 0.12 ; K562 0.14 ; Mcf7 0.13 ; Nhek 0.13 ; Sknsh 0.14 ;
Replication dataset:  Hepg2
[1] Plotting tumor data
[1] Plotting germline data
[1] Sample Test_T (1/1)
<gcap> 2024-05-14 14:59:49.008591 fatal [value[[3L]]]: job S116655 failed in ASCAT due to following error
<gcap> 2024-05-14 14:59:49.022772 info [value[[3L]]]: 'length = 30' in coercion to 'logical(1)'
<gcap> 2024-05-14 14:59:49.027022 info [value[[3L]]]: =====
<gcap> 2024-05-14 14:59:49.031 info [value[[3L]]]: Please check your input bam files (if missing bam index? if its alignment quality is lower?)
<gcap> 2024-05-14 14:59:49.034931 info [value[[3L]]]: =====
<gcap> 2024-05-14 14:59:49.039372 info [gcap.runASCAT]: ASCAT analysis done, check /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output for results
<gcap> 2024-05-14 14:59:49.045842 info [gcap.workflow]: checking ASCAT result files
<gcap> 2024-05-14 14:59:49.049786 warn [FUN]: result file /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/S116655.ASCAT.rds does not exist, the corresponding ASCAT calling has error occurred
<gcap> 2024-05-14 14:59:49.054895 warn [FUN]: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/S116655.ASCAT.rds contains a failed ASCAT job, will discard it before next step
<gcap> 2024-05-14 14:59:49.058794 fatal [gcap.workflow]: no sucessful ASCAT result file to proceed!
<gcap> 2024-05-14 14:59:49.062439 fatal [gcap.workflow]: check your ASCAT setting before make sure this case could not be used!
Error in gcap.workflow(tumourseqfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655-072-R-AK7A15E12-WES.bwa.final.bam",  :

Execution halted
ShixiangWang commented 4 months ago

@tingchiafelix You may need to run GCAP on R < 4.3. The fixed ASCAT version used here is incompatible with newer R releases.

https://cran.r-project.org/doc/manuals/r-release/NEWS.html

I will take some time to update the ASCAT code to make it compatible.
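
For context, the message 'length = 30' in coercion to 'logical(1)' is what R 4.3.0 and later raise when `&&` or `||` receives an operand longer than one element; R 4.2.x only warned, and earlier releases silently used the first element. The sketch below is not ASCAT code. It only reproduces that behaviour with a toy vector (`flags` is a made-up name) and shows a form that behaves the same on any R version.

```r
# Toy vector standing in for a length-30 condition; not taken from ASCAT.
flags <- rep(c(TRUE, FALSE), 15)

# R >= 4.3.0 turns a length > 1 operand of `&&` into an error.
strict_r <- getRversion() >= "4.3.0"

# Capture whatever this R version does with the non-scalar operand.
outcome <- tryCatch(
  {
    flags && TRUE
    "no condition signalled"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error = function(e) paste("error:", conditionMessage(e))
)
cat("R", as.character(getRversion()), "(strict:", strict_r, ") ->", outcome, "\n")

# Version-independent form: reduce to a single logical first.
any_flag <- any(flags) && TRUE
all_flag <- all(flags) && TRUE
```

Reducing with `any()` or `all()` makes the intent explicit, which is presumably the kind of change an ASCAT update for newer R would involve.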

ShixiangWang commented 4 months ago

@tingchiafelix I have also updated gcap to be compatible with the latest version of ASCAT (it can be run on R 4.3). If you are interested, please follow the instructions at https://github.com/ShixiangWang/gcap?tab=readme-ov-file#install-ascat-required.

I have tested the new version with your provided data.

gcap.workflow(
  tumourseqfile = "~/gcap_debug/116655-072-R-AK7A15E12-WES.bwa.final.bam",
  normalseqfile = "~/gcap_debug/116655_germline-WES.bwa.final.bam",
  tumourname = "Test_T",
  normalname = "Test_N",
  jobname = "S116655",
  outdir = "~/gcap_debug/gcap_result_ascat",
  allelecounter_exe = "~/miniconda3/envs/cancerit/bin/alleleCounter",
  g1000allelesprefix = file.path(
    "~/share/gcap_reference/1000G_loci_hg38/",
    "1kg.phase3.v5a_GRCh38nounref_allele_index_chr"
  ),
  g1000lociprefix = file.path("~/share/gcap_reference/1000G_loci_hg38/",
                              "1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr"
  ),
  GCcontentfile = "~/share/gcap_reference/GC_G1000_hg38.txt",
  replictimingfile = "~/share/gcap_reference/RT_G1000_hg38.txt",
  skip_finished_ASCAT = TRUE,
  skip_ascat_call = FALSE,
  result_file_prefix = "S116655",
  genome_build = "hg38",
  model = "XGB11"
)

The correction files must also be updated.

I am closing this issue now. Please file another issue if you have further questions.

tingchiafelix commented 4 months ago

@ShixiangWang I appreciate your help! The process completed successfully following your workflow and suggestion (hg38 workflow). However, my BAM files were generated with the hg19 reference genome. I therefore ran the same workflow with the hg19 reference files that you provided, but the process failed with an error (length(ovl) > nrow(ASCATobj$Tumor_LogR)/10 is not TRUE). I have R 4.2.2 installed.

Could you please have a look?

script

library(gcap)

# hg19 ----------------
gcap.workflow(
  tumourseqfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655-072-R-AK7A15E12-WES.bwa.final.bam",
  normalseqfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655_germline-WES.bwa.final.bam",
  tumourname = "Test_T",
  normalname = "Test_N",
  jobname = "S116655",
  outdir = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output",
  allelecounter_exe = "~/miniconda3/envs/ecDNA/bin/alleleCounter",
  g1000allelesprefix = file.path(
    "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19/",
    "1000genomesAlleles2012_chr"
  ),
  g1000lociprefix = file.path("/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19/",
                  "1000genomesloci2012chrstring_chr"),
  GCcontentfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/GC_correction_updated_hg19.txt",
  replictimingfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/RT_correction_updated_hg19.txt",
  skip_finished_ASCAT = TRUE,
  skip_ascat_call = FALSE,
  result_file_prefix = "S116655",
  genome_build = "hg19",
  model = "XGB11"
)

Log

Loading required package: ASCAT
Loading required package: RColorBrewer
Loading required package: splines
Loading required package: readr
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: parallel
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: sigminer
Registered S3 method overwritten by 'sigminer':
  method      from
  print.bytes Rcpp
sigminer version 2.3.1
- Star me at https://github.com/ShixiangWang/sigminer
- Run hello() to see usage and citation.
gcap version 1.2.0
- Project URL at https://github.com/ShixiangWang/gcap

Citation:
    Wang, S., Wu, CY., He, MM. et al. Machine learning-based extrachromosomal DNA identification in
    large-scale cohorts reveals its clinical implications in cancer. Nat Commun 15, 1515 (2024). https://doi.org/10.1038/s41467-024-45479-6
<gcap> 2024-05-16 12:37:34 info [gcap.workflow]: =====================
<gcap> 2024-05-16 12:37:34 info [gcap.workflow]:    GCAP WORKFLOW
<gcap> 2024-05-16 12:37:34 info [gcap.workflow]: =====================
<gcap> 2024-05-16 12:37:34 info [gcap.workflow]:
<gcap> 2024-05-16 12:37:34 info [gcap.workflow]: =====================
<gcap> 2024-05-16 12:37:34 info [gcap.workflow]: Step 1: Run ASCAT 3.0
<gcap> 2024-05-16 12:37:34 info [gcap.workflow]: =====================
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]: > Run ASCAT on WES data <
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]: Configs:
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   result path set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   allelecounter_exe set to ~/miniconda3/envs/ecDNA/bin/alleleCounter
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   g1000allelesprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19//1000genomesAlleles2012_chr
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   g1000lociprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19//1000genomesloci2012chrstring_chr
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   GCcontentfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/GC_correction_updated_hg19.txt
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   replictimingfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/RT_correction_updated_hg19.txt
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   nthreads set to 22
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   minCounts set to 10
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   BED_file set to NA
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   probloci_file set to NA
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   chrom_names set to <1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22>
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   gender set to <XX>
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   min_base_qual set to 20
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   min_map_qual set to 35
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   penalty set to 70
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]:   skip_finished_ASCAT set to TRUE
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]: 1 jobs detected
<gcap> 2024-05-16 12:37:34 info [gcap.runASCAT]: No ASCAT job to skip.
<gcap> 2024-05-16 12:37:34 info [FUN]: start submitting job S116655
<gcap> 2024-05-16 12:37:34 info [FUN]:      tumor data file: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655-072-R-AK7A15E12-WES.bwa.final.bam
<gcap> 2024-05-16 12:37:34 info [FUN]:     normal data file: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655_germline-WES.bwa.final.bam
<gcap> 2024-05-16 12:37:34 info [FUN]:    tumor sample name: Test_T
<gcap> 2024-05-16 12:37:34 info [FUN]:   normal sample name: Test_N
[1] Reading Tumor LogR data...
[1] Reading Tumor BAF data...
[1] Reading Germline LogR data...
[1] Reading Germline BAF data...
[1] Registering SNP locations...
[1] Splitting genome in distinct chunks...
<gcap> 2024-05-16 15:29:48 fatal [value[[3L]]]: job S116655 failed in ASCAT due to following error
<gcap> 2024-05-16 15:29:48 info [value[[3L]]]: length(ovl) > nrow(ASCATobj$Tumor_LogR)/10 is not TRUE
<gcap> 2024-05-16 15:29:48 info [value[[3L]]]: =====
<gcap> 2024-05-16 15:29:48 info [value[[3L]]]: Please check your input bam files (if missing bam index? if its alignment quality is lower?)
<gcap> 2024-05-16 15:29:48 info [value[[3L]]]: =====
<gcap> 2024-05-16 15:29:48 info [gcap.runASCAT]: ASCAT analysis done, check /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output for results
<gcap> 2024-05-16 15:29:48 info [gcap.workflow]: checking ASCAT result files
<gcap> 2024-05-16 15:29:48 warn [FUN]: result file /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/S116655.ASCAT.rds does not exist, the corresponding ASCAT calling has error occurred
<gcap> 2024-05-16 15:29:48 warn [FUN]: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/S116655.ASCAT.rds contains a failed ASCAT job, will discard it before next step
<gcap> 2024-05-16 15:29:48 fatal [gcap.workflow]: no sucessful ASCAT result file to proceed!
<gcap> 2024-05-16 15:29:48 fatal [gcap.workflow]: check your ASCAT setting before make sure this case could not be used!
Error in gcap.workflow(tumourseqfile = "/mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/116655-072-R-AK7A15E12-WES.bwa.final.bam",  :

In addition: Warning message:
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Execution halted

Session info

> devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31)
 os       Oracle Linux Server 8.9
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-05-16
 pandoc   2.0.6 @ /usr/bin/pandoc

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package          * version   date (UTC) lib source
 ASCAT            * 3.0.0     2024-05-15 [2] Github (ShixiangWang/ascat@51fd695)
 Biobase          * 2.58.0    2022-11-01 [2] Bioconductor
 BiocGenerics     * 0.44.0    2022-11-01 [2] Bioconductor
 bitops             1.0-7     2021-04-24 [2] CRAN (R 4.2.2)
 cachem             1.0.8     2023-05-01 [2] CRAN (R 4.2.2)
 cli                3.6.2     2023-12-11 [2] CRAN (R 4.2.2)
 cluster            2.1.6     2023-12-01 [2] CRAN (R 4.2.2)
 codetools          0.2-20    2024-03-31 [2] CRAN (R 4.2.2)
 colorspace         2.1-0     2023-01-23 [2] CRAN (R 4.2.2)
 crayon             1.5.2     2022-09-29 [2] CRAN (R 4.2.2)
 data.table         1.15.4    2024-03-30 [2] CRAN (R 4.2.2)
 devtools           2.4.5     2022-10-11 [2] CRAN (R 4.2.2)
 digest             0.6.35    2024-03-11 [2] CRAN (R 4.2.2)
 doParallel       * 1.0.17    2022-02-07 [2] CRAN (R 4.2.2)
 dplyr              1.1.4     2023-11-17 [2] CRAN (R 4.2.2)
 ellipsis           0.3.2     2021-04-29 [2] CRAN (R 4.2.2)
 fansi              1.0.6     2023-12-08 [2] CRAN (R 4.2.2)
 fastmap            1.2.0     2024-05-15 [2] CRAN (R 4.2.2)
 foreach          * 1.5.2     2022-02-02 [2] CRAN (R 4.2.2)
 fs                 1.6.4     2024-04-25 [2] CRAN (R 4.2.2)
 furrr              0.3.1     2022-08-15 [2] CRAN (R 4.2.2)
 future             1.33.2    2024-03-26 [2] CRAN (R 4.2.2)
 gcap             * 1.2.0     2024-05-16 [2] Github (ShixiangWang/gcap@958a135)
 generics           0.1.3     2022-07-05 [2] CRAN (R 4.2.2)
 GenomeInfoDb     * 1.34.9    2023-02-02 [2] Bioconductor
 GenomeInfoDbData   1.2.9     2024-05-15 [2] Bioconductor
 GenomicRanges    * 1.50.2    2022-12-16 [2] Bioconductor
 GetoptLong         1.0.5     2020-12-15 [2] CRAN (R 4.2.2)
 ggplot2            3.5.1     2024-04-23 [2] CRAN (R 4.2.2)
 GlobalOptions      0.1.2     2020-06-10 [2] CRAN (R 4.2.2)
 globals            0.16.3    2024-03-08 [2] CRAN (R 4.2.2)
 glue               1.7.0     2024-01-09 [2] CRAN (R 4.2.2)
 gridBase           0.4-7     2014-02-24 [2] CRAN (R 4.2.2)
 gtable             0.3.5     2024-04-22 [2] CRAN (R 4.2.2)
 hms                1.1.3     2023-03-21 [2] CRAN (R 4.2.2)
 htmltools          0.5.8.1   2024-04-04 [2] CRAN (R 4.2.2)
 htmlwidgets        1.6.4     2023-12-06 [2] CRAN (R 4.2.2)
 httpuv             1.6.15    2024-03-26 [2] CRAN (R 4.2.2)
 IRanges          * 2.32.0    2022-11-01 [2] Bioconductor
 iterators        * 1.0.14    2022-02-05 [2] CRAN (R 4.2.2)
 jsonlite           1.8.8     2023-12-04 [2] CRAN (R 4.2.2)
 later              1.3.2     2023-12-06 [2] CRAN (R 4.2.2)
 lattice            0.22-6    2024-03-20 [2] CRAN (R 4.2.2)
 lgr                0.4.4     2022-09-05 [2] CRAN (R 4.2.2)
 lifecycle          1.0.4     2023-11-07 [2] CRAN (R 4.2.2)
 listenv            0.9.1     2024-01-29 [2] CRAN (R 4.2.2)
 magrittr           2.0.3     2022-03-30 [2] CRAN (R 4.2.2)
 Matrix             1.6-5     2024-01-11 [3] CRAN (R 4.2.3)
 memoise            2.0.1     2021-11-26 [2] CRAN (R 4.2.2)
 mime               0.12      2021-09-28 [2] CRAN (R 4.2.2)
 miniUI             0.1.1.1   2018-05-18 [2] CRAN (R 4.2.2)
 munsell            0.5.1     2024-04-01 [2] CRAN (R 4.2.2)
 NMF                0.27      2024-02-08 [2] CRAN (R 4.2.2)
 parallelly         1.37.1    2024-02-29 [2] CRAN (R 4.2.2)
 pillar             1.9.0     2023-03-22 [2] CRAN (R 4.2.2)
 pkgbuild           1.4.4     2024-03-17 [2] CRAN (R 4.2.2)
 pkgconfig          2.0.3     2019-09-22 [2] CRAN (R 4.2.2)
 pkgload            1.3.4     2024-01-16 [2] CRAN (R 4.2.2)
 plyr               1.8.9     2023-10-02 [2] CRAN (R 4.2.2)
 profvis            0.3.8     2023-05-02 [2] CRAN (R 4.2.2)
 promises           1.3.0     2024-04-05 [2] CRAN (R 4.2.2)
 purrr              1.0.2     2023-08-10 [2] CRAN (R 4.2.2)
 quadprog           1.5-8     2019-11-20 [2] CRAN (R 4.2.2)
 R6                 2.5.1     2021-08-19 [2] CRAN (R 4.2.2)
 rappdirs           0.3.3     2021-01-31 [2] CRAN (R 4.2.2)
 RColorBrewer     * 1.1-3     2022-04-03 [2] CRAN (R 4.2.2)
 Rcpp               1.0.12    2024-01-09 [2] CRAN (R 4.2.2)
 RCurl              1.98-1.14 2024-01-09 [2] CRAN (R 4.2.2)
 readr            * 2.1.5     2024-01-10 [2] CRAN (R 4.2.2)
 registry           0.5-1     2019-03-05 [2] CRAN (R 4.2.2)
 remotes            2.5.0     2024-03-17 [2] CRAN (R 4.2.2)
 reshape2           1.4.4     2020-04-09 [2] CRAN (R 4.2.2)
 rjson              0.2.21    2022-01-09 [2] CRAN (R 4.2.2)
 rlang              1.1.3     2024-01-10 [2] CRAN (R 4.2.2)
 rngtools           1.5.2     2021-09-20 [2] CRAN (R 4.2.2)
 S4Vectors        * 0.36.2    2023-02-26 [2] Bioconductor
 scales             1.3.0     2023-11-28 [2] CRAN (R 4.2.2)
 sessioninfo        1.2.2     2021-12-06 [2] CRAN (R 4.2.2)
 shiny              1.8.1.1   2024-04-02 [2] CRAN (R 4.2.2)
 sigminer         * 2.3.1     2024-05-11 [2] CRAN (R 4.2.2)
 stringi            1.8.4     2024-05-06 [2] CRAN (R 4.2.2)
 stringr            1.5.1     2023-11-14 [2] CRAN (R 4.2.2)
 tibble             3.2.1     2023-03-20 [2] CRAN (R 4.2.2)
 tidyselect         1.2.1     2024-03-11 [2] CRAN (R 4.2.2)
 tzdb               0.4.0     2023-05-12 [2] CRAN (R 4.2.2)
 urlchecker         1.0.1     2021-11-30 [2] CRAN (R 4.2.2)
 usethis            2.2.3     2024-02-19 [2] CRAN (R 4.2.2)
 utf8               1.2.4     2023-10-22 [2] CRAN (R 4.2.2)
 uuid               1.2-0     2024-01-14 [2] CRAN (R 4.2.2)
 vctrs              0.6.5     2023-12-01 [2] CRAN (R 4.2.2)
 xgboost            1.7.7.1   2024-01-25 [2] CRAN (R 4.2.2)
 xtable             1.8-4     2019-04-21 [2] CRAN (R 4.2.2)
 XVector            0.38.0    2022-11-01 [2] Bioconductor
 zlibbioc           1.44.0    2022-11-01 [2] Bioconductor
ShixiangWang commented 4 months ago

@tingchiafelix I can reproduce the error with the updated reference files. You used the incorrect GC and RT reference files for hg19: the files marked "update" are meant for the updated ASCAT version, not the fixed version.

You should use the following two files. I tested them, and they still work with your current run environment.

Please note that I removed the correction files marked with 'update', which no longer work with the latest version of ASCAT v3; for that version, use https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WES instead.
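
The failed assertion, length(ovl) > nrow(ASCATobj$Tumor_LogR)/10, suggests that ASCAT requires more than 10% of the LogR rows to overlap with the correction file. Assuming that overlap is computed on SNP identifiers (an assumption read off the message, not checked against the ASCAT source), a mismatch between files can be sketched with made-up identifiers. `overlap_fraction` and `passes_check` are hypothetical helpers, not ASCAT or GCAP functions.

```r
# Fraction of LogR SNP identifiers that also appear in a correction file.
overlap_fraction <- function(logr_ids, correction_ids) {
  length(intersect(logr_ids, correction_ids)) / length(logr_ids)
}

# Mirror of the 10% threshold suggested by the error message.
passes_check <- function(logr_ids, correction_ids) {
  length(intersect(logr_ids, correction_ids)) > length(logr_ids) / 10
}

# Made-up identifiers for illustration only.
logr_ids <- paste0("snp", 1:100)
matching_ids <- paste0("snp", 1:80)     # same naming scheme, 80 shared
mismatched_ids <- paste0("1_", 1:100)   # different naming scheme, none shared

overlap_fraction(logr_ids, matching_ids)    # 0.8
overlap_fraction(logr_ids, mismatched_ids)  # 0
passes_check(logr_ids, matching_ids)        # TRUE
passes_check(logr_ids, mismatched_ids)      # FALSE
```

Running a comparison like this on the first column of a correction file and on the loci actually counted may show quickly whether the two files belong to the same ASCAT version.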

tingchiafelix commented 2 months ago

Hi Shixiang, I've been running GCAP on over a hundred samples and got results successfully. However, a couple of samples could not get through the workflow, and it seems that no "ASCAT result file" was generated in step 1 (please see below). I've included the output folder and BAM files in the attachment. I hope this helps with the investigation. I would appreciate any suggestions.

https://www.dropbox.com/scl/fo/9pf9mkereww28q2yda0md/AAK61fn5wJmihoPxa8AW3s4?rlkey=ix2pyr6nyijr9suk04wxpv3ne&st=9q4jakju&dl=0

Best, TC


Working on 119177~322-R1~PXWJH4KA3~WES
Loading required package: ASCAT
Loading required package: RColorBrewer
Loading required package: splines
Loading required package: readr
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: parallel
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: sigminer
Registered S3 method overwritten by 'sigminer':
  method      from
  print.bytes Rcpp
sigminer version 2.3.1
- Star me at https://github.com/ShixiangWang/sigminer
- Run hello() to see usage and citation.
gcap version 1.2.0
- Project URL at https://github.com/ShixiangWang/gcap

Citation:
    Wang, S., Wu, CY., He, MM. et al. Machine learning-based extrachromosomal DNA identification in 
    large-scale cohorts reveals its clinical implications in cancer. Nat Commun 15, 1515 (2024). https://doi.org/10.1038/s41467-024-45479-6
[1] "119177~322-R1~PXWJH4KA3~WES" "119177~322-R1~J1-CAF~WES"   
 chr [1:2] "119177~322-R1~PXWJH4KA3~WES" "119177~322-R1~J1-CAF~WES"
NULL
[1] "119177~322-R1~PXWJH4KA3~WES"
[1] "119177~322-R1~J1-CAF~WES"
[1] "119177"
<gcap> 2024-07-09 08:18:19 info [gcap.workflow]: =====================
<gcap> 2024-07-09 08:18:19 info [gcap.workflow]:    GCAP WORKFLOW
<gcap> 2024-07-09 08:18:19 info [gcap.workflow]: =====================
<gcap> 2024-07-09 08:18:19 info [gcap.workflow]: 
<gcap> 2024-07-09 08:18:19 info [gcap.workflow]: =====================
<gcap> 2024-07-09 08:18:19 info [gcap.workflow]: Step 1: Run ASCAT 3.0
<gcap> 2024-07-09 08:18:19 info [gcap.workflow]: =====================
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]: > Run ASCAT on WES data <
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]: 
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]: Configs:
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   result path set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/119177~322-R1~PXWJH4KA3~WES
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   allelecounter_exe set to ~/miniconda3/envs/ecDNA/bin/alleleCounter
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   g1000allelesprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19//1000genomesAlleles2012_chr
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   g1000lociprefix set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/1000G_loci_hg19//1000genomesloci2012chrstring_chr
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   GCcontentfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/GC_correction_hg19.txt
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   replictimingfile set to /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/ref/RT_correction_hg19.txt
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   nthreads set to 22
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   minCounts set to 10
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   BED_file set to NA
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   probloci_file set to NA
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   chrom_names set to <1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22>
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   gender set to <XX>
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   min_base_qual set to 20
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   min_map_qual set to 35
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   penalty set to 70
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]:   skip_finished_ASCAT set to TRUE
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]: 1 jobs detected
<gcap> 2024-07-09 08:18:19 info [gcap.runASCAT]: No ASCAT job to skip.
<gcap> 2024-07-09 08:18:19 info [FUN]: start submitting job 119177-322-R1-PXWJH4KA3-WES
<gcap> 2024-07-09 08:18:19 info [FUN]:      tumor data file: /mnt/legacy/MoCha-hiseq/legacy/scratch/BW_transfers/processedDATA/119177/20170910/119177~322-R1~PXWJH4KA3~WES/119177~322-R1~PXWJH4KA3~WES.bwa.final.bam
<gcap> 2024-07-09 08:18:19 info [FUN]:     normal data file: /mnt/legacy/MoCha-hiseq/legacy/scratch/BW_transfers/processedDATA/119177/20170910/119177~322-R1~J1-CAF~WES/119177~322-R1~J1-CAF~WES.bwa.final.bam
<gcap> 2024-07-09 08:18:19 info [FUN]:    tumor sample name: 119177-322-R1-PXWJH4KA3-WES
<gcap> 2024-07-09 08:18:19 info [FUN]:   normal sample name: 119177-322-R1-J1-CAF-WES
[1] Reading Tumor LogR data...
[1] Reading Tumor BAF data...
[1] Reading Germline LogR data...
[1] Reading Germline BAF data...
[1] Registering SNP locations...
[1] Splitting genome in distinct chunks...
[1] Sample 119177-322-R1-PXWJH4KA3-WES (1/1)
GC correlation:  25bp 0.061 ; 50bp 0.079 ; 100bp 0.111 ; 200bp 0.161 ; 500bp 0.251 ; 1kb 0.259 ; 2kb 0.244 ; 5kb 0.220 ; 10kb 0.203 ; 20kb 0.189 ; 50kb 0.172 ; 100kb 0.164 ; 200kb 0.158 ; 500kb 0.153 ; 1Mb 0.147 ; 2Mb 0.139 ; 5Mb 0.119 ; 10Mb 0.099 ; 
Short window size:  1kb 
Long window size:  5kb 
Replication timing correlation:  Bg02es 0.095 ; Bj 0.106 ; Gm06990 0.126 ; Gm12801 0.132 ; Gm12812 0.126 ; Gm12813 0.130 ; Gm12878 0.128 ; Helas3 0.117 ; Hepg2 0.134 ; Huvec 0.123 ; Imr90 0.109 ; K562 0.133 ; Mcf7 0.132 ; Nhek 0.131 ; Sknsh 0.136 ; 
Replication dataset:  Sknsh 
[1] Plotting tumor data
[1] Plotting germline data
[1] Sample 119177-322-R1-PXWJH4KA3-WES (1/1)
[1] Sample 119177-322-R1-PXWJH4KA3-WES (1/1)
<gcap> 2024-07-09 11:38:27 info [doTryCatch]: job 119177-322-R1-PXWJH4KA3-WES done
<gcap> 2024-07-09 11:38:27 info [gcap.runASCAT]: ASCAT analysis done, check /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/119177~322-R1~PXWJH4KA3~WES for results
<gcap> 2024-07-09 11:38:27 info [gcap.workflow]: checking ASCAT result files
<gcap> 2024-07-09 11:38:27 warn [FUN]: /mnt/MoCha-NGS/active/MoCha-NGS_BW_transfers/ecDNA/output/119177~322-R1~PXWJH4KA3~WES/119177-322-R1-PXWJH4KA3-WES.ASCAT.rds contains a failed ASCAT job, will discard it before next step
<gcap> 2024-07-09 11:38:27 fatal [gcap.workflow]: no sucessful ASCAT result file to proceed!
<gcap> 2024-07-09 11:38:27 fatal [gcap.workflow]: check your ASCAT setting before make sure this case could not be used!
Error in gcap.workflow(tumourseqfile = tumor_bam_path, normalseqfile = normal_bam,  : 

In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
ShixiangWang commented 2 months ago

Hi @tingchiafelix, this is normal: ASCAT does not produce a result for every sample. It is not a GCAP issue.

tingchiafelix commented 2 months ago

Hi Shixiang,

Does it mean ASCAT could occasionally have failures when running through the GCAP workflow? What is the potential failure rate in your cohort?

Also, I'm wondering if you have a way to estimate the absolute copy number of ecDNA, similar to what we usually have for non-circular DNA amplification?

Best, TC

ShixiangWang commented 2 months ago

In my experience, ASCAT fails to produce a call for 1-5% of samples on average, similar to tools like FACETS. Sequenza is much better in this respect.
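
When running a large cohort, one way to keep a single failed sample from stopping the batch is to wrap each call in tryCatch() and record the outcome. This is a minimal sketch: `run_batch` and `toy_run` are hypothetical names, and `toy_run` merely stands in for a per-sample gcap.workflow() call.

```r
# Run `run_one` on every sample and record "ok" or the error message.
run_batch <- function(samples, run_one) {
  vapply(samples, function(s) {
    tryCatch(
      {
        run_one(s)
        "ok"
      },
      error = function(e) paste("failed:", conditionMessage(e))
    )
  }, character(1))
}

# Hypothetical stand-in for a per-sample gcap.workflow() call.
toy_run <- function(s) {
  if (s == "S003") stop("no successful ASCAT result")
  invisible(TRUE)
}

status <- run_batch(sprintf("S%03d", 1:20), toy_run)
failed_samples <- names(status)[status != "ok"]
failure_rate <- mean(status != "ok")

failed_samples  # "S003"
failure_rate    # 0.05
```

The recorded status vector also gives the cohort-level failure rate directly, which can be compared with the range quoted above.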

For estimating the absolute copy number of ecDNA, WGS or experimental strategies are recommended, as WES cannot, in my view, provide sufficient information about the structure of an ecDNA. However, it's a really good and important question. I have been wondering whether I could model the expected gene copy number from non-circular amplification, so that the copy number of a gene on ecDNA could be the residual between the inferred total copy number and the modeled non-circular amplification copy number. At the current stage, GCAP only reports the total copy number captured by ASCAT and the ecDNA probability/class inferred by the model.
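
The residual idea is only a thought at this point, not something GCAP implements. Purely to illustrate the arithmetic, with invented per-gene numbers and an assumed floor at zero:

```r
# Hypothetical per-gene values; none of these come from GCAP output.
total_cn <- c(MYC = 24, EGFR = 6, TP53 = 2)        # inferred total copy number
noncircular_cn <- c(MYC = 5, EGFR = 7, TP53 = 2)   # modeled non-circular copy number

# Residual attributed to ecDNA, floored at zero (the floor is an assumption).
residual_cn <- pmax(total_cn - noncircular_cn, 0)
residual_cn
```

How the non-circular copy number would be modeled is the open part; the subtraction itself is trivial once that estimate exists.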

tingchiafelix commented 2 months ago

Thank you for this insight and information.

Best, TC