GavinHaLab / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0
12 stars 10 forks source link

GenomeInfoDb #11

Open teacakedeadlift opened 1 year ago

teacakedeadlift commented 1 year ago

IchorCNA suddenly decided it was going to error with the following:

Error in .order_seqlevels(chrom_sizes[, "chrom"]) : 
  !anyNA(m32) is not TRUE
Calls: getSeqInfo ... FETCH_ORDERED_CHROM_SIZES -> .order_seqlevels -> stopifnot
Execution halted

Likely as a result of the issues with GenomeInfoDb.

Just wondering if anyone else has problems or if there is a workaround?

Old conda environments broken, new ones won't download GenomeInfoDB. Doesn't seem to want to work with the 0.5.0 docker image either (which I assumed wouldn't require downloads from bioconductor). I'm at a loss as to what to do.

Any help appreciated

Phil

ekushele commented 1 year ago

Also posted this here: https://github.com/broadinstitute/ichorCNA/issues/120#issue-1602725193

gavinha commented 1 year ago

Hi @teacakedeadlift

Let us look into this.

Can you please post your sessionInfo() so that we can replicate it?

Thanks, Gavin

teacakedeadlift commented 1 year ago

hi @gavinha

Sorry for the late reply.

I couldn't run sessionInfo() as I was running ichorCNA either via the snakemake pipeline inside a conda env or just by execution of the Rscript unIchorCNA.R within the conda env. Turns out the snakemake pipeline was running to completion but with errors in the log files, and the conda env itself was having issues with all of the following packages bioconductor-genomeinfodb bioconductor-genomeinfodbdata bioconductor-bsgenome.hsapiens.ucsc.hg19 bioconductor-bsgenome.hsapiens.ucsc.hg38. The issue is with these packages not ichorCNA. Various workarounds haven't yet succeeded for me (including clean conda env installs, using mamba, manual alteration of the build number as per here. I seem to be able to install locally within R using BiocManager::install("GenomeInfoDb") but have yet to trial using ichorCNA within R.

I have managed to get the docker container from https://quay.io/repository/biocontainers/r-ichorcna?tab=tags&tag= working with apptainer on our cluster. Code included here in case anyone is in a similar position:

Using apptainer pull docker://quay.io/biocontainers/r-ichorcna:0.5.0--pl5321r42hdfd78af_0, renaming to r-ichorcna_0.5.0.sif and then submitting a job with the following from within the scripts folder of ichorCNA (wigs already created):

cd /data/home/hfx252/tools/tmp/ichorCNA/scripts

patientID=patientID
sampleID=sampleID
controlID=$patientID.control
mkdir -p ./$JOB_ID/$patientID
mkdir -p ./$JOB_ID/$patientID/logs

apptainer exec /path/to/sif/r-ichorcna_0.5.0.sif Rscript runIchorCNA.R \
 --libdir ./.. \
 --WIG /path/to/wigs/${patientID}.${sampleID}.wig \
 --NORMWIG /path/to/wigs/${controlID}.wig \
 --normalPanel ../inst/extdata/HD_ULP_PoN_hg38_1Mb_median_normAutosome_median.rds \
 --id $patientID.$sampleID \
 --libdir ../ \
 --gcWig ../inst/extdata/gc_hg38_1000kb.wig \
 --mapWig ../inst/extdata/map_hg38_1000kb.wig \
 --centromere ../inst/extdata/GRCh38.GCA_000001405.2_centromere_acen.txt \
 --genomeBuild hg38 \
 --genomeStyle UCSC \
 --minMapScore 0.7 \
 --ploidy "c(2)" --normal "c(.95,.99,.999)" --maxCN 5 \
 --includeHOMD False \
 --chrs "paste0('chr', c(1:22, \"X\"))" \
 --chrTrain "paste0('chr', c(1:22, \"X\"))" \
 --chrNormalize "paste0('chr', c(1:22))" \
 --estimateNormal True --estimatePloidy True --estimateScPrevalence True --scStates "c(1,3)" \
 --exons.bed NULL \
 --txnE 0.99999 --txnStrength 100000 \
 --fracReadsInChrYForMale 0.001 \
 --maxFracGenomeSubclone 0.5 --maxFracCNASubclone 0.7 \
 --plotFileType pdf --plotYLim "c(-2,4)" \
 --outDir ./$JOB_ID/$patientID > ./$JOB_ID/$patientID/logs/$patientID.$sampleID.log 2> ./$JOB_ID/$patientID/logs/$patientID.$sampleID.log

So far seems to be working although having some issues with:

  1. chr19 being reduced to a couple fo bins around the centromere by the provided hg38 PoN
  2. spurious bins in some chromosomes that are filtered out when we use the provided hg38 PoN, but not when we have created our own (~30 samples)
  3. TFx called as >0.03 in some samples when using 500kb bins but then these disappearing when using 1000kb.

Unsure if the first 2 are due to mappability filtering although the mapwig for chr19 looks like it includes most of chromosome and our samples have coverage in chr19. Currently trialling runs with hg19 in case that is an issue. The last issue might be something to do with the most likely solution being selected, and the starting normal contamination settings / ploidy settings (--estimateScPrevalence usually set to FALSE).

Do you run ichorCNA from within R, on the command line, locally or on a cluster? What would you suggest is best practice?

many thanks

Phil

ekushele commented 1 year ago

Hi @teacakedeadlift

Let us look into this.

Can you please post your sessionInfo() so that we can replicate it?

Thanks, Gavin

@gavinha, here is my sessionInfo for singularity image: r-ichorcna-0.3.2--pl5321r42hdfd78af_2:

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /usr/local/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] ichorCNA_0.3.2       GenomicRanges_1.50.0 GenomeInfoDb_1.34.1
[4] IRanges_2.32.0       S4Vectors_0.36.0     BiocGenerics_0.44.0
[7] HMMcopy_1.40.0       data.table_1.14.4    optparse_1.7.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9             bitops_1.0-7           plyr_1.8.7
 [4] zlibbioc_1.44.0        XVector_0.38.0         getopt_1.20.3
 [7] tools_4.2.2            RCurl_1.98-1.9         compiler_4.2.2
[10] GenomeInfoDbData_1.2.9

I tried to run newer version with singularity image r-ichorcna:0.5.0--pl5321r42hdfd78af_0 (with ichorCNA as function) but I also have issues there: https://github.com/GavinHaLab/ichorCNA/issues/12#issue-1623124087

ycl6 commented 11 months ago

Hi all,

The error is due to a bug in GenomeInfoDb when handling hg38 assembly, see https://github.com/Bioconductor/GenomeInfoDb/issues/86#issuecomment-1416528235.

Error in .order_seqlevels(chrom_sizes[, "chrom"]) :
!anyNA(m32) is not TRUE

The fixed version (v1.34.8) is available in BioC 3.16 (R-4.2). Hence, if you are not using R-4.2, you'll need to upgrade R and install ichorCNA with the latest versions of the all the R dependencies. If you are already using R-4.2, you can upgrade existing packages by update.packages() and BiocManager::install().

These are the latest version of package imported by ichorCNA in R-4.2:

  HMMcopy (>= 1.40),
  GenomicRanges (>= 1.50.2),
  GenomeInfoDb (>= 1.34.9),
  doMC (>= 1.3.8),
  foreach (>= 1.5.2),
  BSgenome.Hsapiens.UCSC.hg19 (>= 1.4.3),
  BSgenome.Hsapiens.UCSC.hg38 (>= 1.4.5),
  ggplot2 (>= 3.4.3),
  stringr (>= 1.5.0)