ZW-xjtlu / exomePeak2

Peak calling and differential methylation for MeRIP-Seq

Run fail at "Identify background feature" step (+ most recent version of package installation?) #42

Closed TaehyungKwon closed 10 months ago

TaehyungKwon commented 11 months ago

Hi,

I have two issues.

The first concerns a failure during an exomePeak2 run. I am analyzing MeRIP-Seq data from several species using exomePeak2_1.10.0. The run succeeded with green monkey data. One of the other species is human, for which I am using the most recent genome assembly (GCF_009914755.1), but the run fails with the following error at the "Identify background features" step:

Extract bin features ... OK
Count reads on bin features ... OK
Identify background features ... Error in if (G[1] == 1) { : missing value where TRUE/FALSE needed
Calls: run_exomepeak2 ... classifyBackground -> Mclust -> eval -> eval -> mclustBIC

I tried to debug it by stepping into the code of the mclust package, but I got lost. Could you please guide me from here?

Second, in the meantime I tried to install the newest version available in Bioconductor (3.18) under R 4.3.2. However, some dependencies do not seem to install under the most recent Bioconductor release. Could you kindly guide me through the installation of exomePeak2 1.14.3?

The following are my installation steps:

# install BiocManager if it is missing, then select Bioconductor 3.18
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install(version = "3.18")

# install exomePeak2 dependencies
# (methods, stats, utils and splines ship with R itself)
BiocManager::install(c("Rsamtools", "GenomicAlignments", "GenomicRanges",
                       "GenomicFeatures", "DESeq2", "ggplot2", "mclust", "BSgenome",
                       "Biostrings", "GenomeInfoDb", "BiocParallel", "IRanges",
                       "S4Vectors", "rtracklayer", "methods", "stats",
                       "utils", "BiocGenerics", "magrittr", "speedglm", "splines"),
                     force = TRUE)

# install exomePeak2: https://github.com/ZW-xjtlu/exomePeak2
BiocManager::install("exomePeak2")

Thanks :)

ZW-xjtlu commented 11 months ago

Dear exomePeak2 user,

Regarding the first error, it is a common issue encountered during the unsuccessful execution of exomePeak2. This error, though not readily interpretable, indicates that too few bins (or sliding windows on the exome) have a read count of five or more across all samples. In some cases, there might be none. This situation prevents the classification of any bins as background by the Gaussian mixture model (GMM) applied to bins with counts of five or higher. These background bins are essential for estimating size factors.
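As a rough illustration of the condition described above, the base R sketch below counts how many bins in a toy count matrix would survive a "count of five or more" filter. The helper `count_usable_bins` and the toy matrix are hypothetical and not part of exomePeak2. Since the exact filter the package applies is not shown in this thread, the sketch reports two possible readings of "across all samples" (every sample at or above the threshold, or the summed count at or above it).

```r
# Hypothetical helper, not part of exomePeak2: reports how many bins pass a
# minimum-count filter under two readings of "across all samples".
count_usable_bins <- function(counts, min_count = 5) {
  stopifnot(is.matrix(counts), is.numeric(counts))
  c(
    total_bins        = nrow(counts),
    # bins where every sample reaches the threshold
    every_sample_pass = sum(rowSums(counts >= min_count) == ncol(counts)),
    # bins where the counts summed over samples reach the threshold
    summed_pass       = sum(rowSums(counts) >= min_count)
  )
}

# Toy matrix: 4 bins (rows) x 2 samples (columns); values are made up.
toy <- matrix(c(0, 0,
                6, 7,
                3, 4,
                9, 1),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("bin", 1:4), c("IP", "input")))

usable <- count_usable_bins(toy)
print(usable)
```

If either count is zero or very small on real data, that would be consistent with the mixture model having nothing to fit.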

In simpler terms, the error essentially means that the BAM files have failed to adequately count overlaps with the genomic features defined in the GTF/TXDB, resulting in an insufficient count matrix for fitting meaningful data analysis models. This error can stem from multiple sources, with a common cause being a mismatch between the gene annotation (e.g. chromosome numbers and coordinates) provided by the GTF/TXDB and the BAM files. It is crucial to ensure that the BAM files have high coverage and align well with the exons outlined in your gene annotation files used by exomePeak2. A practical method for verification is to use IGV (Integrative Genomics Viewer) to visualize the BAM file coverage against the gene annotation you are employing.
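Before opening IGV, a quick way to look for the naming mismatch described above is to compare the sequence names in the BAM header with those in the annotation. The sketch below is a minimal example in base R. The helper `compare_seqnames` and the toy names are hypothetical. The commented lines show how the two name vectors could be obtained with Rsamtools, GenomicFeatures and GenomeInfoDb, assuming those packages are installed and using placeholder file paths.

```r
# Hypothetical helper: compares sequence (chromosome) names found in a BAM
# header with those found in a gene annotation.
compare_seqnames <- function(bam_seqs, anno_seqs) {
  list(
    shared       = intersect(bam_seqs, anno_seqs),
    only_in_bam  = setdiff(bam_seqs, anno_seqs),
    only_in_anno = setdiff(anno_seqs, bam_seqs)
  )
}

# With real files, the two vectors could be obtained like this
# (placeholder paths; requires the Bioconductor packages named here):
# bam_seqs  <- names(Rsamtools::scanBamHeader("sample.bam")[[1]]$targets)
# txdb      <- GenomicFeatures::makeTxDbFromGFF("annotation.gff")
# anno_seqs <- GenomeInfoDb::seqlevels(txdb)

# Toy values for illustration only: RefSeq-style accessions in the BAM
# versus "chr"-style names in the annotation.
bam_seqs  <- c("NC_060925.1", "NC_060926.1")
anno_seqs <- c("chr1", "chr2")

res <- compare_seqnames(bam_seqs, anno_seqs)
if (length(res$shared) == 0) {
  message("No shared sequence names: reads cannot overlap any annotated exon.")
}
```

An empty `shared` element would mean that no read can be counted on any annotated feature, whatever the coverage of the BAM files.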

Regarding the second issue related to installation, it is unlikely that the dependencies genuinely cannot be installed under the latest version of Bioconductor, because the exomePeak2 package has passed the build and installation checks on the Bioconductor 3.18 server. To resolve this, first identify which specific dependency is missing. Once identified, search for the specific error message reported when installing that package on your system.
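One possible way to find the missing dependency is to test each package with `requireNamespace()`, as in the base R sketch below. The helper `find_missing` is hypothetical, and the package vector is taken from the installation steps posted above (minus the packages that ship with R), so it may not match the full dependency list of exomePeak2 1.14.3.

```r
# Hypothetical helper: returns the names of packages whose namespace
# cannot be loaded in the current R library.
find_missing <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Package names copied from the installation steps in this thread.
deps <- c("Rsamtools", "GenomicAlignments", "GenomicRanges",
          "GenomicFeatures", "DESeq2", "ggplot2", "mclust", "BSgenome",
          "Biostrings", "GenomeInfoDb", "BiocParallel", "IRanges",
          "S4Vectors", "rtracklayer", "BiocGenerics", "magrittr", "speedglm")

missing_pkgs <- find_missing(deps)
if (length(missing_pkgs) > 0) {
  message("Missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed dependencies can be loaded.")
}
```

Installing each reported package on its own with `BiocManager::install()` should then surface the specific error message to search for.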

Best wishes, Zhen

TaehyungKwon commented 10 months ago

Thank you so much, Zhen. I had been tweaking exomePeak2 parameters as well as alignment and trimming parameters, but in the end I followed your advice and fixed the issue by (1) updating the RefSeq data and (2) using a GFF file instead of a GTF file.