Open levuBGU opened 11 months ago
Dear exomePeak2 User,
Thank you for reaching out about the unexpected findings in your RNA sample analysis with exomePeak2. Your observation that many methylated regions correspond to introns, even though the mode argument was left at its default ("exon"), is intriguing. The issue likely stems from discrepancies between gene annotation sources.
You have used "ncbiRefSeq.gtf" for gene annotation, whereas the UCSC Genome Browser primarily utilizes its UCSC Known Genes dataset. It's important to understand that gene annotations in the NCBI RefSeq database and the UCSC Known Genes are not identical, though they share similarities. This difference can potentially influence your analysis outcomes.
Here are some key distinctions between these two annotation sources:
Source and Methods: RefSeq, managed by the NCBI, offers a well-integrated set of sequences, including DNA, transcripts, and proteins, sourced from various databases and curated by the NCBI team. Conversely, UCSC Known Genes are compiled by aligning protein and mRNA sequences to the human genome, deriving data from RefSeq, GenBank, UniProt, and the ENCODE project.
Annotation Philosophy: RefSeq aims to present a singular, high-quality representative sequence for each gene, which might limit the representation of alternative splicing variants. In contrast, UCSC Known Genes encompass a wider array of splice variants, which might lead to more gene and transcript models, particularly in genes with complex splicing patterns.
Update Frequency and Versioning: Both databases undergo regular updates, but the timing and methods may differ, leading to potential discrepancies at any given moment.
Organism Specificity: While RefSeq covers a broad range of organisms, UCSC Known Genes focuses primarily on the human genome and a small number of model organisms.
These differences may explain why your results do not match what you expected from the default "exon" mode. For a more comprehensive understanding, cross-referencing both RefSeq and UCSC Known Genes is advisable.
Should you have further questions or require additional assistance, please do not hesitate to contact me.
Best Regards, Zhen Wei
Hi Zhen,
Thank you for the detailed explanation. I've found a path on the UCSC website leading to two annotation files: https://hgdownload.soe.ucsc.edu/goldenPath/rn7/bigZips/genes/.
Given that UCSC annotations are sourced from multiple databases, is there a straightforward method to perform cross-referencing or to utilize UCSC annotations directly in exomePeak2, possibly through a specific argument or setting?
Thanks again for your assistance.
Best regards, Uri Levy
Hi,
I utilized exomePeak2 for peak detection in RNA samples and observed that many methylated regions corresponded to introns. This finding surprised me since I did not specify the mode argument, and according to my understanding, the default is set to "exon."
My workflow was: take the exomePeak2 BED file output, convert it into a bigWig file, and upload it to the UCSC Genome Browser as a custom track. In the browser, many of the peaks covered intronic regions.
For this analysis, I used paired-end BAM files as input and an NCBI GTF file (ncbiRefSeq.gtf for Rattus norvegicus). The specified arguments in the exomePeak2 function were as follows:
I am also adding the text from my log file when executing the R code:
I would greatly appreciate any suggestions or insights.
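One way to check the intron observation directly from the files, rather than by eye in the browser, is to measure how many peak bases fall outside annotated exons. The following is a minimal Python sketch, not part of exomePeak2. It assumes the GTF has standard tab-separated fields with "exon" features, and that the peaks file may be BED12 (worth confirming by counting columns). In BED12, a peak spanning two exons is stored as one start-to-end span plus a list of blocks, so a conversion that uses only the span would also cover the intron between the blocks. The helper names `exon_intervals`, `covered`, and `peak_report` and the toy records are hypothetical.

```python
from collections import defaultdict


def exon_intervals(gtf_lines):
    """Collect merged exon intervals per chromosome (0-based, half-open)."""
    raw = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[2] != "exon":
            continue
        # GTF coordinates are 1-based and inclusive; convert to 0-based half-open.
        raw[fields[0]].append((int(fields[3]) - 1, int(fields[4])))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def covered(intervals, start, end):
    """Number of bases in [start, end) that lie inside the merged intervals."""
    return sum(max(0, min(e, end) - max(s, start)) for s, e in intervals)


def peak_report(bed_line, exons):
    """Count non-exonic bases for the whole peak span and for its blocks."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if len(fields) >= 12:
        # BED12: blockSizes and blockStarts are comma-separated, relative to start.
        sizes = [int(x) for x in fields[10].strip(",").split(",")]
        offsets = [int(x) for x in fields[11].strip(",").split(",")]
        blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    else:
        blocks = [(start, end)]
    intervals = exons.get(chrom, [])
    block_len = sum(e - s for s, e in blocks)
    block_cov = sum(covered(intervals, s, e) for s, e in blocks)
    return {
        "span_outside_exons": (end - start) - covered(intervals, start, end),
        "blocks_outside_exons": block_len - block_cov,
    }


# Toy data (made up for illustration): two exons and one peak spanning both.
toy_gtf = [
    "chr1\ttoy\texon\t101\t200\t.\t+\t.\tgene_id \"g1\";",
    "chr1\ttoy\texon\t301\t400\t.\t+\t.\tgene_id \"g1\";",
]
toy_peak = "chr1\t150\t350\tpeak1\t0\t+\t150\t350\t0\t2\t50,50,\t0,150,"
print(peak_report(toy_peak, exon_intervals(toy_gtf)))
# prints {'span_outside_exons': 100, 'blocks_outside_exons': 0}
```

In the toy case the blocks sit entirely inside exons, while the span covers 100 intronic bases. If real peaks show the same pattern, the intronic signal in the browser may come from the bigWig conversion rather than from the peak calls. If the blocks themselves fall outside exons, the annotation difference discussed in this thread is the more likely explanation.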