PulverCyril / craTEs

Estimate the cis-regulatory activity of transposable element (TE) subfamilies
MIT License

preprocess_E_N_for_activities - Error in hist.default(log_tpm) : invalid number of 'breaks' #1

Closed VCF1995 closed 2 weeks ago

VCF1995 commented 2 weeks ago

Hi Cyril,

I'm trying to use craTEs on my RNA-seq data and am running into an issue with the function preprocess_E_N_for_activities(): when it calls hist() internally, the number of breaks appears to be invalid. According to another issue (https://github.com/hafen/trelliscopejs/issues/55), hist() fails when that value is < 1. How can I solve this? Thanks!

> countTable = readRDS(paste0(getwd(),"/Raw_counts.rds")) %>% assay()
> N = craTEs::N_from_tsv('/g/noh/Victor/Projects/mRNAseq_hGNs_50DIV_KCl_timecourse/craTEs/N_weighted_2.5e5.tsv')
> preprocessed = craTEs::preprocess_E_N_for_activities(countTable, N, log_tpm_plot_path = paste0(getwd(),'/craTEs/qc_file.pdf'))
Error in hist.default(log_tpm) : invalid number of 'breaks'
In addition: Warning message:
In dir.create(dirname(log_tpm_plot_path)) :
  '/g/noh/Victor/Projects/mRNAseq_hGNs_50DIV_KCl_timecourse/craTEs' already exists
VCF1995 commented 2 weeks ago

Forgot to add my session info, it is here below:

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.8 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /g/easybuild/x86_64/Rocky/8/haswell/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
 [1] DESeq2_1.38.3 SummarizedExperiment_1.28.0
 [3] Biobase_2.58.0 MatrixGenerics_1.10.0
 [5] matrixStats_1.2.0 GenomicRanges_1.50.2
 [7] GenomeInfoDb_1.34.9 IRanges_2.32.0
 [9] S4Vectors_0.36.2 BiocGenerics_0.44.0
[11] forcats_1.0.0 stringr_1.5.1
[13] dplyr_1.1.4 purrr_1.0.2
[15] readr_2.1.5 tidyr_1.3.1
[17] tibble_3.2.1 ggplot2_3.4.4
[19] tidyverse_1.3.1 craTEs_0.0.0.9000

loaded via a namespace (and not attached):
 [1] httr_1.4.7 bit64_4.5.2 jsonlite_1.8.8
 [4] modelr_0.1.11 assertthat_0.2.1 blob_1.2.4
 [7] GenomeInfoDbData_1.2.9 cellranger_1.1.0 pillar_1.9.0
[10] RSQLite_2.2.12 backports_1.4.1 lattice_0.20-45
[13] glue_1.8.0 RColorBrewer_1.1-3 XVector_0.38.0
[16] rvest_1.0.3 colorspace_2.0-3 Matrix_1.4-1
[19] XML_3.99-0.9 pkgconfig_2.0.3 broom_0.8.0
[22] haven_2.5.4 zlibbioc_1.44.0 xtable_1.8-4
[25] scales_1.2.0 tzdb_0.3.0 BiocParallel_1.30.0
[28] timechange_0.3.0 annotate_1.74.0 KEGGREST_1.36.0
[31] generics_0.1.3 cachem_1.0.6 withr_2.5.0
[34] cli_3.6.2 magrittr_2.0.3 crayon_1.5.1
[37] readxl_1.4.3 memoise_2.0.1 fs_1.5.2
[40] fansi_1.0.3 xml2_1.3.3 data.table_1.16.2
[43] tools_4.2.0 hms_1.1.3 lifecycle_1.0.4
[46] locfit_1.5-9.5 munsell_0.5.0 reprex_2.1.0
[49] DelayedArray_0.24.0 AnnotationDbi_1.58.0 Biostrings_2.64.0
[52] compiler_4.2.0 rlang_1.1.3 grid_4.2.0
[55] RCurl_1.98-1.16 rstudioapi_0.15.0 bitops_1.0-7
[58] gtable_0.3.0 DBI_1.2.3.9014 R6_2.5.1
[61] lubridate_1.9.3 fastmap_1.1.0 bit_4.5.0
[64] utf8_1.2.2 stringi_1.8.4 parallel_4.2.0
[67] Rcpp_1.0.13 png_0.1-7 vctrs_0.6.5
[70] geneplotter_1.76.0 dbplyr_2.1.1 tidyselect_1.2.0

PulverCyril commented 2 weeks ago

Hello,

This error is usually caused by zero overlap between the row names (genes) of N and the row names (genes) of your expression matrix. Are you using ENSEMBL gene IDs? If so, do they include the gene version suffix (.VERSION_NUMBER)? They shouldn't.

The easiest way for me to help would be for you to send me the first few rows of the count table you're using, by running:

readRDS(paste0(getwd(),"/Raw_counts.rds")) %>% assay() %>% head()
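
The overlap can also be checked directly by comparing row names. The sketch below is only an illustration on toy matrices: `E_toy` and `N_toy` are hypothetical stand-ins for the count table and N, and no craTEs function is called. It also shows that base R's `hist()` raises this exact message on an empty vector, which is presumably what happens internally once no genes are shared (an assumption about the package internals, not something confirmed here).

```r
# Hypothetical toy data: expression matrix with versioned ENSEMBL-style IDs,
# N matrix with unversioned IDs.
E_toy <- matrix(
  1:6, nrow = 3,
  dimnames = list(
    c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000000419.14"),
    c("sample1", "sample2")
  )
)
N_toy <- matrix(
  0, nrow = 3, ncol = 2,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
    c("TE_subfam_A", "TE_subfam_B")
  )
)

# Genes present in both matrices; zero here because of the version suffix.
shared <- intersect(rownames(E_toy), rownames(N_toy))
n_shared <- length(shared)

# hist() on an empty numeric vector fails before anything is plotted.
err <- tryCatch(
  hist(numeric(0), plot = FALSE),
  error = function(e) conditionMessage(e)
)
```

If `n_shared` is 0 (or much smaller than expected) on the real data, the gene identifiers in the two matrices do not follow the same convention.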

Hope it helps,

Best,

Cyril



VCF1995 commented 2 weeks ago

Hi Cyril,

I was using ENSEMBL gene IDs that included the gene version (.VERSION_NUMBER). I removed the version suffix and now it works with the plain ENSEMBL IDs. Thanks!
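
For anyone landing here with the same error, one way to strip the version suffix in base R is sketched below. The helper name `strip_ensembl_version` and the example IDs are hypothetical and not part of craTEs; the sketch assumes the version is a trailing dot followed by digits.

```r
# Hypothetical helper: remove a trailing ".<digits>" version from gene IDs.
strip_ensembl_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

# Toy IDs for illustration; the last one has no version and is left unchanged.
ids <- c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000000419")
clean_ids <- strip_ensembl_version(ids)

# Stripping versions can in principle create duplicated IDs, so check
# before assigning them as row names (anyDuplicated() returns 0 if none).
has_duplicates <- anyDuplicated(clean_ids) > 0
```

On a real count table this would be applied as `rownames(countTable) <- strip_ensembl_version(rownames(countTable))`, after confirming that no duplicates are introduced.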