chernolab / ASpli

BioC current release of ASpli

Error in gbCounts and using the matrices in external tools #5

Closed hkarakurt closed 2 years ago

hkarakurt commented 2 years ago

Hello, thank you for this great tool. I have some questions. I use the MM39 genome and the annotation GTF from Ensembl. When I do not pass gene symbols to the binGenome function, everything looks normal, but when I pass gene symbols via this code:

`library(ASpli)
library(GenomicFeatures)
library(biomaRt)

genomeTxDb <- makeTxDbFromGFF("GRCm39.105.gtf", format = "gtf", organism = "Mus musculus", taxonomyId = 10090)
genes <- genes(genomeTxDb)
gene_ids <- genes$gene_id

ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

gs_heatdata <- getBM(attributes = c('external_gene_name', 'ensembl_gene_id'), filters = 'ensembl_gene_id', values = gene_ids, mart = ensembl)

symbols <- as.data.frame(gs_heatdata$external_gene_name)
rownames(symbols) <- gs_heatdata$ensembl_gene_id
colnames(symbols) <- c("symbol")
features <- binGenome(genomeTxDb, geneSymbols = symbols, cores = 10)`

I get an error in the gbCounts step. Execution does not stop, but it prints:

Summarizing Sample1 Error in (function (x) : attempt to apply non-function ETA: 63 min

and it keeps running. What could be the reason for this, and how does it affect the analyses?

Second, I have time series data sets and multi-condition data sets, and I want to do correlation analyses. As far as I know, ASpli does not do these kinds of analyses. I extracted the gene counts, bin counts, irPIR, altPSI, junctionsPSI and junctionsPIR matrices (and removed NA values). Is it possible to use these count matrices (especially bin counts) in another tool such as DESeq2 or limma, or for correlation analyses using the cor() function? As far as I know, they are count matrices similar to the gene expression matrices from featureCounts, so I thought it would be possible to apply basic analyses such as PCA, clustering or correlations to them. Which normalization method (such as library size normalization in DESeq2) do you think would be better?

And lastly, @estepi mentioned that she adds symbols later. Which package or method do you use to add gene symbols later, and to which ASpli variable do you add them?

Thank you in advance.

estepi commented 2 years ago

Hi, thanks a lot for using ASpli.

Regarding symbol names, I should check the example. I always prefer Ensembl IDs as gene IDs because they are not repeated, and I add symbols later. Symbols are sometimes repeated or contain weird characters like "-" or Greek letters that can disrupt the pipeline.
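A quick way to see whether a symbol table has these problems is sketched below. It is only an illustration in base R: the `symbols` table, its IDs and the set of "safe" characters are hypothetical, and it does not claim to reproduce what binGenome checks internally.

```r
# Hypothetical toy table standing in for an ID-to-symbol lookup
# (the IDs and symbols are made up for illustration)
symbols <- data.frame(
  ensembl_gene_id = c("ENSMUSG00", "ENSMUSG01", "ENSMUSG02", "ENSMUSG03", "ENSMUSG04"),
  symbol          = c("Gapdh", "Actb", "Actb", "", "Gm-123"),
  stringsAsFactors = FALSE
)

# Empty or missing symbols
is_empty <- is.na(symbols$symbol) | symbols$symbol == ""

# Symbols that occur more than once (flag every copy, not only the later ones)
is_repeated <- duplicated(symbols$symbol) | duplicated(symbols$symbol, fromLast = TRUE)

# Assumption: anything other than letters, digits, dots and underscores is "odd"
is_odd <- grepl("[^A-Za-z0-9._]", symbols$symbol)

symbols$problem <- is_empty | is_repeated | is_odd
print(symbols)
```

Rows flagged in `problem` are the ones worth inspecting before passing symbols to the pipeline; keeping Ensembl IDs as the key avoids the issue altogether.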

You can do it on features (you can access the data.frames using this accessor: features@genes, etc.) or you can add the symbols at the end (on the final tables). Let me know if you need further explanation.
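As a minimal sketch of the second option (adding symbols to a final table), assuming a results table keyed by Ensembl gene ID: the `results` and `lookup` tables, their column names and their values are hypothetical stand-ins, not ASpli output.

```r
# Hypothetical final table keyed by Ensembl gene ID (values are made up)
results <- data.frame(
  gene_id = c("ENSMUSG03", "ENSMUSG01", "ENSMUSG09"),
  logFC   = c(1.2, -0.4, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical ID-to-symbol lookup, shaped like a biomaRt::getBM() result
lookup <- data.frame(
  ensembl_gene_id    = c("ENSMUSG01", "ENSMUSG03"),
  external_gene_name = c("Actb", "Gapdh"),
  stringsAsFactors = FALSE
)

# match() preserves the row order of `results` and yields NA
# for IDs that have no entry in the lookup table
idx <- match(results$gene_id, lookup$ensembl_gene_id)
results$symbol <- lookup$external_gene_name[idx]
print(results)
```

Using `match()` rather than `merge()` keeps the table in its original order and never duplicates rows; if an ID appears several times in the lookup, only its first symbol is used.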

Regarding the matrices, of course you can use them for your custom analysis. This is one of our most remarkable features: intermediate results are easily accessible. You can do correlations (maybe scaling by row), PCA, time series analysis, whatever you want. To test differential splicing, we developed a powerful statistical framework in ASpli that takes normalization and correction into account, so it is the best option I can recommend.
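For the exploratory part (correlations with row scaling, PCA), a rough base R sketch could look like the following. The simulated `counts` matrix is a hypothetical stand-in for an exported bin count matrix, and the counts-per-million plus log2 step is only an assumed, simple library-size normalization, not the normalization ASpli applies internally.

```r
set.seed(1)

# Hypothetical count matrix: 50 bins (rows) x 6 samples (columns)
counts <- matrix(rpois(300, lambda = 20), nrow = 50,
                 dimnames = list(paste0("bin", 1:50), paste0("S", 1:6)))

# Assumption: scale each sample by its library size (counts per million), then log2
cpm     <- t(t(counts) / colSums(counts)) * 1e6
log_cpm <- log2(cpm + 1)

# Drop bins with zero variance, then scale by row (z-score per bin)
keep   <- apply(log_cpm, 1, sd) > 0
scaled <- t(scale(t(log_cpm[keep, , drop = FALSE])))

# Sample-to-sample correlation on the log-scale values
sample_cor <- cor(log_cpm, method = "spearman")

# PCA with samples as observations and bins as variables
pca <- prcomp(t(scaled), center = TRUE, scale. = FALSE)
summary(pca)
```

This is meant for exploration only (clustering, PCA plots, correlation heatmaps). For testing differential splicing, the recommendation above still applies: use ASpli's own statistical framework.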

thanks,