parsboy66 commented 2 years ago

To Chernolab

First of all, I would like to take this opportunity to thank you for your effort in providing this package.

This is my first time using the ASpli package, and I got an error caused by NaNs in the output of jCounts. Below I share the code I used, which may help to resolve the problem and to find errors, if there are any.

gtfFileName <- "gencode.v24.annotation.gtf"
genomeTxDb <- makeTxDbFromGFF(gtfFileName, dataSource = "genecode", organism = "Homo sapiens")
features <- binGenome(genomeTxDb)

I used gencode.v24, which doesn't have pseudogenes. Making the TxDb from it gives no error, only a warning about NA/NaN values in the metadata column for stop codons.

Then I used gbCounts to summarise the read overlaps against the features; this step also works well. The next step was jCounts, and then, to get the report, I ran gbDUreport, which fails because of NaN in the data frame, which glmFit.default does not accept.

Code I used:

asd <- jCounts(counts = gbcounts, features = features, minReadLength = 50, libType = "SE", strandMode = 0)
asd

Differential gene expression and bin usage signal estimation:

gb <- gbDUreport(gbcounts, contrast = c(-1, 1))
gb

Error:

No residual df: setting dispersion to NA
Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion, : NA dispersions not allowed

Then I went back to the previous step, the jCounts output, and saw some NaNs like this:

chr1.1298679.1299817 0 0 0 0 8 0 0.000000000 NaN

Now the questions are:

Why do NaNs appear in the output?
How can I overcome this problem?
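
On the first question, one plausible source of NaN in a ratio-type column is a 0/0 division: in R, dividing zero by zero yields NaN rather than an error. The sketch below uses a made-up data frame (the column names `supporting`, `total` and `ratio` are hypothetical, not the actual jCounts columns) to show the effect and one way to list the affected rows.

```r
# Hypothetical junction table; column names are illustrative only,
# not the real jCounts output columns.
junctions <- data.frame(
  junction   = c("chr1.100.200", "chr1.300.400", "chr1.500.600"),
  supporting = c(5, 0, 0),
  total      = c(10, 8, 0)
)

# A ratio with a zero numerator and a zero denominator evaluates to NaN in R.
junctions$ratio <- junctions$supporting / junctions$total

# Rows whose ratio could not be computed.
nan_rows <- junctions[is.nan(junctions$ratio), ]
print(nan_rows)

# Count NaN values per numeric column (is.nan has no data.frame method,
# so it is applied column by column).
nan_per_column <- sapply(
  junctions[sapply(junctions, is.numeric)],
  function(col) sum(is.nan(col))
)
print(nan_per_column)
```

If this is the cause, the NaN entries simply mark junctions with no supporting reads in a sample, and they can be filtered out or left in place depending on the downstream step.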

Thanks in advance

P.S. The BAM file I used was produced by minimap2 from Nanopore data (cDNA sample). ASpli version: 2.0.0. R version: 4.0.3.

estepi commented 2 years ago

Dear Iman, thanks for using ASpli and for your post.

Can you please tell us what you obtain with this command:

countsj(object)

We will then try to figure out what is happening as soon as possible.

thanks

parsboy66 commented 2 years ago

Thanks for the fast reply!

OK, the output of countsj(object) is a data frame consisting of 500,000 rows and 10 columns. Here I attached an image of the first 10 rows.

Thanks

parsboy66 commented 2 years ago

OK, I realised that this error comes from having fewer than 2 replicates in the experiment. It is due to a recent update of edgeR, which by default sets the dispersion to NA and needs at least 2 replicates to compute it. Do you have any suggestion for setting the edgeR dispersion value manually in the ASpli package when we have just 1 replicate?
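
The "No residual df" message can be checked before fitting: the residual degrees of freedom are the number of samples minus the number of independent columns of the design matrix. A minimal sketch in base R, assuming a hypothetical `targets` table with one sample per condition (the table and its column names are illustrative, not taken from the actual experiment):

```r
# Hypothetical sample sheet: one library per condition (no replicates).
targets <- data.frame(
  sample    = c("ctrl_1", "treat_1"),
  condition = factor(c("control", "treated"))
)

# Design matrix with an intercept and one condition effect.
design <- model.matrix(~ condition, data = targets)

# Residual df = number of samples - rank of the design matrix.
residual_df <- nrow(design) - qr(design)$rank
print(residual_df)

# Replicates per condition; with one sample in every condition
# the residual df is zero.
replicates <- table(targets$condition)
print(replicates)

if (residual_df == 0) {
  message("No residual df: a dispersion cannot be estimated from these samples.")
}
```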

Thanks Iman
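
For the single-replicate case, edgeR itself accepts a user-supplied dispersion. The sketch below works directly with edgeR on simulated counts and is not an ASpli feature; whether `gbDUreport` can pass a dispersion through is not established in this thread. The BCV of 0.4 is an assumed value that the analyst has to choose (the edgeR User's Guide mentions values of that order for human data), so the resulting p-values are best treated as exploratory.

```r
library(edgeR)

set.seed(1)
# Simulated counts: 100 features x 2 samples (one library per condition).
counts <- matrix(
  rnbinom(200, mu = 50, size = 5), ncol = 2,
  dimnames = list(paste0("feature", 1:100), c("ctrl_1", "treat_1"))
)
group <- factor(c("control", "treated"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Assumed biological coefficient of variation; not estimated from the data.
bcv <- 0.4

# Exact test with a fixed dispersion (BCV squared) instead of an estimate.
et <- exactTest(y, dispersion = bcv^2)
print(topTags(et, n = 5))
```

Because the dispersion is fixed by hand, it is worth repeating the test with a few different BCV values to see how sensitive the ranking is to that choice.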

chernolab / ASpli

NAN at the output of the jCounts #6
