counts of rows of fpkm_matrix inconsistent with that of counts_matrix

yusukesano46 commented 4 years ago

Hello,

I ran the commands below. Why is the number of rows in "fpkm_matrix" inconsistent with the number of rows in "counts"?

================
library(countToFPKM)

# Read counts were calculated by htseq-count. The annotation file was "gencode.v22.annotation.gtf".
counts <- read.delim("XXX.txt", header=T, sep="\t", row.names=1)

# Feature lengths were calculated with "GenomicFeatures". The annotation file was "gencode.v22.annotation.gtf".
gene.annotations <- read.table("featurelength.txt", sep="\t", header=TRUE)
featureLength <- gene.annotations$featurelength

# Mean fragment lengths were calculated by Picard.
samples.metrics <- read.table("meanFragmentLength_adapter.txt", sep="\t", header=TRUE)
meanFragmentLength <- samples.metrics$meanFragmentLength

fpkm_matrix <- fpkm(counts, featureLength, meanFragmentLength)

nrow(counts)
[1] 60483

nrow(gene.annotations)
[1] 60483

nrow(fpkm_matrix)
[1] 42954
=================

Why are these results ("nrow(counts)" and "nrow(fpkm_matrix)") not consistent?

AAlhendi1707 commented 4 years ago

Hi there

Thanks for reporting this issue.

For accurate quantification of FPKM from RNA-Seq data, the read counts need to be normalised by the effective feature length (Lee et al. 2011). The effective length is computed by subtracting the meanFragmentLength from the feature length. Features shorter than the meanFragmentLength are therefore dropped automatically. In other words, FPKM cannot be calculated for features smaller than the meanFragmentLength, which is why your fpkm_matrix has fewer rows than counts.
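The effect described above can be reproduced on toy data. The following is a minimal base-R sketch, not the countToFPKM source: `fpkm_sketch` is a hypothetical helper, and both the `+ 1` in the effective length and the "effective length greater than 1 in every sample" filter are assumptions made for illustration.

```r
# Illustrative sketch only: `fpkm_sketch` is a hypothetical helper and is
# not claimed to match the countToFPKM implementation.
fpkm_sketch <- function(counts, featureLength, meanFragmentLength) {
  counts <- as.matrix(counts)
  stopifnot(length(featureLength) == nrow(counts),
            length(meanFragmentLength) == ncol(counts))

  # Effective length for every feature (rows) and sample (columns):
  # feature length minus mean fragment length, plus one (assumed form).
  effLen <- outer(featureLength, meanFragmentLength, "-") + 1

  # Assumed filter: keep a feature only if its effective length
  # exceeds 1 in every sample.
  keep <- apply(effLen, 1, min) > 1
  counts <- counts[keep, , drop = FALSE]
  effLen <- effLen[keep, , drop = FALSE]

  # FPKM = count * 1e9 / (effective length * total counts in the sample).
  libSize <- colSums(counts)
  sweep(counts * 1e9 / effLen, 2, libSize, "/")
}

# Toy data: geneA (120 bp) is shorter than both mean fragment lengths.
toy_counts <- matrix(c(10, 200, 30, 15, 180, 45), nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("s1", "s2")))
toy_lengths <- c(120, 2000, 900)
toy_fragments <- c(180, 200)

toy_fpkm <- fpkm_sketch(toy_counts, toy_lengths, toy_fragments)
nrow(toy_counts)  # 3
nrow(toy_fpkm)    # 2, because geneA was dropped
```

Because the filter is applied per feature across all samples, a single sample with a large mean fragment length is enough to remove a short feature from the whole matrix.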

To get statistics about the genes that are dropped because featureLength < meanFragmentLength, please try the latest version from GitHub:

if(!require(devtools)) install.packages("devtools")
devtools::install_github("AAlhendi1707/countToFPKM", build_vignettes = TRUE)
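Independently of the package version, the features at risk can be listed in base R before calling fpkm(). This is a rough sketch on toy vectors that stand in for `featureLength` and `meanFragmentLength` from the thread. The exact threshold is an assumption (a feature is flagged when it is no longer than the largest mean fragment length across samples).

```r
# Toy inputs standing in for the vectors used in the thread.
featureLength <- c(geneA = 120, geneB = 2000, geneC = 900, geneD = 150)
meanFragmentLength <- c(s1 = 180, s2 = 200)

# Assumed criterion: a feature is at risk when it is no longer than the
# largest mean fragment length across all samples.
too_short <- featureLength <= max(meanFragmentLength)

sum(too_short)                     # number of features expected to drop
names(featureLength)[too_short]    # which features they are
summary(featureLength[too_short])  # length distribution of those features
```

With the real data, where `featureLength` is an unnamed vector, `rownames(counts)[too_short]` would give the gene identifiers instead. After the fact, `setdiff(rownames(counts), rownames(fpkm_matrix))` should list the same genes, assuming fpkm() preserves row names.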

Hope it helps! A

yusukesano46 commented 4 years ago

Dear Ahmed Alhendi,

Many thanks for your reply. That resolves my question, and I understand now.

On Sunday, 19 July 2020 at 17:22, Ahmed Alhendi notifications@github.com wrote:

Hi there

Thanks for reporting this issue.

For accurate quantification of FPKM from RNA-Seq data, the read counts need to be normalised by the effective feature length (Lee et al. 2011, https://academic.oup.com/nar/article/39/2/e9/2409022). The effective length is computed by subtracting the meanFragmentLength from the feature length. Features shorter than the meanFragmentLength are therefore dropped automatically. FPKM cannot be calculated for features smaller than the meanFragmentLength, which is why your fpkm_matrix has fewer rows than counts.

I'll make sure that version 1.2 of countToFPKM returns a summary of the features for which fpkm() cannot return an FPKM value because featureLength < meanFragmentLength.

Hope it helps!

Golden-proteogenomics commented 3 years ago

Hello, I have a question about countToFPKM: what is meanFragmentLength? It is used in the example code. Could you describe in more detail what it is and how to obtain it? I sincerely hope for your reply. Thanks!

AAlhendi1707 commented 3 years ago

Hi there,

Please find the answer at the link below: https://github.com/AAlhendi1707/countToFPKM/issues/1

kind regards A
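For readers who do not follow the link: earlier in this thread the mean fragment lengths were said to come from Picard. As a rough sketch, assuming the tool is Picard's CollectInsertSizeMetrics and that its MEAN_INSERT_SIZE column is what gets used as meanFragmentLength, the value could be pulled out of a metrics file as below. The mock file content is abridged with invented values, and `read_mean_insert_size` is a hypothetical helper.

```r
# Mock of a Picard insert-size metrics file. The columns are abridged and
# the values are invented for illustration.
mock_metrics <- c(
  "## htsjdk.samtools.metrics.StringHeader",
  "# CollectInsertSizeMetrics (command line omitted)",
  "",
  "## METRICS CLASS\tpicard.analysis.InsertSizeMetrics",
  "MEDIAN_INSERT_SIZE\tMEAN_INSERT_SIZE\tSTANDARD_DEVIATION\tREAD_PAIRS",
  "182\t187.4\t45.1\t1000000",
  "",
  "## HISTOGRAM\tjava.lang.Integer",
  "insert_size\tAll_Reads.fr_count",
  "100\t1200"
)

# Hypothetical helper: find the metrics table (the header line and the one
# data row that follow "## METRICS CLASS") and return MEAN_INSERT_SIZE.
read_mean_insert_size <- function(lines) {
  start <- grep("^## METRICS CLASS", lines)[1]
  stopifnot(!is.na(start))
  metrics <- read.delim(text = lines[start + 1:2], header = TRUE, sep = "\t")
  metrics$MEAN_INSERT_SIZE[1]
}

read_mean_insert_size(mock_metrics)  # 187.4

# With real files (hypothetical paths), one value per sample:
# files <- c("sample1.insert_size_metrics.txt", "sample2.insert_size_metrics.txt")
# meanFragmentLength <- vapply(files, function(f) read_mean_insert_size(readLines(f)), numeric(1))
```

The resulting vector needs one entry per column of the counts matrix, in the same sample order.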

AAlhendi1707 / countToFPKM

counts of rows of fpkm_matrix inconsistent with that of counts_matrix #7