AAlhendi1707 / countToFPKM

Convert Counts to Fragments per Kilobase of Transcript per Million (FPKM)
GNU General Public License v3.0

The fpkm(counts, featureLength, meanFragmentLength) did not return anything #6

Closed: excel9 closed this issue 3 years ago

excel9 commented 4 years ago

Hi AAlhendi1707,

I created the gene.annotations file (mouse, Ensembl, mm10), filtered and re-ordered it to match the order of the genes in the counts matrix, and then ran the code below to compute FPKM. Unfortunately, the fpkm_matrix output was NA throughout.

```r
library(countToFPKM)

# Import the read count matrix data into R.
counts <- read.csv(file = 'normalized_count.csv', header = TRUE)
rownames(counts) <- counts[, 1]
counts <- counts[, -1]

# Import feature annotations and assign feature length into a numeric vector.
gene.annotations <- read.csv("gene.annotations.csv", header = TRUE)
featureLength <- gene.annotations$length

# Import sample metrics and assign mean fragment length into a numeric vector.
samples.metrics <- read.delim("RNAseq.samples.metrics.txt", sep = "\t", header = TRUE)
meanFragmentLength <- samples.metrics$meanFragmentLength

# Return FPKM into a numeric matrix.
fpkm_matrix <- fpkm(as.matrix(counts), featureLength, meanFragmentLength)
```
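The filtering and re-ordering step described above is where NA lengths can sneak in: any gene in the count matrix without a matching annotation row ends up with an NA in featureLength. A minimal base-R sketch of that alignment, assuming the annotation table has a hypothetical gene_id column whose values match rownames(counts) (that column name and the toy values are assumptions, not taken from the thread):

```r
# Toy inputs for illustration; in practice these come from the CSV files above.
counts <- matrix(c(10, 0, 5, 7, 20, 3, 8, 1), nrow = 4,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneE"),
                                 c("s1", "s2")))
gene.annotations <- data.frame(
  gene_id = c("geneC", "geneA", "geneD", "geneB"),  # hypothetical ID column
  length  = c(1500, 2000, 900, 300),
  stringsAsFactors = FALSE
)

# Position of each count-matrix gene in the annotation table (NA if absent).
idx <- match(rownames(counts), gene.annotations$gene_id)

# Genes without an annotation would get an NA length, so drop them.
keep <- !is.na(idx)
counts <- counts[keep, , drop = FALSE]
featureLength <- gene.annotations$length[idx[keep]]

# Sanity check: the annotation rows now line up with the count matrix rows.
stopifnot(identical(gene.annotations$gene_id[idx[keep]], rownames(counts)))
```

Here geneE has no annotation and is removed from counts, while geneD is simply ignored because it does not appear in the count matrix.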

I am also attaching the counts and gene.annotations files (originally .csv files, uploaded here as .txt because GitHub supports that format) along with RNAseq.samples.metrics.txt: gene.annotations.txt, normalized_count.txt, RNAseq.samples.metrics.txt

Please help me out with this!

Thanks, excel9

AAlhendi1707 commented 4 years ago

Dear Shayoni,

This happens because one of your inputs either contains NA values or is not in numeric format. Please check the following before you run fpkm():

1- counts, featureLength, and meanFragmentLength are free of NA.
2- counts is a numeric matrix, and featureLength and meanFragmentLength are numeric vectors.
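Both checks in the list above can be run before calling fpkm(). A minimal sketch using a hypothetical helper, check_fpkm_inputs(), which is not part of countToFPKM; the toy data (including the all-NA sample column) is made up for illustration. Note that as.matrix() on a data frame containing any text column produces a character matrix, which would fail the numeric check.

```r
# Hypothetical helper (not part of countToFPKM) implementing the two checks above.
check_fpkm_inputs <- function(counts, featureLength, meanFragmentLength) {
  problems <- character(0)
  if (!is.matrix(counts) || !is.numeric(counts))
    problems <- c(problems, "counts is not a numeric matrix")
  if (!is.numeric(featureLength))
    problems <- c(problems, "featureLength is not numeric")
  if (!is.numeric(meanFragmentLength))
    problems <- c(problems, "meanFragmentLength is not numeric")
  if (anyNA(counts))
    problems <- c(problems, "counts contains NA")
  if (anyNA(featureLength))
    problems <- c(problems, "featureLength contains NA")
  if (anyNA(meanFragmentLength))
    problems <- c(problems, "meanFragmentLength contains NA")
  problems
}

# Toy data with an all-NA sample column (e.g. an empty column in the CSV).
counts <- cbind(s1 = c(10, 0, 5), s2 = c(NA, NA, NA), s3 = c(20, 3, 8))
featureLength <- c(2000, 300, 1500)
meanFragmentLength <- c(200, 190, 210)  # one value per sample column

print(check_fpkm_inputs(counts, featureLength, meanFragmentLength))
# [1] "counts contains NA"

# Drop sample columns containing NA, together with their fragment lengths,
# so meanFragmentLength still has one entry per remaining column.
ok <- colSums(is.na(counts)) == 0
counts <- counts[, ok, drop = FALSE]
meanFragmentLength <- meanFragmentLength[ok]

print(check_fpkm_inputs(counts, featureLength, meanFragmentLength))
# character(0)
```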

excel9 commented 4 years ago

Thank you Ahmed! It works now that I have removed the "NA" columns. I have a quick question: why did I get FPKM values for only 17462 genes, 74 fewer than the 17536 genes in the annotation file?

Thank you so much for your help!

AAlhendi1707 commented 3 years ago

Hi there

For accurate FPKM quantification of RNA-Seq data, the read counts need to be normalised by the effective feature length (see the Lee et al. 2011 paper). To compute the effective length, the meanFragmentLength is subtracted from the feature length. Thus, features shorter than the meanFragmentLength are automatically dropped. In other words, you cannot calculate FPKM for features shorter than the meanFragmentLength, and that is why your fpkm_matrix has fewer rows than counts.
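The effective-length normalisation described above can be illustrated in base R. This is a simplified sketch, not the package's own implementation: it assumes effective length = featureLength - meanFragmentLength exactly as described in the reply (some implementations add 1), keeps only genes whose effective length is positive in every sample, and uses each sample's total count over all input genes as the library size. The function name fpkm_sketch and the toy values are hypothetical.

```r
# Simplified FPKM with effective lengths (illustrative sketch only).
fpkm_sketch <- function(counts, featureLength, meanFragmentLength) {
  stopifnot(is.matrix(counts),
            length(featureLength) == nrow(counts),
            length(meanFragmentLength) == ncol(counts))
  # Library size per sample, taken over all input genes (assumption).
  libSize <- colSums(counts)
  # Effective length matrix: effLen[i, j] = featureLength[i] - meanFragmentLength[j],
  # one column per sample because each sample has its own mean fragment length.
  effLen <- outer(featureLength, meanFragmentLength, "-")
  # Features not longer than the mean fragment length cannot be quantified.
  keep <- apply(effLen > 0, 1, all)
  counts <- counts[keep, , drop = FALSE]
  effLen <- effLen[keep, , drop = FALSE]
  # FPKM = count * 1e9 / (effective length * library size), per sample.
  fpkm <- counts * 1e9 / effLen
  sweep(fpkm, 2, libSize, "/")
}

counts <- matrix(c(100, 50, 10, 200, 80, 20), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
featureLength <- c(2000, 150, 1000)
meanFragmentLength <- c(200, 250)

res <- fpkm_sketch(counts, featureLength, meanFragmentLength)
print(res)
```

With these toy values geneB (150 bp) is shorter than both mean fragment lengths, so the result has two rows while counts has three, the same kind of shrinkage reported in the question.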

To get statistics about the genes that are dropped because featureLength < meanFragmentLength, please install the latest version from GitHub:

```r
if (!require(devtools)) install.packages("devtools")
devtools::install_github("AAlhendi1707/countToFPKM", build_vignettes = TRUE)
```
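Independently of the package version, the genes that will be dropped can also be listed directly from the inputs. A minimal base-R sketch using a hypothetical helper, dropped_genes() (not a countToFPKM function), assuming a gene is kept only if it is longer than every sample's mean fragment length; the gene IDs and lengths below are toy values.

```r
# Hypothetical helper: list genes too short to receive an FPKM value,
# assuming a gene is kept only if it is longer than every sample's
# mean fragment length.
dropped_genes <- function(geneIds, featureLength, meanFragmentLength) {
  tooShort <- featureLength <= max(meanFragmentLength)
  data.frame(gene_id = geneIds[tooShort],
             length = featureLength[tooShort],
             stringsAsFactors = FALSE)
}

# Toy example values (not from the thread).
geneIds <- c("geneA", "geneB", "geneC", "geneD")
featureLength <- c(2000, 150, 1000, 240)
meanFragmentLength <- c(200, 250)

short <- dropped_genes(geneIds, featureLength, meanFragmentLength)
print(short)
cat(nrow(short), "of", length(geneIds), "genes would be dropped\n")
```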

Hope it helps! A