alyssafrazee / ballgown

Bioconductor package "ballgown", devel version. Isoform-level differential expression analysis in R.
http://biorxiv.org/content/biorxiv/early/2014/09/05/003665.full.pdf
142 stars 58 forks source link

Table of differentially expressed transcript between sexes replace feature with Transcript name column #114

Closed mictadlo closed 7 years ago

mictadlo commented 7 years ago

Hi, How to replace feature with Transcript name column in the table of differentially expressed transcript between sexes rsimilar to Table3 from the new Tuxedo protocol.

This is the code which I used to generate the table:

    #!/usr/bin/env Rscript

    pheno_data_file <- "pheno.txt"

    library(ballgown)
    library(RSkittleBrewer)
    library(genefilter)
    library(dplyr)
    library(devtools)

    pheno_data <- read.table(pheno_data_file, header=TRUE, colClasses = rep("character", 3))
    pheno_data = pheno_data[order(pheno_data$ids),]
    bg <- ballgown(dataDir = ".", samplePattern="ENIN", pData=pheno_data)

    bg_filt = subset(bg,"rowVars(texpr(bg)) >1",genomesubset=TRUE)

    results_transcripts = stattest(bg_filt, feature="transcript",covariate="Tissue_type",
                      adjustvars = c("RWC"), getFC=TRUE, meas="FPKM")

    results_genes = stattest(bg_filt, feature="gene", covariate="Tissue_type", 
                adjustvars = c("RWC"), getFC=TRUE, meas="FPKM")

    results_transcripts = data.frame(geneNames=ballgown::geneNames(bg_filt), 
                      geneIDs=ballgown::geneIDs(bg_filt), results_transcripts)                       

    results_transcripts <- arrange(results_transcripts, pval)
    results_genes <-  arrange(results_genes, pval)

    write.csv(results_transcripts, "transcripts_results.csv", row.names=FALSE)
    write.csv(results_genes, "genes_results.csv", row.names=FALSE) 

the annotation file looks like this

> head transdecoder.gff3 
comp100004_c0_seq1  transdecoder    gene    1   304 .   +   .   ID=comp100004_c0::comp100004_c0_seq1::g.108505;Name=ORF%20type%3Ainternal%20len%3A102%20%28%2B%29
comp100004_c0_seq1  transdecoder    mRNA    1   304 .   +   .   ID=comp100004_c0::comp100004_c0_seq1::g.108505::m.108505;Parent=comp100004_c0::comp100004_c0_seq1::g.108505;Name=ORF%20type%3Ainternal%20len%3A102%20%28%2B%29
comp100004_c0_seq1  transdecoder    exon    1   304 .   +   .   ID=comp100004_c0::comp100004_c0_seq1::g.108505::m.108505.exon1;Parent=comp100004_c0::comp100004_c0_seq1::g.108505::m.108505
comp100004_c0_seq1  transdecoder    CDS 1   303 .   +   0   ID=cds.comp100004_c0::comp100004_c0_seq1::g.108505::m.108505;Parent=comp100004_c0::comp100004_c0_seq1::g.108505::m.108505
comp100004_c0_seq1  transdecoder    three_prime_UTR 304 304 .   +   .   ID=comp100004_c0::comp100004_c0_seq1::g.108505::m.108505.utr3p1;Parent=comp100004_c0::comp100004_c0_seq1::g.108505::m.108505

comp100010_c0_seq1  transdecoder    gene    1   409 .   -   .   ID=comp100010_c0::comp100010_c0_seq1::g.108507;Name=ORF%20type%3Ainternal%20len%3A137%20%28-%29
comp100010_c0_seq1  transdecoder    mRNA    1   409 .   -   .   ID=comp100010_c0::comp100010_c0_seq1::g.108507::m.108507;Parent=comp100010_c0::comp100010_c0_seq1::g.108507;Name=ORF%20type%3Ainternal%20len%3A137%20%28-%29
comp100010_c0_seq1  transdecoder    five_prime_UTR  409 409 .   -   .   ID=comp100010_c0::comp100010_c0_seq1::g.108507::m.108507.utr5p1;Parent=comp100010_c0::comp100010_c0_seq1::g.108507::m.108507
comp100010_c0_seq1  transdecoder    exon    1   409 .   -   .   ID=comp100010_c0::comp100010_c0_seq1::g.108507::m.108507.exon1;Parent=comp100010_c0::comp100010_c0_seq1::g.108507::m.108507

What did I miss?

Thank you in advance.

Michal

mictadlo commented 7 years ago

I have forgotten to add the produced tables and I just wondering how to add Transcript names to the below table:

> head transcripts_results.csv 
"geneNames","geneIDs","feature","id","fc","pval","qval"
"ORF%20type%3A5prime_partial%20len%3A201%20%28-%29","MSTRG.58114","transcript","64131",0.14226907572396,8.55491455453716e-10,5.28120540195243e-05
".","MSTRG.11120","transcript","11545",0.143960965949389,1.70715367397989e-08,0.000360886202201627
"ORF%20type%3A5prime_partial%20len%3A241%20%28-%29","MSTRG.73927","transcript","86333",0.0761516786426747,1.9375042459302e-08,0.000360886202201627
"ORF%20type%3Acomplete%20len%3A172%20%28%2B%29","MSTRG.77795","transcript","92023",0.17147507098578,2.41284781044016e-08,0.000360886202201627
".","MSTRG.105613","transcript","134605",0.142012000216645,2.92296018500338e-08,0.000360886202201627
".","MSTRG.33056","transcript","34634",0.228098393534398,4.23230432922139e-08,0.00043545473859304
"ORF%20type%3A5prime_partial%20len%3A127%20%28-%29","MSTRG.47891","transcript","51195",0.198830171922629,5.52925707442142e-08,0.00043774608545144
".","MSTRG.23882","transcript","24942",0.290727349077051,6.1799522832473e-08,0.00043774608545144
"ORF%20type%3Acomplete%20len%3A143%20%28-%29","MSTRG.65952","transcript","74773",0.332052698792621,6.38186183898881e-08,0.00043774608545144
> head genes_results.csv 
"feature","id","fc","pval","qval"
"gene","MSTRG.77795",0.155781580439704,4.0631242814726e-10,1.50971929477883e-05
"gene","MSTRG.58114",0.138680992816583,5.89020832109100e-10,1.50971929477883e-05
"gene","MSTRG.11120",0.139427434280468,5.82310710761647e-09,9.95013721835451e-05
"gene","MSTRG.105613",0.137474485659078,9.33283939019702e-09,0.00011960500320507
"gene","MSTRG.73927",0.072101995435035,1.86477370212046e-08,0.000140717354581543
"gene","MSTRG.33056",0.221507010672155,1.9084307467665e-08,0.000140717354581543
"gene","MSTRG.49785",0.215541958749952,1.92154321343452e-08,0.000140717354581543
"gene","MSTRG.23882",0.285749760093436,3.40711243662639e-08,0.000218319247157928
"gene","MSTRG.47891",0.194907447225835,5.65741986591206e-08,0.000322234063518204

Thank you in advance.

Michal

alyssafrazee commented 7 years ago

Hi @mictadlo, so sorry for the delayed response here. I think you can add the transcript name to your result using texpr(bg, 'all'). That table should have the transcript ID (4th column of transcript_results.csv) and also the transcript name, if the alignment was done with annotation and it's present in the transcript assembly files. Since transcript_results.csv also has the transcript ID, you should be able to match the names up from the other table, e.g. with match or join. Hope this helps!