biostars / biostar-handbook

Issue tracker for the Biostar Handbook

deseq1.r vs deseq2.r vs edger.r list of genes in raw output #105

Closed njbowen closed 4 years ago

njbowen commented 4 years ago

Just an FYI, in case folks want or expect equal numbers of rows in the .csv outputs from the .r scripts.

It seems that the edger.r script limits its output to the top 100,000 differentially expressed genes (DEG), whereas deseq1.r and deseq2.r do not. That is, deseq1.r and deseq2.r report every transcript or gene in the kallisto idx file. I tried changing the edger.r script at:

# Extracts the most differentially expressed genes.

etp <- topTags(etx, n=100000)

to n=300000

since my idx, built from the gencode v33 transcripts.fa that I'm using for the kallisto index, has around 230k transcripts. This seems to have given me equal row numbers in each output.
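
For anyone who wants to check this on their own outputs, here is a minimal sketch of comparing row counts across result files. The file names and the toy data are hypothetical stand-ins, not what the handbook scripts actually write; point `out_dir` at your own directory instead.

```r
# Hypothetical file names and toy data; replace out_dir with the directory
# that holds the real .csv outputs of deseq1.r, deseq2.r and edger.r.
out_dir <- tempfile("deg_outputs_")
dir.create(out_dir)

toy <- data.frame(id = paste0("tx", 1:5),
                  pval = c(0.01, 0.2, 0.5, 0.03, 0.9))

# One "complete" output and one truncated output, to mimic the mismatch.
write.csv(toy, file.path(out_dir, "deseq2_results.csv"), row.names = FALSE)
write.csv(toy[1:3, ], file.path(out_dir, "edger_results.csv"), row.names = FALSE)

# Count the data rows in every .csv file in the directory.
files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
row_counts <- vapply(files, function(f) nrow(read.csv(f)), integer(1))
names(row_counts) <- basename(files)
print(row_counts)

# TRUE only when every output has the same number of rows.
all_equal <- length(unique(row_counts)) == 1
```

With the toy files above the counts differ, so `all_equal` is `FALSE`; on real outputs a `FALSE` here would point at a script that truncates its results.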

ialbert commented 4 years ago

Good observation. It probably should be set to

n=nrow(counts)

to match the data, and better communicate the intent of keeping all rows.

I have applied and pushed this change out.
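
To illustrate the effect of the setting, here is a minimal sketch on simulated counts. It assumes the edgeR package is installed, and the steps before `topTags` (`DGEList`, `estimateDisp`, `exactTest`) are an assumed classic edgeR workflow rather than a copy of edger.r. Only the names `etx`, `etp` and `counts` come from this thread.

```r
library(edgeR)

# Simulated count matrix: 200 genes, 3 samples per condition (toy data).
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
group <- factor(c("A", "A", "A", "B", "B", "B"))

# Assumed classic edgeR pipeline leading up to the exact test.
dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge)
dge <- estimateDisp(dge)
etx <- exactTest(dge)

# A fixed cap silently truncates the table when there are more rows than n.
capped <- topTags(etx, n = 100)

# Tying n to the data keeps every row, whatever the input size.
etp <- topTags(etx, n = nrow(counts))

nrow(capped$table)  # limited by the fixed cap
nrow(etp$table)     # one row per gene in counts
```

The table is still sorted by significance, so the top of the file looks the same either way; the only difference is that nothing is dropped at the bottom.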