aspencoyle closed this issue 3 years ago
What percentage of the entire transcriptome is annotated, based on the annotation file you are using, and how does that percentage compare to your DEG list?
For my 4 DEG lists combined, 1061/2930 differentially expressed genes are annotated (36.2%).
The annotation file I'm using here, labeled as the BLASTx annotation for the transcriptome, has 66,596 transcripts, and all are annotated.
The whole transcriptome (linked on the Genomic Resources page as cbai_transcriptome_v3.0.fasta) has 344,944 sequences, so 19.3% are annotated.
I'm also using transcriptome v3.0 (pooled libraries only) rather than transcriptome v2.0 (pooled libraries + individual libraries), as my computer couldn't handle creating a Kallisto index for v2.0. Maybe that could be the source of some of the disparity?
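A quick sketch for checking these proportions, using only the counts quoted in this thread. The variable names and the `percent` helper are illustrative and not part of the original analysis.

```python
# Counts quoted in this thread.
annotated_degs = 1061          # annotated DEGs across the 4 lists combined
total_degs = 2930              # all DEGs across the 4 lists combined
annotated_transcripts = 66596  # transcripts in the BLASTx annotation table
total_transcripts = 344944     # sequences in cbai_transcriptome_v3.0.fasta


def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


deg_rate = percent(annotated_degs, total_degs)
transcriptome_rate = percent(annotated_transcripts, total_transcripts)

print(f"DEGs annotated: {deg_rate}%")
print(f"Transcriptome annotated: {transcriptome_rate}%")
```

Comparing the two rates side by side shows whether the DEG lists are annotated more or less often than the transcriptome as a whole.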
Seems like only 19% of the transcriptome is annotated, and you got 36%, so you are doing well. Did you annotate v3? If not, you should, to get familiar with the ins and outs.
Alright, great! And nope, I didn't annotate v3 myself - I'll start on that using Trinotate!
I'm looking to take the tables of transcript IDs produced by my DESeq2 analysis (example available here) and match them with the accession IDs for version 3.0 of the C. bairdii transcriptome.
To match genes to transcripts, I'm using this table, found under BLASTx Annotation for cbai_hemat_transcriptome_v3.0 on the Genomic Resources page.
However, when I do, I get a lot of unmatched transcripts. Of the 1022 significantly different transcripts in my ambient vs. low treatments, 641 of them aren't matched to a particular gene. During our Science Hour chat, I thought I remembered hearing that it's unusual for a transcript to not be matched to a gene. Am I just misremembering? And if not, what's going on here?
Don't think my code is causing the issue, but posting it below to be safe:
EDITED: Added code block formatting.
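For anyone reproducing the matching step described in this issue, here is a minimal sketch in Python. It is not the code from the original analysis. The transcript IDs, column names, and accession labels are hypothetical toy data, and the real DESeq2 output and BLASTx table may be laid out differently.

```python
import csv
import io

# Hypothetical toy data standing in for a DESeq2 results table (CSV)
# and a BLASTx annotation table (tab-separated: transcript ID, accession).
deseq_csv = """transcript_id,log2FoldChange,padj
transcript_A,2.1,0.001
transcript_B,-1.4,0.020
transcript_C,3.0,0.004
"""
blast_tsv = "transcript_A\tACCESSION_1\ntranscript_D\tACCESSION_2\n"


def load_annotation(handle):
    """Map transcript ID -> accession from a two-column tab-separated table."""
    return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if row}


def split_by_annotation(deg_ids, annotation):
    """Split DEG transcript IDs into annotated and unannotated groups."""
    matched = {tid: annotation[tid] for tid in deg_ids if tid in annotation}
    unmatched = [tid for tid in deg_ids if tid not in annotation]
    return matched, unmatched


deg_ids = [row["transcript_id"] for row in csv.DictReader(io.StringIO(deseq_csv))]
annotation = load_annotation(io.StringIO(blast_tsv))
matched, unmatched = split_by_annotation(deg_ids, annotation)

print(f"{len(matched)} matched, {len(unmatched)} unmatched of {len(deg_ids)}")
```

Keeping the unmatched IDs as their own list makes it easy to check whether they are truly absent from the annotation table or only differ in ID formatting.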