Hello Alex/everyone,
I am trying to analyse bacterial RNA-seq data with STAR.
I got the annotation in GTF format from the RefSeq database and ran STAR in genomeGenerate mode. At first everything seemed to go smoothly, but problems showed up when I got the gene counts. The gene counts file contains only 81 genes, which is definitely unreasonable: RNA-seq data from a bacterium must cover far more than 81 genes.
When I checked the procedure carefully, I found that only 81 genes were indexed in the output index files.
These 81 genes correspond exactly to the GTF lines whose third column is annotated as "transcript", and all the other genes are ignored. I ran: cut -f 3 LMO-1.gtf | grep "transcript" | wc -l and the output is 81.
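To see how the remaining features are labelled, the third column can be tallied in full instead of grepping for a single value. This is only a minimal sketch, assuming a tab-separated GTF whose header lines start with "#"; the function name gtf_feature_counts is made up for illustration and is not part of STAR.

```shell
# Hypothetical helper (not a STAR command): count how often each
# feature type appears in column 3 of a GTF, most frequent first.
gtf_feature_counts() {
    # Drop '#' header lines, keep the feature column, then tally it.
    grep -v '^#' "$1" | cut -f 3 | sort | uniq -c | sort -k1,1nr
}

# Run on the annotation from this post, if it is in the current directory.
if [ -f LMO-1.gtf ]; then
    gtf_feature_counts LMO-1.gtf
fi
```

Each output line is a count followed by a feature type, so the line for "transcript" should show the same 81 as the command above, next to the counts for whatever other feature types the file contains.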
I suspect the GTF file may be inaccurate, but the annotation from the RefSeq database is as accurate as it gets for my species.
Maybe I should revise the GTF file manually, right?
Looking forward to your advice. It would really help me a lot!
Thanks!
Yuxi
Here is my code:
STAR --runMode genomeGenerate \
--runThreadN 20 \
--genomeFastaFiles LMO-1.fna \
--sjdbGTFfile LMO-1.gtf \
--genomeDir LMO-1_db \
--sjdbOverhang 149 \
--genomeSAindexNbases 9
Here is one of the output index files:
cat transcriptInfo.tab
Output:
81
unassigned_transcript_14 18927 19001 19001 2 1 0 0
unassigned_transcript_15 19178 19254 19001 1 1 1 1
unassigned_transcript_53 58277 58354 19254 2 1 2 2
unassigned_transcript_57 63486 63557 58354 1 1 3 3
unassigned_transcript_62 68250 68337 63557 1 1 4 4
unassigned_transcript_92 94012 95492 68337 1 1 5 5
unassigned_transcript_93 95605 95677 95492 1 1 6 6
unassigned_transcript_94 95822 98747 95677 1 1 7 7
unassigned_transcript_95 98815 98936 98747 1 1 8 8
unassigned_transcript_96 99131 99202 98936 1 1 9 9
unassigned_transcript_97 99331 99452 99202 1 1 10 10
unassigned_transcript_143 132848 132953 99452 1 2 11 11
unassigned_transcript_153 141534 141606 132953 1 1 13 12
unassigned_transcript_154 141628 141702 141606 1 1 14 13
unassigned_transcript_189 179693 180045 141702 2 1 15 14
unassigned_transcript_221 198901 198983 180045 1 1 16 15
unassigned_transcript_232 208591 208678 198983 1 1 17 16
unassigned_transcript_233 208762 208849 208678 1 1 18 17
unassigned_transcript_234 208933 209020 208849 1 1 19 18
unassigned_transcript_247 224591 224668 209020 1 1 20 19
unassigned_transcript_270 243553 243627 224668 1 1 21 20
unassigned_transcript_271 243818 243892 243627 1 1 22 21
unassigned_transcript_283 253769 253845 243892 1 1 23 22
unassigned_transcript_284 253914 253988 253845 1 1 24 23
unassigned_transcript_285 254003 254074 253988 1 1 25 24
unassigned_transcript_286 255101 256577 254074 1 1 26 25
unassigned_transcript_287 256690 256762 256577 1 1 27 26
unassigned_transcript_288 256907 259834 256762 1 1 28 27
unassigned_transcript_289 259902 260023 259834 1 1 29 28
unassigned_transcript_290 260219 260290 260023 1 1 30 29
unassigned_transcript_374 341518 341595 260290 2 1 31 30
unassigned_transcript_394 367304 367502 341595 1 2 32 31
unassigned_transcript_468 430418 430503 367502 2 1 34 32
unassigned_transcript_487 450863 450933 430503 1 1 35 33
unassigned_transcript_531 493076 493161 450933 2 1 36 34
unassigned_transcript_537 498438 498532 493161 1 2 37 35
unassigned_transcript_767 763237 763309 498532 1 1 39 36
unassigned_transcript_805 804622 804697 763309 2 1 40 37
unassigned_transcript_983 985334 985418 804697 2 1 41 38
unassigned_transcript_1035 1047313 1047434 985418 2 1 42 39
unassigned_transcript_1036 1047667 1047738 1047434 2 1 43 40
unassigned_transcript_1037 1047933 1048054 1047738 2 1 44 41
unassigned_transcript_1038 1048122 1051052 1048054 2 1 45 42
unassigned_transcript_1039 1051197 1051269 1051052 2 1 46 43
unassigned_transcript_1040 1051381 1052855 1051269 2 1 47 44
unassigned_transcript_1099 1118271 1118344 1052855 2 1 48 45
unassigned_transcript_1150 1176551 1176624 1118344 1 1 49 46
unassigned_transcript_1201 1247639 1247760 1176624 2 1 50 47
unassigned_transcript_1202 1247994 1248065 1247760 2 1 51 48
unassigned_transcript_1203 1248260 1248381 1248065 2 1 52 49
unassigned_transcript_1204 1248449 1251375 1248381 2 1 53 50
unassigned_transcript_1205 1251520 1251592 1251375 2 1 54 51
unassigned_transcript_1206 1251705 1253180 1251592 2 1 55 52
unassigned_transcript_1226 1281405 1281476 1253180 2 1 56 53
unassigned_transcript_1309 1362815 1362902 1281476 1 1 57 54
unassigned_transcript_1418 1478091 1478162 1362902 2 1 58 55
unassigned_transcript_1433 1494182 1494258 1478162 2 1 59 56
unassigned_transcript_1528 1584375 1584449 1494258 2 1 60 57
unassigned_transcript_1586 1634713 1634784 1584449 1 1 61 58
unassigned_transcript_1617 1671401 1671473 1634784 2 1 62 59
unassigned_transcript_1618 1671553 1671661 1671473 2 2 63 60
unassigned_transcript_1694 1745621 1745739 1671661 1 1 65 61
unassigned_transcript_1705 1755161 1755238 1745739 1 1 66 62
unassigned_transcript_1732 1775810 1775881 1755238 1 1 67 63
unassigned_transcript_1793 1835493 1835807 1775881 1 1 68 64
unassigned_transcript_1794 1835887 1835971 1835807 1 1 69 65
unassigned_transcript_1840 1885290 1885362 1835971 2 1 70 66
unassigned_transcript_1845 1889642 1889716 1885362 1 1 71 67
unassigned_transcript_1973 2019772 2019880 1889716 2 2 72 68
unassigned_transcript_2000 2050283 2050357 2019880 2 1 74 69
unassigned_transcript_2050 2104140 2105615 2050357 1 1 75 70
unassigned_transcript_2051 2105728 2105800 2105615 1 1 76 71
unassigned_transcript_2052 2105944 2108869 2105800 1 1 77 72
unassigned_transcript_2053 2108937 2109058 2108869 1 1 78 73
unassigned_transcript_2054 2109253 2109324 2109058 1 1 79 74
unassigned_transcript_2055 2109453 2109574 2109324 1 1 80 75
unassigned_transcript_2204 2254252 2254323 2109574 1 1 81 76
unassigned_transcript_2333 2385485 2385559 2254323 1 1 82 77
unassigned_transcript_2368 2422249 2422322 2385559 1 1 83 78
unassigned_transcript_2371 2424037 2424110 2422322 1 1 84 79
unassigned_transcript_2420 2472851 2472924 2424110 1 1 85 80