alexdobin / STAR

RNA-seq aligner
MIT License

Genome length: how is it computed? #1196

Open krabapple opened 3 years ago

krabapple commented 3 years ago

I am trying to understand how STAR (v2.7.6a) calculates genome length. My genome consists of 219 contigs. I compute their total length to be 181467262 bp. STAR says this (from the Log.out file):

`genomeFileSizes               234094592   1496773479        ~RE-DEFINED
Genome version is compatible with current STAR
Number of real (reference) chromosomes= 219
1       Contig0     40212017        0
2       Contig1     34657275        40370176
3       Contig2     26081721        75235328
4       Contig3     27738065        101449728
5       Contig4     20331543        129236992
6       Contig5     27726387        149684224
7       Contig6     4123    177471488
8       Contig7     6891    177733632
9       Contig8     4968    177995776
10      Contig9     9349    178257920
11      Contig10    8113    178520064
12      Contig11    22343   178782208
13      Contig12    3023    179044352
14      Contig13    57747   179306496
15      Contig14    5207    179568640
16      Contig15    4456    179830784
17      Contig16    2809    180092928
18      Contig17    18317   180355072
19      Contig18    9604    180617216
20      Contig19    1029    180879360
21      Contig20    946     181141504
22      Contig21    736     181403648
23      Contig22    13420   181665792
24      Contig23    5175    181927936
25      Contig24    3763    182190080
26      Contig25    3688    182452224
27      Contig26    9035    182714368
28      Contig27    14061   182976512
29      Contig28    3465    183238656
30      Contig29    2539    183500800
31      Contig30    9997    183762944
32      Contig31    3707    184025088
33      Contig32    5692    184287232
34      Contig33    7833    184549376
35      Contig34    19122   184811520
36      Contig35    2519    185073664
37      Contig36    97402   185335808
38      Contig37    8463    185597952
39      Contig38    2525    185860096
40      Contig39    10299   186122240
41      Contig40    22957   186384384
42      Contig41    24494   186646528
43      Contig42    10465   186908672
44      Contig43    7981    187170816
45      Contig44    109354  187432960
46      Contig45    2743    187695104
47      Contig46    1134    187957248
48      Contig47    4178    188219392
49      Contig48    3189    188481536
50      Contig49    7148    188743680
51      Contig50    25490   189005824
52      Contig51    55442   189267968
53      Contig52    5726    189530112
54      Contig53    6598    189792256
55      Contig54    509     190054400
56      Contig55    481     190316544
57      Contig56    28446   190578688
58      Contig57    1310    190840832
59      Contig58    5254    191102976
60      Contig59    3222    191365120
61      Contig60    13047   191627264
62      Contig61    4248    191889408
63      Contig62    5806    192151552
64      Contig63    21103   192413696
65      Contig64    8845    192675840
66      Contig65    2843    192937984
67      Contig66    16958   193200128
68      Contig67    2798    193462272
69      Contig68    4790    193724416
70      Contig69    8856    193986560
71      Contig70    384     194248704
72      Contig71    4094    194510848
73      Contig72    1913    194772992
74      Contig73    18      195035136
75      Contig74    3074    195297280
76      Contig75    1540    195559424
77      Contig76    14406   195821568
78      Contig77    6335    196083712
79      Contig78    6278    196345856
80      Contig79    71697   196608000
81      Contig80    45024   196870144
82      Contig81    2508    197132288
83      Contig82    18516   197394432
84      Contig83    2887    197656576
85      Contig84    1903    197918720
86      Contig85    11708   198180864
87      Contig86    3578    198443008
88      Contig87    357916  198705152
89      Contig88    114971  199229440
90      Contig89    5793    199491584
91      Contig90    1286    199753728
92      Contig91    3055    200015872
93      Contig92    16296   200278016
94      Contig93    6014    200540160
95      Contig94    12162   200802304
96      Contig95    8480    201064448
97      Contig96    5112    201326592
98      Contig97    3444    201588736
99      Contig98    13493   201850880
100     Contig99    45819   202113024
101     Contig100   523     202375168
102     Contig101   28565   202637312
103     Contig102   3468    202899456
104     Contig103   12969   203161600
105     Contig104   12041   203423744
106     Contig105   9396    203685888
107     Contig106   3916    203948032
108     Contig107   5858    204210176
109     Contig108   12534   204472320
110     Contig109   13158   204734464
111     Contig110   10909   204996608
112     Contig111   11691   205258752
113     Contig112   22393   205520896
114     Contig113   3980    205783040
115     Contig114   312855  206045184
116     Contig115   7194    206569472
117     Contig116   2590    206831616
118     Contig117   6291    207093760
119     Contig118   7112    207355904
120     Contig119   6709    207618048
121     Contig120   16764   207880192
122     Contig121   9402    208142336
123     Contig122   5531    208404480
124     Contig123   2335    208666624
125     Contig124   4441    208928768
126     Contig125   6496    209190912
127     Contig126   1036    209453056
128     Contig127   72179   209715200
129     Contig128   223072  209977344
130     Contig129   3938    210239488
131     Contig130   4059    210501632
132     Contig131   53082   210763776
133     Contig132   4341    211025920
134     Contig133   3806    211288064
135     Contig134   8981    211550208
136     Contig135   1031    211812352
137     Contig136   1186    212074496
138     Contig137   3169    212336640
139     Contig138   4897    212598784
140     Contig139   2716    212860928
141     Contig140   30666   213123072
142     Contig141   17781   213385216
143     Contig142   27440   213647360
144     Contig143   33583   213909504
145     Contig144   11703   214171648
146     Contig145   3872    214433792
147     Contig146   39921   214695936
148     Contig147   869     214958080
149     Contig148   23673   215220224
150     Contig149   4065    215482368
151     Contig150   20782   215744512
152     Contig151   4631    216006656
153     Contig152   4316    216268800
154     Contig153   17715   216530944
155     Contig154   23346   216793088
156     Contig155   6222    217055232
157     Contig156   233009  217317376
158     Contig157   4947    217579520
159     Contig158   7561    217841664
160     Contig159   4309    218103808
161     Contig160   11984   218365952
162     Contig161   1782    218628096
163     Contig162   78174   218890240
164     Contig163   3784    219152384
165     Contig164   156932  219414528
166     Contig165   3680    219676672
167     Contig166   4336    219938816
168     Contig167   1977    220200960
169     Contig168   7179    220463104
170     Contig169   4993    220725248
171     Contig170   170613  220987392
172     Contig171   37773   221249536
173     Contig172   305720  221511680
174     Contig173   3137    222035968
175     Contig174   13327   222298112
176     Contig175   5931    222560256
177     Contig176   21589   222822400
178     Contig177   5325    223084544
179     Contig178   6159    223346688
180     Contig179   17036   223608832
181     Contig180   16855   223870976
182     Contig181   4724    224133120
183     Contig182   7914    224395264
184     Contig183   1157    224657408
185     Contig184   3871    224919552
186     Contig185   22644   225181696
187     Contig186   236109  225443840
188     Contig187   18573   225705984
189     Contig188   5590    225968128
190     Contig189   2928    226230272
191     Contig190   3490    226492416
192     Contig191   57839   226754560
193     Contig192   1344    227016704
194     Contig193   25979   227278848
195     Contig194   2287    227540992
196     Contig195   14255   227803136
197     Contig196   1207    228065280
198     Contig197   8071    228327424
199     Contig198   25840   228589568
200     Contig199   45732   228851712
201     Contig200   18488   229113856
202     Contig201   18515   229376000
203     Contig202   9407    229638144
204     Contig203   25789   229900288
205     Contig204   6552    230162432
206     Contig205   5585    230424576
207     Contig206   26641   230686720
208     Contig207   25591   230948864
209     Contig208   6344    231211008
210     Contig209   77327   231473152
211     Contig210   34205   231735296
212     Contig211   14750   231997440
213     Contig212   5360    232259584
214     Contig213   19870   232521728
215     Contig214   2008    232783872
216     Contig215   7487    233046016
217     Contig216   6226    233308160
218     Contig217   6237    233570304
219     Contig218   1522    233832448
--sjdbOverhang = 99 taken from the generated genome
Started loading the genome: Sat Apr  3 18:54:35 2021

Genome: size given as a parameter = 234094592

`

The sum I get for column 3 matches my previous calculation (181467262).

How is STAR coming up with 234094592?

And how are the values in column 4 calculated?

krabapple commented 3 years ago

Never mind. I see I'm confusing file size (bytes) with genome size.

alexdobin commented 3 years ago

Hi @krabapple

STAR pads the space between chromosomes, so the total genome "size" in RAM is not the sum of the chromosome lengths.

Cheers,
Alex
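For readers wondering about column 4 in the log above: the start offsets are consistent with each contig occupying a whole number of 262144-base bins (2^18, which matches the documented default of `--genomeChrBinNbits 18`). The sketch below reproduces the first few offsets from the contig lengths. The function name `padded_starts` is made up for illustration, and the rounding rule is inferred from the log rather than taken from the STAR source. In particular, how STAR treats a contig whose length is an exact multiple of the bin size is not visible in this log, so that case is a guess here.

```python
# Minimal sketch: reproduce the column-4 start offsets from contig lengths.
# Assumption: bin size is 2**18, matching STAR's documented default
# --genomeChrBinNbits 18. The rounding rule is inferred from the log above.

CHR_BIN_NBITS = 18


def padded_starts(lengths, bin_nbits=CHR_BIN_NBITS):
    """Return (start offsets, padded total) for a list of contig lengths."""
    bin_size = 1 << bin_nbits
    starts = []
    offset = 0
    for length in lengths:
        starts.append(offset)
        # Each contig takes a whole number of bins. Using floor + 1 always
        # leaves at least one padding base; for exact multiples of the bin
        # size this is an assumption, not something the log confirms.
        n_bins = length // bin_size + 1
        offset += n_bins * bin_size
    return starts, offset


# Lengths of Contig0..Contig3, copied from column 3 of the log.
lengths = [40212017, 34657275, 26081721, 27738065]
starts, next_start = padded_starts(lengths)

print(starts)      # [0, 40370176, 75235328, 101449728]
print(next_start)  # 129236992, the logged start of Contig4
```

Applying the same rule to the last contig (start 233832448, length 1522, one bin) gives 233832448 + 262144 = 234094592, which is the genome size STAR reports.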