certik / fastGPT

Fast GPT-2 inference written in Fortran
MIT License

Implement tokenizer in Fortran #34

Closed. certik closed this 1 year ago

certik commented 1 year ago

Fixes #1

TODO:

certik commented 1 year ago

Currently it prints:

$ ./gpt2 
Loading the model...
    done. Time:   0.106s
Model parameters:
n_vocab = 50257
n_ctx   =  1024
n_embd  =   768
n_layer =    12
n_head  =    12

Input parameters:
n_seq                =  19
n_tokens_to_generate =  20

Input tokens:
 36235 39141 18765  1143   326  9061   561   530  1110  1716   845  3665    11   475   772   339   714   407  5967
Decoded input as text:
Alan Turing theorized that computers would one day become very powerful, but even he could not imagine
 Encoded tokens
       36235       39141           0         326        9061         561         530        1110        1716         845        3665          11         475         772         339         714         407        5967

So it almost works; the only mismatch is the single 0 in the encoded tokens where " theorized" should encode to the pair 18765, 1143, presumably because multi-piece words are not being split into sub-word tokens yet.

certik commented 1 year ago

The tokens now agree:

$ ./gpt2 
Loading the model...
    done. Time:   0.107s
Model parameters:
n_vocab = 50257
n_ctx   =  1024
n_embd  =   768
n_layer =    12
n_head  =    12

Input parameters:
n_seq                =  19
n_tokens_to_generate =  20

Input tokens:
 36235 39141 18765  1143   326  9061   561   530  1110  1716   845  3665    11   475   772   339   714   407  5967
Decoded input as text:
Alan Turing theorized that computers would one day become very powerful, but even he could not imagine
 Encoded tokens
 36235 39141 18765  1143   326  9061   561   530  1110  1716   845  3665    11   475   772   339   714   407  5967

But the bpe function is just a stub for now; we still need to actually implement it.
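Roughly, the greedy merge loop it needs looks like the sketch below. This is a self-contained toy with hypothetical names and a three-entry merge table, not fastGPT's actual bpe() interface; the real merge ranks come from the GPT-2 model files. Each iteration finds the adjacent pair of pieces with the lowest merge rank and fuses it, until no adjacent pair remains in the merge table.

```fortran
! Minimal sketch of greedy BPE merging with a toy merge table.
! (Hypothetical module/routine names; the real vocabulary and merge
! list come from the GPT-2 model files.)
module bpe_sketch_m
implicit none
! Toy merge table: pair i is (merges(i,1), merges(i,2)), rank = i.
character(len=16), parameter :: merges(3,2) = reshape( &
    [character(len=16) :: "l", "lo", "e", "o", "w", "r"], [3, 2])
contains

! Rank of the pair (a, b) in the merge table, or huge(1) if absent.
integer function merge_rank(a, b) result(r)
character(len=*), intent(in) :: a, b
integer :: i
r = huge(1)
do i = 1, size(merges, 1)
    if (trim(merges(i,1)) == a .and. trim(merges(i,2)) == b) then
        r = i
        return
    end if
end do
end function

! Greedily merge the lowest-ranked adjacent pair until none remains.
subroutine bpe(pieces, n)
character(len=16), intent(inout) :: pieces(:)
integer, intent(inout) :: n
integer :: i, best_i, best_rank, r
do
    ! Find the adjacent pair with the lowest merge rank
    best_rank = huge(1); best_i = 0
    do i = 1, n-1
        r = merge_rank(trim(pieces(i)), trim(pieces(i+1)))
        if (r < best_rank) then
            best_rank = r; best_i = i
        end if
    end do
    if (best_i == 0) exit
    ! Fuse the best pair and shift the remaining pieces down
    pieces(best_i) = trim(pieces(best_i)) // trim(pieces(best_i+1))
    do i = best_i+1, n-1
        pieces(i) = pieces(i+1)
    end do
    n = n - 1
end do
end subroutine

end module

program bpe_demo
use bpe_sketch_m
implicit none
character(len=16) :: pieces(8)
integer :: n, i
! Start from single characters of the word "lower"
pieces(1:5) = [character(len=16) :: "l", "o", "w", "e", "r"]
n = 5
call bpe(pieces, n)
print "(*(a,1x))", (trim(pieces(i)), i=1, n)  ! prints: low er
end program
```

With the toy table above, "lower" merges as l o w e r -> lo w e r -> low e r -> low er, which is the shape of the splitting that should turn " theorized" into " theor" + "ized" once the real merge list is used.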

certik commented 1 year ago

I think the tokenizer now works:

$ ./gpt2 
Loading the model...
    done. Time:   0.103s
Model parameters:
n_vocab = 50257
n_ctx   =  1024
n_embd  =   768
n_layer =    12
n_head  =    12

Input parameters:
n_seq                =  19
n_tokens_to_generate =  20

Input tokens:
 36235 39141 18765  1143   326  9061   561   530  1110  1716   845  3665    11   475   772   339   714   407  5967
Decoded input as text:
Alan Turing theorized that computers would one day become very powerful, but even he could not imagine
 Encoded tokens
     (Currently we use O(n) vocabulary lookup instead of O(1) -> very SLOW)
 36235 39141 18765  1143   326  9061   561   530  1110  1716   845  3665    11   475   772   339   714   407  5967
Running model...
At line 268 of file /Users/ondrej/repos/fastGPT/gpt2.f90
Fortran runtime warning: An array temporary was created for argument 'kv_cache' of procedure 'gpt2'
At line 147 of file /Users/ondrej/repos/fastGPT/gpt2.f90
Fortran runtime warning: An array temporary was created for argument 'q' of procedure 'attention'
At line 148 of file /Users/ondrej/repos/fastGPT/gpt2.f90
Fortran runtime warning: An array temporary was created for argument 'k' of procedure 'attention'
At line 149 of file /Users/ondrej/repos/fastGPT/gpt2.f90
Fortran runtime warning: An array temporary was created for argument 'v' of procedure 'attention'
 how they would be able to do so.

"I think that the most important thing is
    done. Time:   0.331s (1.01x)
Output tokens:
   703   484   561   307  1498   284   466   523    13   198   198     1    40   892   326   262   749  1593  1517   318
Decoded output as text:
 how they would be able to do so.

"I think that the most important thing is

It works for UTF-8 input as well.
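Regarding the "(Currently we use O(n) vocabulary lookup instead of O(1) -> very SLOW)" note in the output above: one simple way to cut that cost is to sort the vocabulary strings once at load time and binary-search them per lookup, which is O(log n) rather than O(1) but already far cheaper than a linear scan. Below is a minimal sketch with a toy, already-sorted vocabulary and hypothetical names, not fastGPT's actual tables.

```fortran
! Minimal sketch: binary search over a sorted copy of the vocabulary
! instead of a linear scan. Toy, hard-coded vocabulary; hypothetical
! names; not fastGPT's actual data structures.
program vocab_lookup_sketch
implicit none
! In the real tokenizer the decoder strings would be sorted once after
! loading the model, keeping their original token ids alongside.
character(len=8), parameter :: vocab(5) = &
    [character(len=8) :: "ized", "that", "theor", "w", "zz"]
print *, lookup("theor")   ! prints 3
print *, lookup("missing") ! prints 0 (not found)
contains

! Index of s in the sorted vocab, or 0 if it is not present.
integer function lookup(s) result(idx)
character(len=*), intent(in) :: s
integer :: lo, hi, mid
idx = 0
lo = 1; hi = size(vocab)
do while (lo <= hi)
    mid = (lo + hi) / 2
    if (trim(vocab(mid)) == s) then
        idx = mid
        return
    else if (llt(trim(vocab(mid)), s)) then
        lo = mid + 1
    else
        hi = mid - 1
    end if
end do
end function

end program
```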