NouamaneTazi / bloomz.cpp

C++ implementation for BLOOM

Tiny #1

Closed. NouamaneTazi closed this issue 1 year ago.

NouamaneTazi commented 1 year ago
main: seed = 1678796877
llama_model_load: loading model from '/Users/nouamanetazi/projects/bloomz.cpp/models/ggml-model-f32.bin' - please wait ...
llama_model_load: n_vocab = 250880
llama_model_load: n_ctx   = 2048
llama_model_load: n_embd  = 1024
llama_model_load: n_mult  = 1
llama_model_load: n_head  = 16
llama_model_load: n_layer = 24
llama_model_load: n_rot   = 64
llama_model_load: f16     = 0
llama_model_load: n_ff    = 4096
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 3497.58 MB
llama_model_load: memory_size =   384.00 MB, n_mem = 49152
llama_model_load: loading model part 1/1 from '/Users/nouamanetazi/projects/bloomz.cpp/models/ggml-model-f32.bin'
llama_model_load: ................................................ done
llama_model_load: model size =  3113.23 MB / num tensors = 390
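
For reference, the memory_size and n_mem figures above are consistent with a per-layer key/value cache stored in f32 (note f16 = 0): n_mem = n_layer * n_ctx and memory_size = 2 * n_mem * n_embd * sizeof(float). A minimal sketch of that arithmetic, assuming the same llama.cpp-style KV-cache layout, with the constants copied from the log above:

#include <cstdio>

int main() {
    // Hyperparameters copied from the llama_model_load output above.
    const long long n_ctx   = 2048;
    const long long n_layer = 24;
    const long long n_embd  = 1024;

    // Assumed llama.cpp-style KV cache: one key and one value vector of
    // n_embd floats per layer and per context position, stored as f32.
    const long long n_mem       = n_layer * n_ctx;                    // 49152
    const long long memory_size = 2 * n_mem * n_embd * sizeof(float); // keys + values

    std::printf("n_mem = %lld, memory_size = %.2f MB\n",
                n_mem, memory_size / (1024.0 * 1024.0));              // 49152, 384.00 MB
    return 0;
}

The reported ggml ctx size of 3497.58 MB is then roughly the weights (3113.23 MB) plus this cache (384.00 MB), with a small remainder for per-tensor overhead.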

main: prompt: 'the lazy brown'
main: number of tokens in prompt = 3
  5984 -> 'the'
109586 -> ' lazy'
 73173 -> ' brown'
main: mem per token =  5646276 bytes
main:     load time =  2494.93 ms
main:   sample time =     1.96 ms
main:  predict time =   675.43 ms / 56.29 ms per token
main:    total time =  3903.00 ms
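
As a rough check on the timings, the per-token figure is just the predict time divided by the number of evaluated tokens: 675.43 ms / 56.29 ms per token is about 12, which suggests roughly 12 tokens (the 3 prompt tokens plus the sampled continuation) were evaluated in this run.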