Closed. dependabot[bot] closed this 1 year ago.
@dependabot rebase
Tested locally with GPU too; it works:
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla T4
llama.cpp: loading model from /home/ubuntu/WizardLM-7B-uncensored.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 128
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.07 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 1862.39 MB (+ 1026.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloading v cache to GPU
llama_model_load_internal: offloading k cache to GPU
llama_model_load_internal: offloaded 35/35 layers to GPU
llama_model_load_internal: total VRAM used: 5084 MB
...................................................................................................
llama_init_from_file: kv self size = 64.00 MB
Model loaded successfully.
>>> what's the time?
Sending what's the time?
I think it's about time to go.
llama_print_timings: load time = 6383.67 ms
llama_print_timings: sample time = 7.32 ms / 11 runs ( 0.67 ms per token)
llama_print_timings: prompt eval time = 432.61 ms / 8 tokens ( 54.08 ms per token)
llama_print_timings: eval time = 444.48 ms / 10 runs ( 44.45 ms per token)
llama_print_timings: total time = 888.96 ms
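Two numbers in the log above can be sanity-checked directly from the printed hyperparameters: the f16 K and V caches together take 2 × n_layer × n_ctx × n_embd × 2 bytes, and the eval timings imply a decode rate of roughly 10 runs / 444.48 ms. A quick sketch (the variable names below are just the log's labels, not a llama.cpp API):

```python
# Sanity-check two figures from the llama.cpp load/timing log.
n_ctx, n_embd, n_layer = 128, 4096, 32  # from llama_model_load_internal
bytes_f16 = 2                           # KV cache entries are stored as f16

# K and V caches: 2 tensors, each n_layer x n_ctx x n_embd elements.
kv_self_bytes = 2 * n_layer * n_ctx * n_embd * bytes_f16
print(kv_self_bytes / (1024 * 1024))    # -> 64.0, matching "kv self size = 64.00 MB"

# Decode throughput implied by "eval time = 444.48 ms / 10 runs".
print(round(10 / 0.44448, 1))           # -> 22.5 tokens per second on the Tesla T4
```

The per-token eval time of 44.45 ms is just the reciprocal of that rate.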
Bumps llama.cpp from `2347e45` to `d411968`.

Commits
- `d411968` opencl : support k-quants (#1836)
- `b41b4ca` examples : add "simple" (#1840)
- `13fe9d2` cmake : add auto detection of BLAS_INCLUDE_DIRS (#1886)
- `ac3b886` llama : fix embd when offloading non-repeating layers (#1891)
- `5b9ccaf` Fixed possible macro redefinition (#1892)
- `9cbf50c` build : fix and ignore MSVC warnings (#1889)
- `3d01122` CUDA : faster k-quant dot kernels (#1862)
- `602c748` gitignore : add several entries specific to Visual Studio (#1888)
- `a09f919` Fixed CUDA runtime version check (#1879)
- `bed9275` cmake : remove whitespaces

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)