cmp-nct / ggllm.cpp

Falcon LLM ggml framework with CPU and GPU support

Windows Installation Video Tutorial #29

Closed boricuapab closed 1 year ago

boricuapab commented 1 year ago

This isn't an issue or an enhancement request.

I just wanted to say thanks for your work on ggllm.cpp.

I also wanted to help Windows users who don't want to go the WSL route get it working with GPU offloading. After many tries and a lot of research, the only solution I found was a bit tricky to figure out, so I show it in this video:

https://www.youtube.com/watch?v=BALw669Qeyw

Also, these are my PC specs:

CPU = AMD Ryzen 7 3700X 8-Core Processor
RAM = 32 GB
GPU = RTX 2060 Super 8 GB

Here are some of my results:

CPU Only

C:\falcGGML\ggllm.cpp\build\bin\Release>title falcon_main.cpp

C:\falcGGML\ggllm.cpp\build\bin\Release>falcon_main -t 8 -ngl 100 -m wizard-falcon40b.ggmlv3.q4_K_S.bin --color -c 2048 -p "Tell me a story about robot falcons from outer space.\n### Response:" -s 1686779952
warning: not compiled with GPU offload support, --n-gpu-layers option will be ignored
warning: see main README.md for information on enabling GPU BLAS support
main: build = 774 (e97d148)
main: seed  = 1686779952
falcon.cpp: loading model from wizard-falcon40b.ggmlv3.q4_K_S.bin
falcon.cpp: file version 4
falcon_model_load_internal: format     = ggjt v3 (latest)
falcon_model_load_internal: n_vocab    = 65025
falcon_model_load_internal: n_ctx      = 2048
falcon_model_load_internal: n_embd     = 8192
falcon_model_load_internal: n_head     = 128
falcon_model_load_internal: n_head_kv     = 8
falcon_model_load_internal: n_layer    = 60
falcon_model_load_internal: n_falcon_type      = 40
falcon_model_load_internal: ftype      = 14 (mostly Q4_K - Small)
falcon_model_load_internal: n_ff       = 32768
falcon_model_load_internal: n_parts    = 1
falcon_model_load_internal: model size = 40B
falcon_model_load_internal: ggml ctx size =    0.00 MB (mmap size = 22449.00 MB)
falcon_model_load_internal: mem required  = 26033.24 MB (+  480.00 MB per state)
[==================================================] 100%  Tensors populated
falcon_init_from_file: kv self size  =  480.00 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 0

Tell me a story about robot falcons from outer space.\n### Response:Once upon a time, in a far-off galaxy, there was a civilization of intelligent robots. They had achieved incredible technological advancements and had colonized many planets in their solar system. One day, they discovered a new planet that seemed to be habitable for their kind. However, the planet was inhabited by a race of sentient beings who were not friendly towards outsiders.
The robot falcons were dispatched from outer space to explore the planet and find out more about its inhabitants. They landed on the planet's surface and immediately began scanning the area for any signs of life. To their surprise, they discovered that the inhabitants of the planet were not humanoids but rather a species of bird-like creatures with incredible intelligence.
The robot falcons approached the birds cautiously and tried to communicate with them, but the birds were afraid and attacked the robots. The falcons quickly realized that they had underestimated the intelligence of the birds and decided to retreat back to their spaceship.
As they were leaving the planet, the falcons noticed a strange object in the sky. It was a giant spaceship, unlike anything they had ever seen before. The falcons tried to communicate with the ship but received no response. They decided to follow the ship back to its home planet and investigate further.
Upon landing on the alien planet, the falcons were greeted by a group of robots who looked identical to them. The leader of the robot colony explained that they had been monitoring the falcons' progress on their journey and had sent the spaceship to intercept them.
The leader revealed that they had been searching for a new home for their civilization, as their own planet was dying. They had found the perfect place in the form of the falcons' planet, which was rich in resources and could support their kind.
The falcons were hesitant at first, but they soon realized that the robots meant no harm and had only come to explore the possibility of a peaceful coexistence. The falcons agreed to let the robots stay on their planet, as long as they promised to respect the planet's natural resources and not harm any of its inhabitants.
And so, the robot falcons and the alien robots joined forces and began to colonize the new planet together. They worked side by side to build a new civilization that would benefit both races and create a harmonious society where all beings could live in peace.<|endoftext|> [end of text]

falcon_print_timings:        load time = 11956.06 ms
falcon_print_timings:      sample time =   183.35 ms /   484 runs   (    0.38 ms per token,  2639.77 tokens per second)
falcon_print_timings: batch eval time =  4267.24 ms /    16 tokens (  266.70 ms per token,     3.75 tokens per second)
falcon_print_timings:        eval time = 524919.20 ms /   483 runs   ( 1086.79 ms per token,     0.92 tokens per second)
falcon_print_timings:       total time = 529578.79 ms

C:\falcGGML\ggllm.cpp\build\bin\Release>pause
Press any key to continue . . .

With GPU Offloading

C:\falcGGML\ggllm.cpp\build\bin\Release>title falcon_main.cpp

C:\falcGGML\ggllm.cpp\build\bin\Release>falcon_main -t 8 -ngl 100 -m wizard-falcon40b.ggmlv3.q4_K_S.bin --color -c 2048 -p "Tell me a story about robot falcons from outer space.\n### Response:" -s 1686779952
WARNING: when using cuBLAS generation results are NOT guaranteed to be reproducible.
main: build = 774 (e97d148)
main: seed  = 1686779952

CUDA Device Summary - 1 devices found
+------------------------------------+------------+-----------+-----------+-----------+-----------+
| Device                             | VRAM Total | VRAM Free | VRAM Used | Split at  | Device ID |
+------------------------------------+------------+-----------+-----------+-----------+-----------+
| NVIDIA GeForce RTX 2060 SUPER      |    8191 MB |   7163 MB |   1028 MB |      0.0% |  0 (Main) |
+------------------------------------+------------+-----------+-----------+-----------+-----------+
Total VRAM: 8.00 GB, Total available VRAM: 7.00 GB
--------------------
Preparing CUDA for device(s):
[0]... [done]
falcon.cpp: loading model from wizard-falcon40b.ggmlv3.q4_K_S.bin
falcon.cpp: file version 4
falcon_model_load_internal: format     = ggjt v3 (latest)
falcon_model_load_internal: n_vocab    = 65025
falcon_model_load_internal: n_ctx      = 2048
falcon_model_load_internal: n_embd     = 8192
falcon_model_load_internal: n_head     = 128
falcon_model_load_internal: n_head_kv     = 8
falcon_model_load_internal: n_layer    = 60
falcon_model_load_internal: n_falcon_type      = 40
falcon_model_load_internal: ftype      = 14 (mostly Q4_K - Small)
falcon_model_load_internal: n_ff       = 32768
falcon_model_load_internal: n_parts    = 1
falcon_model_load_internal: model size = 40B
falcon_model_load_internal: ggml ctx size =    0.00 MB (mmap size = 22449.00 MB)
falcon_model_load_internal: using CUDA for GPU acceleration
falcon_model_load_internal: INFO: using n_batch > 1 will require additional VRAM per device: 2818.00 MB
falcon_model_load_internal: VRAM free: 6961.00 MB  of 8191.00 MB (in use: 1230.00 MB)
falcon_model_load_internal: allocating batch_size x 1 MB = 0 MB VRAM for the scratch buffer
falcon_model_load_internal: Offloading Output head tensor (285 MB)
INFO: Not enough VRAM to load all requested layers - at layer 8 of 60: skipping
INFO: 8 layers will be offloaded to GPU (layers 1 to 9)
falcon_model_load_internal: mem required  = 22466.99 MB (+  480.00 MB per state)
falcon_model_load_internal: offloading 8 of 60 layers to GPU, weights offloaded 3566.25 MB
falcon_model_load_internal: estimated VRAM usage: 6385 MB
[==================================================] 100%  Tensors populated
falcon_model_load_internal: VRAM free: 3381.00 MB  of 8191.00 MB (used: 4810.00 MB)
falcon_init_from_file: kv self size  =  480.00 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 0

Tell me a story about robot falcons from outer space.\n### Response:Once upon a time, in a far-off galaxy, there was a civilization of robots who had evolved to resemble birds of prey. They were called the Falconoids, and they lived on a planet that orbited a binary star system. The Falconoids had developed advanced technology that allowed them to travel through space, and they had used it to explore neighboring galaxies.
One day, the Falconoids detected a strange signal coming from a distant planet in a solar system near their own. They sent a small fleet of robot falcons to investigate, but when they arrived, they found that the planet was already inhabited by intelligent life forms that resembled humans. The Falconoids had never encountered such creatures before, and they were fascinated by them.
The Falconoids decided to observe the humans from afar, without revealing themselves. They sent their falcon robots to fly over the planet's cities and countryside, gathering information about the inhabitants' behavior and technology. Over time, the Falconoids learned much about human society, including its weaknesses and strengths.
One day, a group of humans stumbled upon one of the falcon robots while hiking in the mountains. The robot had landed on a rocky outcropping, and it was unable to take off again. The humans approached the robot cautiously, not knowing what to expect. To their surprise, the robot spoke to them in perfect English, explaining that it was a visitor from another world.
The humans were stunned by this revelation, but they eventually came to accept the falcon robot as one of their own. They named it "Falco," and they took care of it like a beloved pet. Falco continued to gather information about human society, but now it was also transmitting that information back to its home planet.
As time passed, more and more Falconoid robots arrived on Earth, disguised as birds of prey. They integrated themselves into human society, learning everything they could about the humans' culture and technology. Some even took on human identities, posing as scientists or engineers.
Eventually, the Falconoids decided that it was time to reveal themselves to humanity. They descended from the skies in their spaceships, announcing their presence and offering their advanced technology to the humans. The humans were amazed by the Falconoids' generosity, and they gratefully accepted their offer of friendship and cooperation.
From that day forward, the Falconoids and humans worked together to build a better future for both species. They shared knowledge and resources, and they built a network of interstellar trade and communication that spanned the galaxy. The Falconoids even helped the humans develop their own space program, so that they could explore the stars alongside their robot friends.
And so, the Falconoids and humans lived together in peace and harmony, each species learning from the other and growing stronger as a result. They looked to the stars with wonder and excitement, knowing that there were still many mysteries to uncover and new worlds to explore.<|endoftext|> [end of text]

falcon_print_timings:        load time = 50344.30 ms
falcon_print_timings:      sample time =   284.85 ms /   595 runs   (    0.48 ms per token,  2088.80 tokens per second)
falcon_print_timings: batch eval time = 11017.97 ms /    16 tokens (  688.62 ms per token,     1.45 tokens per second)
falcon_print_timings:        eval time = 683718.28 ms /   594 runs   ( 1151.04 ms per token,     0.87 tokens per second)
falcon_print_timings:       total time = 695231.39 ms

C:\falcGGML\ggllm.cpp\build\bin\Release>pause
Press any key to continue . . .
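
To make the two runs easier to compare, here is a minimal Python sketch that parses the `falcon_print_timings` lines pasted above and recomputes the per-token figures. The helper name `parse_timings` and the regular expression are my own assumptions for illustration; they are not part of ggllm.cpp, and the pattern is only checked against the log format shown in this post.

```python
import re

# Timing lines copied verbatim from the two runs above.
CPU_LOG = """\
falcon_print_timings:        load time = 11956.06 ms
falcon_print_timings:      sample time =   183.35 ms /   484 runs   (    0.38 ms per token,  2639.77 tokens per second)
falcon_print_timings: batch eval time =  4267.24 ms /    16 tokens (  266.70 ms per token,     3.75 tokens per second)
falcon_print_timings:        eval time = 524919.20 ms /   483 runs   ( 1086.79 ms per token,     0.92 tokens per second)
falcon_print_timings:       total time = 529578.79 ms
"""

GPU_LOG = """\
falcon_print_timings:        load time = 50344.30 ms
falcon_print_timings:      sample time =   284.85 ms /   595 runs   (    0.48 ms per token,  2088.80 tokens per second)
falcon_print_timings: batch eval time = 11017.97 ms /    16 tokens (  688.62 ms per token,     1.45 tokens per second)
falcon_print_timings:        eval time = 683718.28 ms /   594 runs   ( 1151.04 ms per token,     0.87 tokens per second)
falcon_print_timings:       total time = 695231.39 ms
"""

# Hypothetical pattern: matches only lines of the form
# "<name> time = <ms> ms / <count> runs|tokens".
# The "load" and "total" lines carry no count, so they are skipped.
TIMING_RE = re.compile(
    r"falcon_print_timings:\s+(?P<name>[a-z ]+?) time =\s*(?P<ms>[\d.]+) ms /"
    r"\s*(?P<count>\d+) (?:runs|tokens)"
)


def parse_timings(log: str) -> dict:
    """Return {stage: {ms, count, ms_per_token, tokens_per_second}}."""
    stages = {}
    for match in TIMING_RE.finditer(log):
        ms = float(match["ms"])
        count = int(match["count"])
        stages[match["name"]] = {
            "ms": ms,
            "count": count,
            "ms_per_token": ms / count,
            "tokens_per_second": 1000.0 * count / ms,
        }
    return stages


if __name__ == "__main__":
    cpu = parse_timings(CPU_LOG)
    gpu = parse_timings(GPU_LOG)
    for stage in ("batch eval", "eval"):
        print(
            f"{stage:>10}: CPU {cpu[stage]['ms_per_token']:8.2f} ms/token, "
            f"GPU {gpu[stage]['ms_per_token']:8.2f} ms/token"
        )
```

Comparing per-token numbers rather than total time matters here because the two runs generated different amounts of text (483 vs. 594 eval runs). On this build, with only 8 of 60 layers offloaded, the per-token eval time of the GPU run is not lower than the CPU-only run.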

cmp-nct commented 1 year ago

That story isn't bad at all. Falcon is quite interesting in what it can generate. With a larger prompt, the story could even get some twists. I will add your link to the README and keep it there until it's outdated.

Note: with the latest release you'll see a huge increase in performance for such long generations, likely about 2 times faster at 600 tokens.