ngxson commented 3 months ago

Motivation

With the recent introduction of the eval-callback example, we now have more tools for debugging when working with llama.cpp. However, one tool that I feel is missing is the ability to dump everything inside a gguf file into a human-readable (and interactive) interface.

Inspired by huggingface.js, where users can visualize the KV and the list of tensors on huggingface.com, I would like to implement the same thing in llama.cpp. I find this helpful in these situations:

Debugging convert.py script when adding a new architecture
Debugging tokenizers
Debugging changes related to gguf (model splits for example)
Debugging tensors (i.e. display the first N elements of a tensor, just like eval-callback)
Debugging control vectors
... (maybe other usages in the future)

The reason I can't use huggingface.js is that it's browser-based, which makes it tricky to read a huge local file. It also doesn't have access to quantized types (same for gguf-py).

Possible Implementation

Ideally, I want the implementation to be a binary named gguf-viewer that, when run, opens a web page at localhost:8080. The user can then go to that page to explore the gguf file. It will have these sections:

Complete list of KV
Tokenizer-related info (for example: list all tokens, lookup one token)
List of all tensors
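
To make the first and last sections concrete, here is a minimal Python sketch of reading the KV list and the tensor list from a GGUF header. It is not llama.cpp code: the function name read_gguf_summary and its helpers are hypothetical, and it assumes a little-endian GGUF v2 or v3 file. It only reads metadata and tensor descriptors; decoding quantized tensor data and the web UI are out of scope.

```python
import io
import struct

# Scalar metadata value types (GGUF type id -> struct format).
# Assumption: little-endian file, hence the "<" prefix.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of GGUF data")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of GGUF data")
    return data.decode("utf-8", errors="replace")


def _read_value(f, vtype):
    """Read one metadata value of the given type id (arrays recurse)."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_summary(f):
    """Return version, KV metadata and tensor descriptors (hypothetical helper)."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    n_tensors = _read(f, "<Q")
    n_kv = _read(f, "<Q")

    kv = {}
    for _ in range(n_kv):
        key = _read_string(f)
        vtype = _read(f, "<I")
        kv[key] = _read_value(f, vtype)

    tensors = []
    for _ in range(n_tensors):
        name = _read_string(f)
        n_dims = _read(f, "<I")
        shape = [_read(f, "<Q") for _ in range(n_dims)]
        ggml_type = _read(f, "<I")  # numeric ggml type id, left undecoded
        offset = _read(f, "<Q")     # offset relative to the tensor data section
        tensors.append(
            {"name": name, "shape": shape, "type": ggml_type, "offset": offset}
        )
    return {"version": version, "kv": kv, "tensors": tensors}
```

On a real file this would be called as read_gguf_summary(open(path, "rb")). Tokenizer arrays can hold tens of thousands of entries, so an actual viewer would probably page through them or skip them lazily instead of building full Python lists.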

phymbert commented 3 months ago

Have you seen:

gguf-dump for printing metadata?

Or do you want something dynamic during the forward pass?

ngxson commented 3 months ago

Yes, I tried gguf-py, but it does not have access to quantized types.

ggerganov commented 3 months ago

This could be quite fun. The web page can also generate a set of useful llama.cpp commands for that specific model (e.g. run main, server, etc) that can be copy-pasted for convenience.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] commented 1 week ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

oldgithubman commented 1 week ago

@ngxson reopen? Also, I'd like to suggest similar functionality for imatrices. Or should I open a parallel FR?

ggerganov / llama.cpp

Feature request: Graphical GGUF viewer #6715
