Open timtensor opened 7 months ago
It'll be quite a lot easier to do what you want with the Python bindings. I recommend giving the docs a quick read so you know how to run inference from Python code. Below is a quick example of doing inference with the bindings.
>>> from llama_cpp import Llama
>>> llm = Llama(
...     model_path="./models/7B/llama-model.gguf",
...     # n_gpu_layers=-1, # Uncomment to use GPU acceleration
...     # seed=1337, # Uncomment to set a specific seed
...     # n_ctx=2048, # Uncomment to increase the context window
... )
>>> output = llm(
...     "Q: Name the planets in the solar system? A: ", # Prompt
...     max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
...     stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
...     echo=True # Echo the prompt back in the output
... ) # Generate a completion, can also call create_completion
>>> print(output)
{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "text_completion",
  "created": 1679561337,
  "model": "./models/7B/llama-model.gguf",
  "choices": [
    {
      "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
      "index": 0,
      "logprobs": None,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 28,
    "total_tokens": 42
  }
}
Basically, I'd just write a Python script that parses the JSON files and feeds the extracted text to the Llama class for inference. You can then take the output, reformat it however you want, and write it to a txt file. If you're not too familiar with programming, ChatGPT could probably write this for you.
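A minimal sketch of the script described above, assuming each JSON file holds a list of records with a `description` field (a hypothetical schema, since the thread does not show the real one). `load_records`, `summarize_records`, and `write_summaries` are illustrative helper names, not part of llama-cpp-python. The model is passed in as a plain callable, so the same code should work with a `Llama` instance or with a stand-in while testing.

```python
import json
from pathlib import Path
from typing import Callable


def load_records(path: Path) -> list[dict]:
    """Read a JSON file and always return a list of records."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return data if isinstance(data, list) else [data]


def summarize_records(records: list[dict], complete: Callable[..., dict],
                      field: str = "description") -> list[str]:
    """Ask the model for one summary per record that has a non-empty `field`."""
    summaries = []
    for record in records:
        text = str(record.get(field, "")).strip()
        if not text:
            continue  # nothing to summarize in this record
        prompt = f"Summarize the following text:\n\n{text}\n\nSummary:"
        # llama-cpp-python returns an OpenAI-style dict; the text is in choices[0]
        result = complete(prompt, max_tokens=256)
        summaries.append(result["choices"][0]["text"].strip())
    return summaries


def write_summaries(summaries: list[str], out_path: Path) -> None:
    """Write one summary per line to a plain-text file."""
    out_path.write_text("\n".join(summaries) + "\n", encoding="utf-8")


# Assumed usage with a loaded model (file names are placeholders):
#   from llama_cpp import Llama
#   llm = Llama(model_path="./models/7B/llama-model.gguf")
#   records = load_records(Path("data.json"))
#   write_summaries(summarize_records(records, llm), Path("summaries.txt"))
```

Keeping the parsing, the model call, and the file writing in separate functions makes it easy to change the output format without touching the inference step.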
Hi @mathewpan2, I now have a single JSON file with different tags. I can extract information from the JSON, which I thought I could feed to the LLM for inference, as you said. However, I am not getting the correct output. For example, this extracts all the comments:
test_description = data[0]['description']
I pass this to the LLM as follows. First, set up the LLM:
_llm = Llama(
    model_path="./mistral-7b-instruct-v0.2.Q4_0.gguf",
    n_gpu_layers=-1 if enable_gpu else 0,
    n_ctx=2048,
)
_llm.verbose = False
llm = LlamaCppLLM(name="Mistral-7B-v0.2-Instruct", llm=_llm, max_tokens=None)
and then ask for a summary as follows:
_llm(f'Summarize the following {test_description}')
However, I keep getting arbitrary answers, and I am not sure what the right way to do this is.
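Two things may contribute to answers like this, though neither is confirmed in the thread. First, the call above does not set max_tokens, and llama-cpp-python's completion call uses a small default when it is omitted, so replies can be cut short. Second, Mistral-7B-Instruct is trained on prompts wrapped in [INST] ... [/INST] tags, and a bare "Summarize the following ..." string does not follow that format. The sketch below shows one way to address both; `build_instruct_prompt` and `summarize` are hypothetical helper names, and leaving out the BOS token assumes the tokenizer adds it.

```python
from typing import Callable


def build_instruct_prompt(instruction: str, text: str) -> str:
    """Wrap an instruction and its input in Mistral-Instruct [INST] tags.

    The BOS token is left out on the assumption that the tokenizer adds it.
    """
    return f"[INST] {instruction.strip()}\n\n{text.strip()} [/INST]"


def summarize(complete: Callable[..., dict], text: str, max_tokens: int = 256) -> str:
    """Run one summarization request and return only the generated text."""
    prompt = build_instruct_prompt(
        "Summarize the following text in a few sentences.", text
    )
    # Pass max_tokens explicitly so the reply is not cut off by a small default.
    result = complete(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"].strip()


# With the objects from the post this would be called as:
#   summary = summarize(_llm, test_description)
```

llama-cpp-python also provides `create_chat_completion`, which takes a list of role/content messages and applies a chat format for you; that may be simpler than building the tags by hand if the model file carries the right chat template.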
—
Can you post your entire code?
Hi, yes. I actually switched to LlamaIndex, to be honest, but it still cannot parse the JSON string properly. Not sure if this helps:
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import download_loader
from llama_index.core import ServiceContext
from llama_index.core import VectorStoreIndex
from llama_index.core import SummaryIndex
from llama_index.core.response.notebook_utils import display_response
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.node_parser import JSONNodeParser
from llama_index.readers.file import FlatReader
from pathlib import Path
import locale
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor
from llama_index.core.embeddings import resolve_embed_model
locale.getpreferredencoding = lambda: "UTF-8"
# Load your JSON data
json_docs = FlatReader().load_data(Path("/content/data/smalldata.json"))
# Initialize the JSONNodeParser
parser = JSONNodeParser()
# Parse the loaded data
nodes = parser.get_nodes_from_documents(json_docs)
import torch
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.2",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.2",
    query_wrapper_prompt=PromptTemplate("<s>[INST] {query_str} [/INST] </s>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.2, "top_k": 5, "top_p": 0.95},
    device_map="auto",
)
embed_model = resolve_embed_model("local:thenlper/gte-small")
# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=200, chunk_overlap=70),
        embed_model,
    ]
)
# Assuming you have already parsed your documents into nodes
nodes = parser.get_nodes_from_documents(json_docs)
# run the pipeline
processed_nodes = pipeline.run(documents=nodes)
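# Bundle the LLM and embedding model so both indexes use them
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
# Index setup: build from the processed nodes (`documents` is never defined in this script)
vector_index = VectorStoreIndex(processed_nodes, service_context=service_context)
# Summary setup
summary_index = SummaryIndex(processed_nodes, service_context=service_context)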
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
#Query Engine
query_engine = vector_index.as_query_engine(response_mode="compact")
response = query_engine.query("How many titles are there?")
display_response(response)
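A question such as "How many titles are there?" is a counting question, and a vector query engine typically hands the model only a few retrieved chunks, so the model may never see the whole file. Counting directly in Python is likely to be more reliable than asking the LLM. The sketch below assumes the records carry a `title` key, which is a guess since the thread does not show the file; `count_key` is a hypothetical helper and the sample data is made up.

```python
import json
from typing import Any


def count_key(node: Any, key: str) -> int:
    """Count how many times `key` appears as a dict key anywhere in parsed JSON."""
    if isinstance(node, dict):
        own = 1 if key in node else 0
        return own + sum(count_key(value, key) for value in node.values())
    if isinstance(node, list):
        return sum(count_key(item, key) for item in node)
    return 0  # strings, numbers, booleans and null hold no keys


# Made-up sample standing in for the real file contents.
sample = json.loads(
    '[{"title": "A", "comments": [{"title": "B"}]}, {"description": "no title"}]'
)
print(count_key(sample, "title"))  # prints 2
```

The LLM can then be reserved for what it is good at here, such as summarizing each description, while exact facts like counts come from the parsed data.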
Hi, I have a couple of JSON files that I want to feed to an LLM using llama-cpp. I am wondering if there is a way to do that. I am planning to use mistral7b-instruct as a start. The JSON file looks as follows
I want to use llama cpp to pass
And write the LLM output to maybe a text file with the format
Is it possible to do this via llama cpp directly, or is it only possible through the Python bindings? I thought it would be much faster with llama cpp without the Python bindings.
Has anyone tried something similar?
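On the speed question: the Python bindings call into the same llama.cpp library, so generation speed should be broadly similar either way, and the command-line program can also be driven from a script. The sketch below only builds the argument list for such a run and does not execute it. The binary name `./main` is an assumption (newer llama.cpp builds may name it `llama-cli`), `build_cli_command` is a hypothetical helper, and the sample record is made up.

```python
import json
import shlex


def build_cli_command(binary: str, model_path: str, prompt: str,
                      n_predict: int = 256) -> list[str]:
    """Return an argument list for a llama.cpp command-line run (not executed here)."""
    # -m: model file, -p: prompt text, -n: number of tokens to generate
    return [binary, "-m", model_path, "-p", prompt, "-n", str(n_predict)]


# Made-up record standing in for one entry of the real JSON file.
record = json.loads('{"description": "Login fails after password reset."}')
prompt = f"[INST] Summarize the following text:\n\n{record['description']} [/INST]"
cmd = build_cli_command("./main", "./mistral-7b-instruct-v0.2.Q4_0.gguf", prompt)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To actually run it: subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Passing the command as a list rather than a single shell string avoids quoting problems when the JSON text contains quotes or newlines. Note that each CLI invocation reloads the model, so for many records the bindings, which keep the model in memory, may end up faster overall.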