aiplanethub / genai-stack

An End to End GenAI Framework
https://genaistack.aiplanet.com/
Apache License 2.0

Enable Pipeline in Hugging Face models #90

Closed sam-aiplanet closed 7 months ago

sam-aiplanet commented 9 months ago

We need functionality to use pipelines directly in the HuggingFace model. Many data scientists are more comfortable declaring pipelines from Hugging Face directly than passing arguments through model_kwargs and pipeline_kwargs, which they find confusing:

How we are currently building pipelines:

llm = HuggingFaceModel.from_kwargs(
    model=model_name_or_path,
    task="text-generation",
    model_kwargs={"device_map": "cuda", "quantization_config": quantization_config,
                  "trust_remote_code": False, "low_cpu_mem_usage": True},
    pipeline_kwargs={"max_new_tokens": 512, "do_sample": True, "temperature": 0.7,
                     "top_p": 0.95, "top_k": 40, "repetition_penalty": 1.1},
)

How some data scientists expect to use Hugging Face:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=False,
    revision="gptq-8bit-32g-actorder_True",
)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1,
)
llm = HuggingFaceModel.from_kwargs(pipeline=pipe)

While the latter needs more lines of code, it gives the data scientist much more control and customisability when declaring the model. We can add one more kwarg, pipeline, in which the user can specify the pipeline directly.
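
A rough sketch of how this could look, assuming a simplified HuggingFaceModel class (the constructor, attribute names, and fallback logic here are illustrative assumptions, not the actual genai-stack internals): if a prebuilt pipeline is supplied, wrap it as-is; otherwise keep today's behaviour of building one from model_kwargs and pipeline_kwargs.

from typing import Any, Optional

from transformers import pipeline as hf_pipeline


class HuggingFaceModel:
    # Sketch only: the real constructor and attributes in genai-stack may differ.
    def __init__(self, pipeline: Any):
        self.pipeline = pipeline  # the transformers pipeline used for generation

    @classmethod
    def from_kwargs(
        cls,
        model: Optional[str] = None,
        task: str = "text-generation",
        model_kwargs: Optional[dict] = None,
        pipeline_kwargs: Optional[dict] = None,
        pipeline: Any = None,  # proposed kwarg: a prebuilt transformers pipeline
    ):
        # If the user passes a ready-made pipeline, use it directly and skip loading.
        if pipeline is not None:
            return cls(pipeline=pipeline)
        # Otherwise fall back to the current behaviour and build the pipeline
        # from the model name plus the two kwargs dicts.
        built = hf_pipeline(
            task,
            model=model,
            model_kwargs=model_kwargs or {},
            **(pipeline_kwargs or {}),
        )
        return cls(pipeline=built)

With this in place, both styles shown above would work: the existing kwargs-based call is unchanged, and HuggingFaceModel.from_kwargs(pipeline=pipe) simply wraps the user's own pipeline.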