aiplanethub / genai-stack

An End to End GenAI Framework
https://genaistack.aiplanet.com/
Apache License 2.0

Enable Pipeline in Hugging Face models #90

Closed sam-aiplanet closed 1 year ago

sam-aiplanet commented 1 year ago

We need functionality to use pipelines directly in the HuggingFace model. Many data scientists are comfortable declaring pipelines from Hugging Face directly, instead of passing everything through model_kwargs and pipeline_kwargs, which they find confusing:

How we are currently building pipelines:

llm = HuggingFaceModel.from_kwargs(
    model=model_name_or_path,
    model_kwargs={"device_map": "cuda", "quantization_config": quantization_config,
                  "trust_remote_code": False, "low_cpu_mem_usage": True},
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 512, "do_sample": True, "temperature": 0.7,
                     "top_p": 0.95, "top_k": 40, "repetition_penalty": 1.1},
)
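(The quantization_config object above is assumed to be defined elsewhere in the user's script. A minimal sketch of what it might look like, assuming 4-bit bitsandbytes quantization — this definition is not part of the original issue:)

import torch
from transformers import BitsAndBytesConfig

# Assumed example only; any transformers quantization config object would work here
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)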

How some data scientists expect to use Hugging Face:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=False,
    revision="gptq-8bit-32g-actorder_True",
)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1,
)

llm = HuggingFaceModel.from_kwargs(pipeline=pipe)

While the latter needs more lines of code, it gives the data scientist much more control and customisability in declaring the model. We can add one more kwarg, pipeline, in which the user can specify the pipeline directly, as sketched below.
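A minimal sketch of how the new kwarg could be handled inside HuggingFaceModel follows. This is an assumption-heavy illustration: the constructor signature, the _get_pipeline method, and the fallback call to transformers.pipeline are hypothetical and may not match the actual genai-stack internals.

from typing import Any, Optional
from transformers import pipeline as hf_pipeline

class HuggingFaceModel:
    # Hypothetical sketch; attribute and method names are assumptions, not the real genai-stack API.
    def __init__(self, model: Optional[str] = None, task: str = "text-generation",
                 model_kwargs: Optional[dict] = None, pipeline_kwargs: Optional[dict] = None,
                 pipeline: Optional[Any] = None):
        self.model = model
        self.task = task
        self.model_kwargs = model_kwargs or {}
        self.pipeline_kwargs = pipeline_kwargs or {}
        self.pipeline = pipeline  # proposed new kwarg: a ready-made transformers pipeline

    @classmethod
    def from_kwargs(cls, **kwargs):
        return cls(**kwargs)

    def _get_pipeline(self):
        # If the user supplied a pipeline directly, use it as-is and skip model loading.
        if self.pipeline is not None:
            return self.pipeline
        # Otherwise fall back to the existing model_kwargs / pipeline_kwargs path.
        return hf_pipeline(self.task, model=self.model,
                           model_kwargs=self.model_kwargs, **self.pipeline_kwargs)

With a change along these lines, the call llm = HuggingFaceModel.from_kwargs(pipeline=pipe) from the example above would work unchanged, while the existing model_kwargs / pipeline_kwargs path stays intact.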