conceptofmind / toolformer


Optimization #19

Open conceptofmind opened 1 year ago

conceptofmind commented 1 year ago

I need to optimize every tool that uses a Hugging Face model, such as the NMT tool. Maybe use Kernl to replace the model graphs, or Torch JIT or FlashAttention. Inference speed is key for these.
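As a minimal sketch of the simpler path (fp16 plus torch.compile, one alternative to Kernl): the translation checkpoint below is only an example, not necessarily what the NMT tool actually loads.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Example translation checkpoint; the repo's NMT tool may use a different model.
model_name = "Helsinki-NLP/opus-mt-en-de"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to cut latency and memory
).to("cuda")
model.eval()

# torch.compile (PyTorch >= 2.0) fuses kernels without changing the model code.
model = torch.compile(model)

@torch.inference_mode()
def translate(text: str) -> str:
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(translate("Toolformer can call external APIs from text."))
```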

Investigate FasterTransformer and the Triton Inference Server as well.

conceptofmind commented 1 year ago

LoRA + DeepSpeed + FlashAttention + maybe 8-bit quantization
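A rough sketch of the LoRA + 8-bit part of that combination, using peft and bitsandbytes. The base checkpoint and LoRA hyperparameters are placeholders, and DeepSpeed / FlashAttention would be layered on through the training launcher and the attention implementation, which this snippet doesn't cover:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "EleutherAI/gpt-j-6b"  # placeholder; the actual fine-tune may differ

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,   # bitsandbytes int8 weights
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # module names depend on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```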

conceptofmind commented 1 year ago

Just gonna do GPTQ
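For reference, a hedged sketch of post-training GPTQ quantization with the AutoGPTQ library; the checkpoint name and the tiny calibration set are placeholders, not the actual setup used here:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "EleutherAI/gpt-j-6b"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# GPTQ needs a small calibration set; one sample is enough for a smoke test,
# but real use would draw examples from the fine-tuning data.
examples = [tokenizer("Toolformer can call external APIs from text.", return_tensors="pt")]
model.quantize(examples)

model.save_quantized("gpt-j-6b-4bit-gptq")
```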