Achieved around a 10x performance improvement with these optimizations:
Upgraded to the vLLM engine with the FlashAttention backend for batch inference (initial batch size = 100; larger batches may be faster but risk OOM and yield diminishing returns). See the batching sketch below.
Optimized the regex preprocessing so known entities are detected faster (see the precompiled-pattern sketch below).
Moved post-processing to the GPU instead of the CPU. I initially avoided loading two models onto the GPU, but all-MiniLM-L6-v2, which handles entity alignment, is small enough not to hurt performance noticeably, and running both on the GPU greatly improves speed (see the alignment sketch below).
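A minimal sketch of the batched vLLM setup, assuming a CUDA GPU; the model name is a hypothetical placeholder (substitute the one actually used), and the VLLM_ATTENTION_BACKEND override is an assumption about the environment (vLLM picks a supported backend on its own if it is unset):

```python
import os

# Assumption: FlashAttention is supported for this GPU/model; must be set
# before vllm is imported, and vLLM falls back to another backend otherwise.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM, SamplingParams

# Hypothetical model; substitute the model actually used in the pipeline.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=256)

BATCH_SIZE = 100  # higher may be faster but risks OOM and diminishing returns

def run_batched(prompts):
    outputs = []
    # Feed prompts in chunks; vLLM schedules each chunk with continuous batching.
    for i in range(0, len(prompts), BATCH_SIZE):
        batch = prompts[i:i + BATCH_SIZE]
        for out in llm.generate(batch, params):
            outputs.append(out.outputs[0].text)
    return outputs
```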
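For the regex step, the main win is compiling one combined alternation pattern up front instead of scanning the text once per entity. A sketch with a hypothetical entity list (in practice it would be loaded from the known-entity source):

```python
import re

# Hypothetical entity list; load the real known-entity set here.
KNOWN_ENTITIES = ["New York", "OpenAI", "vLLM"]

# Compile once at startup. Sorting longest-first makes overlapping names
# (e.g. "New York City" vs. "New York") match the longer candidate.
ENTITY_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(e) for e in sorted(KNOWN_ENTITIES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def find_known_entities(text: str) -> list[str]:
    # One pass over the text instead of len(KNOWN_ENTITIES) passes.
    return ENTITY_RE.findall(text)
```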
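And a sketch of running the all-MiniLM-L6-v2 step on the GPU via sentence-transformers, under the assumption that entity alignment reduces to nearest-neighbor matching over embeddings (the actual alignment logic may differ):

```python
from sentence_transformers import SentenceTransformer, util

# Load the alignment model directly onto the GPU next to the main model;
# all-MiniLM-L6-v2 is small (~22M parameters), so the extra VRAM cost is minor.
aligner = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")

def align_entities(extracted, candidates):
    # convert_to_tensor=True keeps embeddings on the GPU, avoiding CPU round-trips.
    a = aligner.encode(extracted, convert_to_tensor=True)
    b = aligner.encode(candidates, convert_to_tensor=True)
    sims = util.cos_sim(a, b)   # (len(extracted), len(candidates)) similarities
    best = sims.argmax(dim=1)   # best candidate per extracted entity
    return [(e, candidates[j]) for e, j in zip(extracted, best.tolist())]
```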
New ETA for completing all processing: May 10.