Related: LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition - it seems you can compose small low-rank fine-tunes.
Feature Description
This paper came out a few days ago: LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning (code on GitHub).
The idea: quantize each matrix tile to its best precision, use a few high-precision low-rank vectors to approximate the matrix for fine-tuning, plus further data-aware optimizations.
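For intuition, here is a minimal NumPy sketch of the core decomposition, W ≈ Q + L1 L2: alternate between quantizing the residual and refitting a low-rank correction via truncated SVD. The round-to-nearest quantizer here is a toy stand-in for the paper's actual quantization scheme.

```python
import numpy as np

def quantize_rtn(w, bits=3):
    """Toy round-to-nearest quantizer with one per-matrix scale
    (a stand-in for the paper's block quantization)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def lq_decompose(w, rank=8, bits=3, iters=10):
    """Alternate: Q = quantize(W - L1 L2), then refit L1 L2 to W - Q
    by truncated SVD. Q stays frozen; L1, L2 are the trainable part."""
    l1 = np.zeros((w.shape[0], rank), dtype=w.dtype)
    l2 = np.zeros((rank, w.shape[1]), dtype=w.dtype)
    for _ in range(iters):
        q = quantize_rtn(w - l1 @ l2, bits)
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        l1 = u[:, :rank] * s[:rank]
        l2 = vt[:rank, :]
    return q, l1, l2

w = np.random.randn(256, 256).astype(np.float32)
q, l1, l2 = lq_decompose(w)
print("relative error:", np.linalg.norm(w - (q + l1 @ l2)) / np.linalg.norm(w))
```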
Motivation
Push the boundary of what can be hosted in the browser and on consumer hardware. Data-aware tuning. A Linux sandbox as an oracle.
Possible Implementation
- [ ] Test rig around the original Python code.
- [ ] Data structure to store quantized tiles and low-rank vector approximations (see the sketch after this list). @clattner mentioned something about "bubbles"?
- [ ] QEMU tests: JSLinux sandbox-in-the-browser examples.
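One hypothetical shape for that tile store (all names and layout invented here, just to anchor discussion): each tile carries its own precision and scale, and the matrix pairs the tile grid with the high-precision low-rank factors.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class QuantizedTile:
    """One tile of a weight matrix, stored at its own precision."""
    row: int           # tile coordinates in the tiling grid
    col: int
    bits: int          # precision chosen for this tile
    scale: float       # dequantization scale
    codes: np.ndarray  # integer codes, shape (tile_size, tile_size)

@dataclass
class LQMatrix:
    """Quantized tiles plus a high-precision low-rank correction."""
    shape: tuple[int, int]
    tile_size: int
    tiles: list[QuantizedTile]
    l1: np.ndarray     # (m, rank) trainable factor
    l2: np.ndarray     # (rank, n) trainable factor

    def dequantize(self) -> np.ndarray:
        w = np.zeros(self.shape, dtype=np.float32)
        t = self.tile_size
        for tile in self.tiles:
            r, c = tile.row * t, tile.col * t
            w[r:r + t, c:c + t] = tile.codes.astype(np.float32) * tile.scale
        return w + self.l1 @ self.l2
```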
Nice to have:
- [ ] Implement the @geohot bounty to compare llama.cpp with tinygrad and diff roundoff errors, so we aren't flying blind (see the diff sketch after this list).
- [ ] Data-aware tuning: Fisher information as in the paper, tuning float intrinsics on value-bounded data, linting values that should be constant.
- [ ] WeightWatcher (Empirical Spectral Density) plots to compare models. Ideally an entire Jupyter notebook of charts to compare two models at a glance.
- [ ] Fuzzing, in the spirit of Berger's Coz, to elicit bottlenecks: perturb low bits of tiles, vary tile precision size, add delays in operations, increase buffer sizes, lower cgroup permissions.
- [ ] Tile-size autotune benchmark and general profile-guided optimization. Probably add a Mojo target in the Makefile for comparison.
- [ ] Scheduler auto-tuning for large multicore CPUs. Embedded Linux config files to lower jitter, as on IBM Blue Gene/L.
- [ ] Zstd dictionary to compact idle data structures; the compression ratio is also useful for performance linting (sketched below).
- [ ] Quadtree of high-precision tiles? Two-pass might be faster: do everything in low precision, then a second sparse pass with high precision (sketched below). Perhaps use UltraFastBERT (Exponentially Faster Language Modeling) to look at sparsity tradeoffs.
- [ ] Q* pass: given a "page" of corpus, ask probing questions about it and add that feedback to the training data.
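For the roundoff-diff bounty item, the core of the harness is just comparing logits dumped from the two backends for the same prompt; a minimal sketch (the float16 round-trip below simulates the second backend):

```python
import numpy as np

def roundoff_report(logits_a, logits_b):
    """Diff two logit vectors for the same prompt/token position."""
    diff = np.abs(logits_a.astype(np.float64) - logits_b.astype(np.float64))
    print(f"max abs diff : {diff.max():.3e}")
    print(f"mean abs diff: {diff.mean():.3e}")
    # ULP gap via bit reinterpretation (rough when signs differ)
    bits_a = logits_a.astype(np.float32).view(np.int32).astype(np.int64)
    bits_b = logits_b.astype(np.float32).view(np.int32).astype(np.int64)
    print(f"max ULP gap  : {np.abs(bits_a - bits_b).max()}")
    # what users actually notice: does the greedy token change?
    print("greedy token match:", logits_a.argmax() == logits_b.argmax())

a = np.random.randn(32000).astype(np.float32)   # e.g. llama.cpp logits
b = a.astype(np.float16).astype(np.float32)     # simulated second backend
roundoff_report(a, b)
```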
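The Zstd-dictionary item, using the python-zstandard API. The rolled copies of a base tile below are a self-contained stand-in for idle quantized tiles, which share a lot of structure across tiles:

```python
import numpy as np
import zstandard as zstd

# Stand-in for idle quantized tiles: rolled copies of one base tile,
# so the samples share structure the dictionary trainer can exploit.
base = np.random.randint(0, 8, 4096, dtype=np.uint8)
samples = [bytes(np.roll(base, int(np.random.randint(64)))) for _ in range(200)]

# Train a shared dictionary on the samples, then compress each tile with it.
dict_data = zstd.train_dictionary(8 * 1024, samples)
cctx = zstd.ZstdCompressor(level=9, dict_data=dict_data)
compressed = [cctx.compress(s) for s in samples]

ratio = sum(map(len, samples)) / sum(map(len, compressed))
print(f"compression ratio: {ratio:.2f}x")  # doubles as a performance-lint signal
```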
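And the two-pass idea from the quadtree item, sketched on a plain matmul: run everything through the low-precision weights, then a sparse second pass corrects only the tiles flagged as needing high precision (the hi_tiles map and tiling layout are hypothetical):

```python
import numpy as np

def two_pass_matmul(w_low, x, hi_tiles, tile=64):
    """Pass 1: dense low-precision GEMM. Pass 2: sparse correction
    replacing the contribution of the few high-precision tiles.
    hi_tiles maps (tile_row, tile_col) -> fp32 tile contents."""
    y = w_low @ x                                  # pass 1: cheap, everywhere
    for (r, c), w_hi in hi_tiles.items():          # pass 2: sparse, precise
        rs, cs = r * tile, c * tile
        delta = w_hi - w_low[rs:rs + tile, cs:cs + tile]
        y[rs:rs + tile] += delta @ x[cs:cs + tile]
    return y

w = np.random.randn(256, 256).astype(np.float32)
w_low = w.astype(np.float16).astype(np.float32)    # fake low-precision copy
hi = {(0, 0): w[0:64, 0:64], (3, 2): w[192:256, 128:192]}
x = np.random.randn(256, 8).astype(np.float32)
y = two_pass_matmul(w_low, x, hi)
```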