Kinda self-explanatory from the title: right now each Python version for a given target builds llama.cpp independently. This artificially limits how many platforms we can support by blowing up CI build times.
Since we aren't actually linking against the Python API at all, each Python version on any given platform is essentially building the same llama.cpp shared library. If we can cache or re-use a single pre-built library, this should speed up CI build times significantly.
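To illustrate the idea, here is a minimal Python sketch of the caching logic, assuming the cache is keyed on the llama.cpp source commit plus the target platform (so it is deliberately independent of the Python version). The function names, paths, and `libllama.so` filename are hypothetical, not the actual build-system implementation:

```python
import hashlib
from pathlib import Path
from typing import Callable

def cache_key(source_commit: str, platform: str) -> str:
    """Cache key that ignores the Python version: the same llama.cpp
    commit on the same platform always maps to the same prebuilt library."""
    return hashlib.sha256(f"{source_commit}-{platform}".encode()).hexdigest()[:16]

def get_or_build(cache_dir: Path, key: str, build: Callable[[Path], None]) -> Path:
    """Return the cached shared library for `key`, running the expensive
    compile step only on a cache miss (at most once per key)."""
    lib = cache_dir / key / "libllama.so"  # hypothetical cache layout
    if not lib.exists():
        lib.parent.mkdir(parents=True, exist_ok=True)
        build(lib)  # e.g. invoke cmake and copy the resulting shared library
    return lib
```

Every per-Python-version wheel build would then call `get_or_build` with the same key and copy the prebuilt library into the wheel, instead of compiling llama.cpp again.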