deepjavalibrary / djl-demo

Demo applications showcasing DJL
https://demo.djl.ai
Apache License 2.0

llama.cpp on spark #438

Closed lslslslslslslslslslsls closed 3 months ago

lslslslslslslslslslsls commented 3 months ago

Hi DJL developers, I'd like to combine the image classification on Spark demo and the chatbot demo to build llama.cpp on Spark, following the design and conventions of the DJL Spark extension.

In both the image classification on Spark demo and the DJL Spark extension code, the core part looks like df.mapPartitions(transformRowsFunc), with the model loaded via modelLoader.newPredictor() inside transformRowsFunc. It seems the model is loaded for one partition and then released, then loaded and released again for the next partition.
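For reference, here is a minimal sketch of the pattern I mean (illustrative only, not the actual extension code; the image-classification types and the modelUrl parameter are placeholders):

```scala
import ai.djl.modality.Classifications
import ai.djl.modality.cv.{Image, ImageFactory}
import ai.djl.repository.zoo.Criteria
import org.apache.spark.rdd.RDD

// Per-partition loading: the predictor is created inside mapPartitions,
// so every partition pays the full model-loading cost.
def classify(imageUrls: RDD[String], modelUrl: String): RDD[String] = {
  imageUrls.mapPartitions { urls =>
    val criteria = Criteria.builder()
      .setTypes(classOf[Image], classOf[Classifications])
      .optModelUrls(modelUrl)
      .build()
    val model = criteria.loadModel()   // ~0.3 s for ResNet-50, ~13 s for a 7B GGUF
    val predictor = model.newPredictor()
    // Materialize results before closing the model, since iterators are lazy.
    val results = urls.map { url =>
      val image = ImageFactory.getInstance().fromUrl(url)
      predictor.predict(image).best[Classifications.Classification]().getClassName
    }.toList
    predictor.close()
    model.close()
    results.iterator
  }
}
```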

I tested loading ResNet-50, which takes about 0.3 s and is perfectly acceptable relative to the time spent predicting the images in a partition. However, for a Llama model (e.g., llama-2-7b-chat.Q5_K_S.gguf, 4.65 GB), each load takes about 13 s, which seems too long and largely redundant.

Is there any way to reduce or avoid this repeated model loading on Spark?
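One idea I am wondering about (a common Spark idiom, not something I found in the DJL Spark extension) is to cache the loaded predictor in a JVM-level singleton so each executor pays the loading cost at most once, rather than once per partition. A rough sketch, with all names hypothetical:

```scala
import ai.djl.inference.Predictor
import ai.djl.modality.{Input, Output}
import ai.djl.repository.zoo.Criteria

// Hypothetical helper (not part of DJL): cache the predictor in a JVM-level
// singleton so each executor loads the model at most once instead of once
// per partition. The Input/Output types and translator setup are placeholders.
object CachedPredictor {
  private var predictor: Predictor[Input, Output] = _

  def get(modelUrl: String): Predictor[Input, Output] = synchronized {
    if (predictor == null) {
      val model = Criteria.builder()
        .setTypes(classOf[Input], classOf[Output])
        .optModelUrls(modelUrl)
        .build()
        .loadModel()
      // Never closed here; it lives for the executor's lifetime.
      predictor = model.newPredictor()
    }
    predictor
  }
}
```

Inside mapPartitions one would then call CachedPredictor.get(modelUrl) instead of creating a new predictor, and skip closing it, so later partitions scheduled on the same executor reuse the already-loaded model. I'm not sure whether this fits the extension's conventions, though.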

zachgk commented 3 months ago

Duplicate of https://github.com/deepjavalibrary/djl/issues/3079