deepjavalibrary / djl-demo

Demo applications showcasing DJL
https://demo.djl.ai
Apache License 2.0

llama.cpp on spark #438

Closed lslslslslslslslslslsls closed 3 months ago

lslslslslslslslslslsls commented 3 months ago

Hi DJL developers, I'd like to combine the image classification on Spark demo and the chatbot demo to build llama.cpp on Spark, following the design and conventions of the DJL Spark extension.

In both the image classification on Spark demo and the DJL Spark extension code, the core part looks like df.mapPartitions(transformRowsFunc), with the model loaded via modelLoader.newPredictor() inside transformRowsFunc. It seems the model is loaded for one partition and then released, then loaded and released again for the next partition.
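For reference, here is a minimal sketch of the pattern I mean (illustrative only, not the actual extension code; the image-classification types and the modelUrl parameter are placeholders):

```scala
import ai.djl.modality.Classifications
import ai.djl.modality.cv.{Image, ImageFactory}
import ai.djl.repository.zoo.Criteria
import org.apache.spark.rdd.RDD

// Per-partition loading: the predictor is created inside mapPartitions,
// so every partition pays the full model-loading cost.
def classify(imageUrls: RDD[String], modelUrl: String): RDD[String] = {
  imageUrls.mapPartitions { urls =>
    val criteria = Criteria.builder()
      .setTypes(classOf[Image], classOf[Classifications])
      .optModelUrls(modelUrl)
      .build()
    val model = criteria.loadModel()   // ~0.3 s for ResNet-50, ~13 s for a 7B GGUF
    val predictor = model.newPredictor()
    // Materialize results before closing the model, since iterators are lazy.
    val results = urls.map { url =>
      val image = ImageFactory.getInstance().fromUrl(url)
      predictor.predict(image).best[Classifications.Classification]().getClassName
    }.toList
    predictor.close()
    model.close()
    results.iterator
  }
}
```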

I tested loading ResNet-50, which takes about 0.3 s and is perfectly acceptable relative to the time spent predicting the images in a partition. However, for a Llama model (e.g., llama-2-7b-chat.Q5_K_S.gguf, 4.65 GB), each load takes about 13 s, which seems too long and largely redundant.

Is there any way to reduce or avoid this repeated model loading on Spark?
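One idea I am wondering about (a common Spark idiom, not something I found in the DJL Spark extension) is to cache the loaded predictor in a JVM-level singleton so each executor pays the loading cost at most once, rather than once per partition. A rough sketch, with all names hypothetical:

```scala
import ai.djl.inference.Predictor
import ai.djl.modality.{Input, Output}
import ai.djl.repository.zoo.Criteria

// Hypothetical helper (not part of DJL): cache the predictor in a JVM-level
// singleton so each executor loads the model at most once instead of once
// per partition. The Input/Output types and translator setup are placeholders.
object CachedPredictor {
  private var predictor: Predictor[Input, Output] = _

  def get(modelUrl: String): Predictor[Input, Output] = synchronized {
    if (predictor == null) {
      val model = Criteria.builder()
        .setTypes(classOf[Input], classOf[Output])
        .optModelUrls(modelUrl)
        .build()
        .loadModel()
      // Never closed here; it lives for the executor's lifetime.
      predictor = model.newPredictor()
    }
    predictor
  }
}
```

Inside mapPartitions one would then call CachedPredictor.get(modelUrl) instead of creating a new predictor, and skip closing it, so later partitions scheduled on the same executor reuse the already-loaded model. I'm not sure whether this fits the extension's conventions, though.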

zachgk commented 3 months ago

Duplicate of https://github.com/deepjavalibrary/djl/issues/3079