The predict_batch_udf introduces standardized code for:
Translating Spark DataFrames into NumPy arrays, so the end-user DL inferencing code does not need to convert from a Pandas DataFrame.
Batching the incoming NumPy arrays for the DL frameworks.
Model loading on the executors, which avoids any model serialization issues, while leveraging the Spark spark.python.worker.reuse configuration to cache models in the Spark executors.
See https://spark.apache.org/docs/3.4.0/api/python/reference/api/pyspark.ml.functions.predict_batch_udf.html and https://developer.nvidia.com/blog/distributed-deep-learning-made-easy-with-spark-3-4/. The current implementation seems to be subject to issues below.