spookrunner opened this issue 1 year ago (status: Open)
Any updates on this?
@spookrunner @arabe91 would you elaborate please? Would you like to see the model served as API to provide pairwise predictions, or something else?
@robertwhiffin Yes, exactly — enabling the code to be registered as a formal model and then used for inference as such (either via batch or API), rather than embedding the code directly in a workflow somewhere.
This is a good idea. It will be added to the backlog. Thanks!
Are there any updates on this issue?
I was able to get the ARC training up and running in a day, but I am stuck on how to retrieve a trained model from MLflow and make predictions on new datasets with the same schema. The documentation and examples seem to focus on training the model.
Is there an example to follow for retrieving a previously logged model and making predictions with the existing code?
This is a WIP — the model can currently be retrieved from MLflow and deployed, but it's not pretty. The next version will make this nicer. Something like this should work:
```python
import mlflow
import arc
from splink.spark.linker import SparkLinker

logged_model = 'runs:/d540c25de5f342db80ff7e8ceb512bff/linker'

# Load the logged model as a PyFuncModel, then unwrap it to access
# the underlying ARC linker object and its Splink settings.
loaded_model = mlflow.pyfunc.load_model(logged_model)
arc_linker = loaded_model.unwrap_python_model()

# Rebuild a Splink linker over the new dataset (link_data) using the
# settings recovered from the logged model, then run predictions.
linker = SparkLinker(link_data, spark=spark)
linker.load_settings(arc_linker.settings)
predictions = linker.predict()
```
The current `predict` method doesn't work, so we need to extract the settings that define the underlying Splink model and build a Splink linker directly.
API support will be a longer-term issue.
@robertwhiffin Thanks! This is what I needed. I was getting stuck because the current `SplinkMLFlowWrapper.predict` throws an error, I think because it uses the deprecated `linker.initialise_settings` instead of `linker.load_settings`.
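As a stopgap until that's fixed, a small compatibility shim can paper over the rename. This is a hypothetical helper (`load_settings_compat` is not part of ARC or Splink); it simply calls whichever settings loader the installed Splink version exposes:

```python
def load_settings_compat(linker, settings):
    """Load a Splink settings dict onto a linker, preferring the
    current load_settings and falling back to the deprecated
    initialise_settings on older Splink versions."""
    loader = getattr(linker, "load_settings", None)
    if loader is None:
        # Older Splink releases only had the deprecated name.
        loader = linker.initialise_settings
    loader(settings)
```

With this, `load_settings_compat(linker, arc_linker.settings)` should work regardless of which Splink version is pinned.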
Looking forward to the next version!
Has there been any consideration of adding support for inference via a registered model on Databricks?