Closed zeryx closed 1 year ago
@aslisabanci I think both approaches can work, and the subprocess approach can be simpler. However, it runs into performance issues because it shuts down and restarts the JVM, and reloads the model, on each call. Improved performance is something most of our DR customers have asked for, which is why we utilise the py4j system. If there's an easier way to manage the apply function and simplify the IO, that would be awesome.
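To make the performance point concrete, here is a minimal sketch of the load-once pattern that the comment above is arguing for. It is pure Python and does not use py4j or a JVM: `load_model`, `get_model`, the `load_calls` list, and the default `model.jar` path are hypothetical names introduced only for illustration, and `load_model` stands in for whatever expensive start-up the real template performs.

```python
import functools

# Records every (simulated) expensive load so the effect of caching is visible.
load_calls = []


def load_model(path: str) -> dict:
    """Hypothetical stand-in for an expensive step (JVM start-up + model load)."""
    load_calls.append(path)
    return {"path": path}


@functools.lru_cache(maxsize=None)
def get_model(path: str) -> dict:
    """Load the model on first use and reuse the same object afterwards."""
    return load_model(path)


def apply(row: dict, model_path: str = "model.jar") -> dict:
    """Per-request entry point: only the first call pays the load cost."""
    model = get_model(model_path)
    # A real template would score `row` with the model here.
    return {"model": model["path"], "input": row}
```

Calling `apply` repeatedly with the same `model_path` triggers a single load, whereas a subprocess-per-call design would repeat the start-up cost on every request.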
I don't object to the perf aspect, but sometimes the user won't know how to use the "codegen + monitoring" jar through py4j. This jar is invoked with certain parameters, and since we don't know the internals of this Java package's implementation, we won't know how to write the py4j wrapper. Taking my example in this thread, I don't know how I could call this jar package with these parameters using py4j:
java -jar <local path to scoring code jar> csv \
--input=<local path to input CSV> \
--output=<local path to output CSV> \
--enable_mlops \
--dr_token=<your api token>
So for me, it wasn't a matter of preference, but a necessity to call this jar using subprocess.
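For reference, a minimal sketch of what that subprocess call could look like from Python. The `csv` mode and the `--input`, `--output`, `--enable_mlops`, and `--dr_token` flags are taken from the command above; the function names, the choice to pass `--enable_mlops` only when a token is supplied, and the error handling are assumptions made for illustration, not part of the template.

```python
import subprocess
from typing import List, Optional


def build_scoring_command(jar_path: str, input_csv: str, output_csv: str,
                          dr_token: Optional[str] = None) -> List[str]:
    """Build the argument list for the scoring jar's csv mode."""
    cmd = ["java", "-jar", jar_path, "csv",
           f"--input={input_csv}", f"--output={output_csv}"]
    # Assumption: monitoring is enabled only when an API token is provided.
    if dr_token is not None:
        cmd += ["--enable_mlops", f"--dr_token={dr_token}"]
    return cmd


def run_scoring(jar_path: str, input_csv: str, output_csv: str,
                dr_token: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the jar once; each call starts a fresh JVM."""
    cmd = build_scoring_command(jar_path, input_csv, output_csv, dr_token)
    # check=True raises CalledProcessError if the jar exits with a non-zero code.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting issues with paths and tokens. Every `run_scoring` call starts a new JVM, which is the performance cost raised earlier in the thread.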
We can keep the template like this if it fits the majority of our interested users' way of using the codegens, but I wanted us to be aware of the other use cases.
AML-10 - Datarobot Java CodeGen
We're creating a simple template that utilizes a publicly available model file that uses.
Checklist