Closed zackyenchik closed 2 weeks ago
@zackyenchik Can you please share your model and the dataset containing those 5 rows with me? You can send them to scorebot#outlook.com, thanks.
@zackyenchik Thanks for your model and dataset. I can reproduce the issue. It is caused by the input dataset not fully matching the model. For example, in all of the boolean fields the dataset contains values like "True" or "False", but the model expects 1.0 or 0.0, because those fields are declared as double in the PMML model:
<DataField name="bot_reference" optype="continuous" dataType="double"/>
The scoring library can't convert those values, so all of them were treated as missing. That is why incorrect results were returned.
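A mismatch like this can be spotted before scoring. The following is a minimal sketch, not part of PMML4S: it assumes a trimmed-down, hypothetical data dictionary (only `bot_reference` comes from this thread; `page_views` and the sample rows are made up) and uses Python's `float()` as a stand-in for the library's own conversion, which may differ in edge cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down data dictionary; only bot_reference is from the thread.
PMML_SNIPPET = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="bot_reference" optype="continuous" dataType="double"/>
    <DataField name="page_views" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>
"""

def double_fields(pmml_text):
    """Return the names of all DataField elements declared as double."""
    root = ET.fromstring(pmml_text)
    names = []
    for elem in root.iter():
        # Strip the XML namespace so the check does not depend on the PMML version.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "DataField" and elem.get("dataType") == "double":
            names.append(elem.get("name"))
    return names

def unparseable_values(rows, fields):
    """Map each field to the set of raw values that float() rejects."""
    problems = {}
    for row in rows:
        for name in fields:
            raw = row.get(name)
            if raw is None or raw == "":
                continue  # genuinely missing, not a type mismatch
            try:
                float(raw)
            except ValueError:
                problems.setdefault(name, set()).add(raw)
    return problems

# Made-up sample rows mirroring the "True"/"False" values described above.
rows = [
    {"bot_reference": "True", "page_views": "12"},
    {"bot_reference": "False", "page_views": "3.5"},
]
report = unparseable_values(rows, double_fields(PMML_SNIPPET))
print(report)
```

Any field that shows up in the report holds values that a double-typed input would likely reject, so those are the columns to fix first.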
You will get the same results if you convert those "True"/"False" values to 1.0/0.0. We will also enhance the data conversion utility in PMML4S to handle this case automatically.
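Until that lands, the conversion can be done as a preprocessing step on the CSV. This is a rough sketch using only the Python standard library; the helper name `convert_booleans`, the `page_views` column, and the sample data are illustrative assumptions, and only the `bot_reference` field name is taken from the model above.

```python
import csv
import io

# Case-insensitive mapping from boolean strings to the doubles the model expects.
BOOL_TO_DOUBLE = {"true": "1.0", "false": "0.0"}

def convert_booleans(rows, columns):
    """Yield copies of rows with "True"/"False" replaced by "1.0"/"0.0" in the given columns."""
    for row in rows:
        fixed = dict(row)
        for name in columns:
            value = fixed.get(name)
            if value is not None:
                # Values that are not boolean strings pass through unchanged.
                fixed[name] = BOOL_TO_DOUBLE.get(value.strip().lower(), value)
        yield fixed

# Made-up input standing in for the real dataset.
raw_csv = "bot_reference,page_views\nTrue,12\nFalse,3.5\n"
reader = csv.DictReader(io.StringIO(raw_csv))
converted = list(convert_booleans(reader, ["bot_reference"]))

# Write the cleaned rows back out in the same column order.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(converted)
print(out.getvalue())
```

Listing the columns explicitly keeps the conversion from touching string fields that happen to contain the word "True" or "False".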
Ah that makes sense. Thank you for the quick follow up!
Hello! I have a PMML model trained with scikit-learn and extracted to PMML with sklearn2pmml. For some reason, the model is scoring differently between Python/jpmml and pmml4s:
Python                 PMML
0.48976458967678604    0.48976458967678604
0.7660308225499471     0.7660308225499471
0.38325820040056524    0.38325820040056524
0.38607212482501463    0.38607212482501463
0.49769546260665454    0.49769546260665454

pmml4s
0.26909560427731416
0.24756982049868986
0.24974076556763675
0.254523400551614
0.18821498178901241
Some code snippets that may be helpful:
Python library versions:
dill==0.3.8
joblib==1.4.2
jpmml_evaluator==0.10.2
JPype1==1.5.0
numpy==1.26.4
packaging==24.1
pandas==2.2.2
py4j==0.10.9.7
pyjnius==1.6.1
python-dateutil==2.9.0.post0
pytz==2024.1
scikit-learn==1.5.0
scipy==1.14.0
setuptools==70.1.1
six==1.16.0
sklearn-pandas==2.2.0
sklearn2pmml==0.109.0
threadpoolctl==3.5.0
tzdata==2024.1

Java classpath:
opencsv-3.10
pmml4s_3-1.0.1
scala3-library_3-3.4.0
scala-library-2.13.12
spray-json_3-1.3.6
Java program used to test pmml4s:
I'd be happy to share the PMML model as well but it doesn't look like I can attach it here. Let me know if there's anything else you need from me to sort this out! Thank you in advance!