autodeployai / pypmml

Python PMML scoring library
Apache License 2.0

Transformation with pyPMML? #38

Open mzlkwsk opened 2 years ago

mzlkwsk commented 2 years ago

Has anyone been able to use a transformation PMML (normalization to be specific) in pyPMML?

I can get the model to load into python just fine and I can print out the input fields, but if I try to "predict" any values or even print out the output fields I get the following error:

"py4j.protocol.Py4JJavaError: An error occurred while calling o0.outputFields. : java.lang.IllegalArgumentException: requirement failed: For the transformedValue result feature, OutputField must contain an EXPRESSION ..."

It seems like python can load the model but then cannot tell what the output is supposed to be.

scorebot commented 2 years ago

@mzlkwsk Could you send me your PMML model? Then I can check what is wrong.

mzlkwsk commented 2 years ago

@scorebot. pmml transformation model.zip

scorebot commented 2 years ago

@mzlkwsk Thanks for your model. It seems to contain only the transformations, which look fine. Based on the error message, the OutputField with feature="transformedValue" could be the problem. Would you mind sending the complete model so I can investigate? Thanks.

mzlkwsk commented 2 years ago

@scorebot The transformation "model" is a small part of a Knime workflow that I needed to import into a separate python program. All it is supposed to do is normalize the input variables and I was hoping to export it as a PMML file and load it directly into a python program.

The problem is likely due to the fact that it's a normalizer built in Knime, exported as PMML, and then imported into Python. I just resorted to using DOM to parse the XML file and get the values necessary to normalize the input.
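For readers in the same situation, the manual workaround described above could look roughly like the following. This is a minimal sketch, not the code used in this thread: it assumes the normalizer is exported as PMML `NormContinuous` elements with exactly two `LinearNorm` points each. The embedded PMML fragment, the field name `x`, and the helper names are made up for illustration. It uses `xml.etree.ElementTree` rather than a DOM parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment for illustration only.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <TransformationDictionary>
    <DerivedField name="x*" optype="continuous" dataType="double">
      <NormContinuous field="x">
        <LinearNorm orig="0.0" norm="0.0"/>
        <LinearNorm orig="10.0" norm="1.0"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
</PMML>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def read_linear_norms(pmml_text):
    """Map each input field name to its sorted (orig, norm) points."""
    root = ET.fromstring(pmml_text)
    norms = {}
    for elem in root.iter():
        if local(elem.tag) == "NormContinuous":
            points = [
                (float(p.get("orig")), float(p.get("norm")))
                for p in elem
                if local(p.tag) == "LinearNorm"
            ]
            norms[elem.get("field")] = sorted(points)
    return norms


def normalize(value, points):
    """Apply the straight line through the first and last (orig, norm) points.

    Assumes two distinct points; piecewise definitions are not handled.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
```

With the sample fragment, `read_linear_norms(SAMPLE_PMML)` yields one entry for `x`, and `normalize(5.0, ...)` maps the midpoint of the original range to the midpoint of the normalized range.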

scorebot commented 2 years ago

@mzlkwsk Sorry, please ignore my comments above. The transformation PMML is a valid model; PyPMML should support it and return the correct transformed values when the predict function is invoked. The exception above is caused by a defect in the internal PMML4S library, which I will fix as soon as possible. Once it's fixed, you will get the normalized values without having to compute them manually.

scorebot commented 2 years ago

@mzlkwsk I have fixed the issue. Please reinstall the latest version from GitHub with the following command:

pip install --upgrade git+https://github.com/autodeployai/pypmml.git

Please let me know if you still have a problem.

mzlkwsk commented 2 years ago

@scorebot The loaded model can now return the output names and fields, and it runs the transformation without error, but the predict method always returns an empty list. This happens regardless of the input: I've tried a real, an incomplete, and an empty input list, and predict returns an empty list every time.

scorebot commented 2 years ago

@mzlkwsk I get correct prediction results, for example:

>>> from pypmml import Model
>>> model = Model.load("gn_t3s_med-MP_full_30s_ta__WS1.xml")
>>> model.inputNames
['tick_num', 'mp_OC_diff', 'mp_med', 'bp_range', 'ap_range', 'bs_range', 'as_range', 'ps_range', 'ss_range', 'BidPrice_Med', 'AskPrice_Med', 'BidSize_Med', 'AskSize_Med', 'price_spread_Med', 'size_spread_Med', 'bp_MAD', 'ap_MAD', 'bs_MAD', 'as_MAD', 'ps_MAD', 'ss_MAD', 'date&time_OC_diff']
>>> model.outputNames
['tick_num*', 'mp_OC_diff*', 'mp_med*', 'bp_range*', 'ap_range*', 'bs_range*', 'as_range*', 'ps_range*', 'ss_range*', 'BidPrice_Med*', 'AskPrice_Med*', 'BidSize_Med*', 'AskSize_Med*', 'price_spread_Med*', 'size_spread_Med*', 'bp_MAD*', 'ap_MAD*', 'bs_MAD*', 'as_MAD*', 'ps_MAD*', 'ss_MAD*', 'date&time_OC_diff*']

>>> model.predict({x: 1.0 for x in model.inputNames})
{'price_spread_Med*': 1.0684161167312247, 'as_range*': -1.2508908484769738, 'date&time_OC_diff*': -2.951057075826117, 'ps_MAD*': 14.25976377866496, 'mp_OC_diff*': 0.981602630869278, 'AskSize_Med*': nan, 'bp_range*': nan, 'bs_range*': -1.20421556926675, 'mp_med*': nan, 'ap_range*': nan, 'ss_range*': -1.4258999103925236, 'bp_MAD*': nan, 'tick_num*': -0.8842088214156572, 'size_spread_Med*': 0.006426391746489668, 'ss_MAD*': -1.2694186614209078, 'ps_range*': 1.709014785792543, 'ap_MAD*': nan, 'BidPrice_Med*': nan, 'bs_MAD*': -0.5971477707650586, 'BidSize_Med*': nan, 'AskPrice_Med*': nan, 'as_MAD*': -0.6436300846519409}

>>> model.predict([1.0 for x in model.inputNames])
[-0.8842088214156572, 0.981602630869278, nan, nan, nan, -1.20421556926675, -1.2508908484769738, 1.709014785792543, -1.4258999103925236, nan, nan, nan, nan, 1.0684161167312247, 0.006426391746489668, nan, nan, -0.5971477707650586, -0.6436300846519409, 14.25976377866496, -1.2694186614209078, -2.951057075826117]

The first predict call receives a map of all inputs, so the result is a map too. The second uses a list, so the result is a list too.
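When the list form is used, it can help to pair the values back up with their names. The sketch below assumes the list result follows the order of `model.outputNames`, which is consistent with the sample output above but is not stated explicitly in this thread. The helper names `label_outputs` and `nan_outputs` are made up for illustration, and the sample values are the first three entries from the output above.

```python
import math


def label_outputs(output_names, values):
    """Pair a list-style result with its output names, position by position."""
    if len(output_names) != len(values):
        raise ValueError(
            f"expected {len(output_names)} values, got {len(values)}"
        )
    return dict(zip(output_names, values))


def nan_outputs(labelled):
    """Return the names whose value is NaN."""
    return [
        name
        for name, value in labelled.items()
        if isinstance(value, float) and math.isnan(value)
    ]


# First three names and values copied from the session above.
names = ["tick_num*", "mp_OC_diff*", "mp_med*"]
values = [-0.8842088214156572, 0.981602630869278, math.nan]
labelled = label_outputs(names, values)
```

The length check also makes the empty-list symptom visible immediately: an empty result raises a `ValueError` instead of silently producing an empty map.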

mzlkwsk commented 2 years ago

@scorebot Something must be wrong on my end, because I'm running the same code as you and it's still returning an empty list.

from pypmml import Model
model = Model.load('/Users/***/Documents/knime data/test/gn_t3s_med-MP_full_30s(ta)_WS1')
test_list = [1.0 for x in model.inputNames]
norm_list = model.predict(test_list)
print(test_list)
print(norm_list)

returns:

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
[]

Is there something really wrong with my syntax? I'm pretty new to Python.

scorebot commented 2 years ago

You probably need to clean your environment. First, check where pypmml is installed, for example:

pip show pypmml
Name: pypmml
Version: 0.9.13
Summary: Python PMML scoring library
Home-page: https://github.com/autodeployai/pypmml
Author: AutoDeployAI
Author-email: autodeploy.ai@gmail.com
License: Apache License 2.0
Location: /Users/scorebot/anaconda3/lib/python3.7/site-packages
Requires: py4j
Required-by: daas-client

Uninstall pypmml with pip uninstall pypmml, then check the location above (for example /Users/scorebot/anaconda3/lib/python3.7/site-packages/pypmml). If anything was left behind, remove the pypmml directory. Finally, install the latest pypmml with the command above.
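The same check can be done from inside the Python interpreter that actually runs the script, which is useful when several environments are installed. This is a minimal sketch using only the standard library (Python 3.8+). The helper name `describe_install` is made up for illustration.

```python
import importlib.util
from importlib import metadata


def describe_install(package):
    """Report the installed version and import location of a package.

    'version' is None when no installed distribution metadata is found;
    'location' is None when the module cannot be found on the import path.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec is not None else None
    return {"version": version, "location": location}
```

Calling `describe_install("pypmml")` after uninstalling should return `None` for both entries. A location without a version would suggest a leftover directory of the kind described above.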

mzlkwsk commented 2 years ago

Still no joy. I uninstalled pypmml via pip, and the folder was removed from my environment's site-packages folder. I reinstalled using the GitHub upgrade command, and it's still returning an empty list.

Is it possible something else is causing the issue? I'm running this in a miniforge environment with Python 3.9 on an M1 MacBook Pro. Could any of those be an issue?

scorebot commented 2 years ago

@mzlkwsk Sorry, I still cannot reproduce your issue. It may be related to your environment, but I'm not sure.

SJTULLY commented 2 years ago

Me too, the predict method always returns an empty list.

SJTULLY commented 2 years ago

@scorebot Something must be wrong with my results, because I'm running the same code as you and it still returns an empty list.

from pypmml import Model
model = Model.load('/Users/***/Documents/knime data/test/gn_t3s_med-MP_full_30s(ta)_WS1')
test_list = [1.0 for x in model.inputNames]
norm_list = model.predict(test_list)
print(test_list)
print(norm_list)

returns:

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
[]

Is there something really wrong with my syntax? I'm pretty new to Python.

Me too, the same result.

scorebot commented 2 years ago

@mzlkwsk @SJTULLY Could you check whether your Java version is 16 or later? If so, please downgrade to a version below 16; we usually test with Java 8 and 11. PyPMML depends on py4j to call PMML4S, which is written in Scala and runs on the JVM, but py4j has problems supporting Java 16+. See the related issue: https://github.com/py4j/py4j/issues/485
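One way to check the Java version from Python is sketched below. It assumes the `java` found on `PATH` is the same one py4j launches, which may not hold if `JAVA_HOME` points elsewhere. The helper names are made up for illustration. The parser handles both the legacy `1.8.0_281` scheme and the newer `11.0.2` or `16` scheme.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_281" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` for the java on PATH and return its major version."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes to stderr, so read both streams.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)
```

If `installed_java_major()` returns 16 or higher, the downgrade suggested above is worth trying before looking for other causes.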