Closed scorebot closed 5 years ago
Hi @scorebot, thanks for your feedback. Answers to your questions are -
For questions 1 and 2, I will go through them and get back to you. Thanks!
Hi @scorebot,
1) The reason for the misprediction is the operator in the SimplePredicate. Instead of `lessThan` and `greaterOrEqual`, it should be `lessOrEqual` and `greaterThan`.
2) Yes, `wordSeparatorCharacterRE="(?u)\b\w\w+\b"` is not correct. The default value should be `\s`, as you mentioned.
Thanks for pointing out these issues. These will be resolved in the next release of Nyoka.
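To illustrate why the operator pair matters, here is a minimal sketch in Python. It assumes the source model sends a value to the left child when `value <= threshold`, as scikit-learn and LightGBM do for numeric splits. The helper `choose_child` and the threshold are hypothetical and are not part of Nyoka or PyPMML; the two pairs only disagree when the value sits exactly on the threshold.

```python
import operator

# PMML SimplePredicate operator names mapped to Python comparisons.
OPERATORS = {
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
}


def choose_child(value, threshold, left_op, right_op):
    """Return the first child ("left" or "right") whose predicate holds."""
    if OPERATORS[left_op](value, threshold):
        return "left"
    if OPERATORS[right_op](value, threshold):
        return "right"
    return None


# Hypothetical split: the value lies exactly on the threshold.
threshold = 0.5
value = 0.5

# Operator pair found in the exported PMML.
exported = choose_child(value, threshold, "lessThan", "greaterOrEqual")
# Operator pair that matches a "value <= threshold goes left" split.
corrected = choose_child(value, threshold, "lessOrEqual", "greaterThan")

print(exported, corrected)
```

With the exported pair the boundary value falls into the right child, while the corrected pair keeps it in the left child, so the two trees can return different leaf scores for the same record.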
@nyoka-pmml @Nirmal-Neel Thanks for your answers. I'm glad these problems can be fixed in the next release.
I can see there is already a draft page about AnomalyDetectionModel on the SourceForge PMML Project, but I can't find any page about DeepNetwork. Is there any documentation for this PMML standard candidate?
@scorebot, yes, you will not find any draft page for DeepNetwork since it is not part of the latest schema. If you need it, I can provide some information about what is currently used internally.
@nyoka-pmml I would appreciate it if you could provide some info about DeepNetwork. I plan to implement both new models in my open source PMML scoring libraries.
For the DeepNetwork schema, please refer to `pmml44.xsd`.
DeepNetwork is the root element for a DeepNet model. Its child element is NetworkLayer, which contains the information for each layer of the model.
The `layerId` attribute of NetworkLayer is unique for each layer, and `connectionLayerId` (the `layerId` of some other layer) creates a link to the connected layer. Each NetworkLayer element has three child elements: LayerParameters, LayerWeights and LayerBias. LayerParameters has attributes describing each layer (for the list of attributes, refer to the schema). LayerWeights and LayerBias hold the layer's weights and biases, represented as a base64 string in the format `data:float32;base64,tbQ0P1JAQj+f4hI/Yt7OPmwCpD60eR8/MfycPpEy8D4=`. The values should be encoded in **LITTLE_ENDIAN** byte order, and the base64 string is prefixed with `data:(float32|float64);base64,`.
(In LayerParameters, `inputDimension` and `outputDimension` do not include the batch size.)
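As a rough illustration of the weight format described above, the following Python sketch decodes such a string using only the standard library. The function name `decode_layer_data` is hypothetical and is not part of Nyoka; the sketch assumes the payload is a flat sequence of little-endian floats, and reshaping to the layer's dimensions is left out.

```python
import base64
import struct

# struct format characters for the two element types named in the prefix.
FORMATS = {"float32": "f", "float64": "d"}


def decode_layer_data(encoded):
    """Decode 'data:(float32|float64);base64,<payload>' into a list of floats."""
    header, payload = encoded.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("unexpected prefix: %r" % header)
    dtype = header[len("data:"):-len(";base64")]
    fmt = FORMATS[dtype]
    raw = base64.b64decode(payload)
    count = len(raw) // struct.calcsize("<" + fmt)
    # "<" selects little-endian byte order, as the format requires.
    return list(struct.unpack("<%d%s" % (count, fmt), raw))


# The sample string from the comment above.
sample = "data:float32;base64,tbQ0P1JAQj+f4hI/Yt7OPmwCpD60eR8/MfycPpEy8D4="
values = decode_layer_data(sample)
print(len(values), values)
```

The sample payload is 32 bytes long, so it decodes to eight float32 values.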
These are resolved and released in Nyoka 3.3.0
Hi guys, thanks for the awesome project. I have some questions about the models produced by the example notebooks. I used PyPMML to test them.

1. `lgbmr_pmml_preprocess.pmml`, exported by `nyoka/examples/lgbm/3_lgbm_With_PreProcess .ipynb`. Open the notebook and add two cells at the end:
   - Make a prediction against `x_test` using the built pipeline model. The predicted value for the first record is 23.51497829.
   - Load the model with PyPMML, then make the prediction again. The first predicted value is 22.055896.

   The two values differ, although they are expected to be identical.

   I then tried to debug the case and found two potential problems in the exported PMML. I need to confirm with you whether they are real problems. Please correct me if I have something wrong.
   - The attribute `wordSeparatorCharacterRE` of TextIndex is set to `(?u)\b\w\w+\b` everywhere. As described by DMG, "the wordSeparatorCharacterRE attribute can be used to pass a regular expression containing possible word separator characters". When `(?u)\b\w\w+\b` is applied to the related derived fields, all values evaluate to 0.0. For example, the first test record is `{"car name": "ford pinto", "displacement": 122.0}`. To evaluate the derived field `count_vec@[car name](ford)`, the constant term "ford" is split into "", and the input value "ford pinto" is split into two empty strings. Please check whether the value `(?u)\b\w\w+\b` is suitable here.
   - I then modified the PMML, changing every `(?u)\b\w\w+\b` to the default value `\s`. Now I think the values of the derived fields are fine, but the final result is still the original value 22.055896. I checked the ensemble trees. Taking the field `count_vec@[car name](ford)` as an example again, all tree nodes use it in a pattern where the first node is never used and the second node is always hit, so the field `car name` is effectively useless. If I change the car name to any string, the evaluated value is still 22.055896. Could you check whether this is intended?
2. `dtr_pmml.pmml`, exported by `nyoka/examples/skl/5_Decision_Tree_With_Tf-Idf.ipynb`. It has the same issue: the attribute `wordSeparatorCharacterRE` takes `(?u)\b\w\w+\b`.
3. `rf_pmml.pmml`, exported by `nyoka/examples/skl/3_RF_With_pre-processing.ipynb`. There is an output field `predicted_Species`. Its data type is string, but I think it should be integer to match its integer target Species.
4. `OneClassSVM_model.pmml`, exported by `nyoka/examples/skl/OneClassSVM_model.pmml`. It is an AnomalyDetectionModel with version 4.4. Will it be a standard model in PMML 4.4?
5. Both Keras models, `2classMBNet.pmml` and `sequentialModel.pmml`, use the new model type DeepNetwork with version 4.4. Will it be part of PMML 4.4?
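For the `wordSeparatorCharacterRE` question in item 1, the effect can be reproduced with a small Python sketch. It uses Python's `re` module as a stand-in for the regex engine of a PMML scorer, which is an assumption: engines may differ in details such as how they trim the resulting pieces. The point it shows is that `(?u)\b\w\w+\b` matches the words themselves, so using it as a separator consumes them, while `\s` leaves the words intact.

```python
import re

TOKEN_PATTERN = r"(?u)\b\w\w+\b"  # value found in the exported PMML
SEPARATOR = r"\s"                 # PMML default for wordSeparatorCharacterRE

text = "ford pinto"

# Used as a separator, the token pattern removes the words and keeps the gaps.
as_separator = re.split(TOKEN_PATTERN, text)

# Used as a token matcher, the same pattern finds the words.
as_tokens = re.findall(TOKEN_PATTERN, text)

# Splitting on whitespace keeps the words, so the term "ford" can be counted.
by_whitespace = re.split(SEPARATOR, text)
count_of_ford = by_whitespace.count("ford")

print(as_separator, as_tokens, by_whitespace, count_of_ford)
```

Because no piece produced by the token pattern contains "ford", a term count built on that split is zero, which is consistent with the derived fields evaluating to 0.0.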