wjtdc closed this issue 3 years ago.
@wjtdc Thanks for your findings. What type of model is it, `RegressionModel` or `GeneralRegressionModel`? Would you mind sending your model to me for debugging?
Here is the model. It is a scikit-learn LogisticRegression, exported to PMML as a `RegressionModel`:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
<Header>
<Application name="JPMML-SkLearn" version="1.6.7"/>
<Timestamp>2021-04-27T18:21:13Z</Timestamp>
</Header>
<MiningBuildTask>
<Extension>PMMLPipeline(steps=[('mapping', DataFrameMapper(default=False, df_out=False, drop_cols=[],
features=[(['price'],
StandardScaler(copy=True, with_mean=True,
with_std=True)),
(['lotsize'],
StandardScaler(copy=True, with_mean=True,
with_std=True)),
(['bedrooms'],
StandardScaler(copy=True, with_mean=True,
with_std=True)),
(['bathrms'],
StandardScaler(copy=True, with_mean=True,
with_std=True)),
(['stories'],
StandardScaler(copy=True, with_mean=True,
with_std=True)),
(['garagepl'],
StandardScaler(copy=True, with_mean=True,
with_std=True)),
(['driveway'], LabelEncoder()),
(['recroom'], LabelEncoder()),
(['fullbase'], LabelEncoder()),
(['gashw'], LabelEncoder()),
(['airco'], LabelEncoder()),
(['prefarea'], LabelEncoder())],
input_df=False, sparse=False)),
('clf', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=1000,
multi_class='ovr', n_jobs=None, penalty='l2', random_state=0,
solver='liblinear', tol=0.0001, verbose=0, warm_start=False))])</Extension>
</MiningBuildTask>
<DataDictionary>
<DataField name="homestyle" optype="categorical" dataType="string">
<Value value="classic"/>
<Value value="eclectic"/>
</DataField>
<DataField name="price" optype="continuous" dataType="double"/>
<DataField name="lotsize" optype="continuous" dataType="double"/>
<DataField name="bedrooms" optype="continuous" dataType="double"/>
<DataField name="bathrms" optype="continuous" dataType="double"/>
<DataField name="stories" optype="continuous" dataType="double"/>
<DataField name="garagepl" optype="continuous" dataType="double"/>
<DataField name="driveway" optype="categorical" dataType="string">
<Value value="no"/>
<Value value="yes"/>
</DataField>
<DataField name="recroom" optype="categorical" dataType="string">
<Value value="no"/>
<Value value="yes"/>
</DataField>
<DataField name="fullbase" optype="categorical" dataType="string">
<Value value="no"/>
<Value value="yes"/>
</DataField>
<DataField name="gashw" optype="categorical" dataType="string">
<Value value="no"/>
<Value value="yes"/>
</DataField>
<DataField name="airco" optype="categorical" dataType="string">
<Value value="no"/>
<Value value="yes"/>
</DataField>
<DataField name="prefarea" optype="categorical" dataType="string">
<Value value="no"/>
<Value value="yes"/>
</DataField>
</DataDictionary>
<TransformationDictionary/>
<RegressionModel functionName="classification" normalizationMethod="logit">
<MiningSchema>
<MiningField name="homestyle" usageType="target"/>
<MiningField name="price"/>
<MiningField name="lotsize"/>
<MiningField name="bedrooms"/>
<MiningField name="bathrms"/>
<MiningField name="stories"/>
<MiningField name="garagepl"/>
<MiningField name="driveway"/>
<MiningField name="recroom"/>
<MiningField name="fullbase"/>
<MiningField name="gashw"/>
<MiningField name="airco"/>
<MiningField name="prefarea"/>
</MiningSchema>
<Output>
<OutputField name="probability(classic)" optype="continuous" dataType="double" feature="probability" value="classic"/>
<OutputField name="probability(eclectic)" optype="continuous" dataType="double" feature="probability" value="eclectic"/>
</Output>
<LocalTransformations>
<DerivedField name="standardScaler(price)" optype="continuous" dataType="double">
<Apply function="/">
<Apply function="-">
<FieldRef field="price"/>
<Constant dataType="double">61119.59862385321</Constant>
</Apply>
<Constant dataType="double">17482.07526012927</Constant>
</Apply>
</DerivedField>
<DerivedField name="standardScaler(lotsize)" optype="continuous" dataType="double">
<Apply function="/">
<Apply function="-">
<FieldRef field="lotsize"/>
<Constant dataType="double">4926.034403669725</Constant>
</Apply>
<Constant dataType="double">2066.0385888582164</Constant>
</Apply>
</DerivedField>
<DerivedField name="standardScaler(bedrooms)" optype="continuous" dataType="double">
<Apply function="/">
<Apply function="-">
<FieldRef field="bedrooms"/>
<Constant dataType="double">2.903669724770642</Constant>
</Apply>
<Constant dataType="double">0.719891251408192</Constant>
</Apply>
</DerivedField>
<DerivedField name="standardScaler(bathrms)" optype="continuous" dataType="double">
<Apply function="/">
<Apply function="-">
<FieldRef field="bathrms"/>
<Constant dataType="double">1.2178899082568808</Constant>
</Apply>
<Constant dataType="double">0.4397154488452761</Constant>
</Apply>
</DerivedField>
<DerivedField name="standardScaler(stories)" optype="continuous" dataType="double">
<Apply function="/">
<Apply function="-">
<FieldRef field="stories"/>
<Constant dataType="double">1.6926605504587156</Constant>
</Apply>
<Constant dataType="double">0.7614817065177371</Constant>
</Apply>
</DerivedField>
<DerivedField name="standardScaler(garagepl)" optype="continuous" dataType="double">
<Apply function="/">
<Apply function="-">
<FieldRef field="garagepl"/>
<Constant dataType="double">0.5986238532110092</Constant>
</Apply>
<Constant dataType="double">0.8277846724953176</Constant>
</Apply>
</DerivedField>
<DerivedField name="encoder(driveway)" optype="categorical" dataType="integer">
<MapValues outputColumn="data:output">
<FieldColumnPair field="driveway" column="data:input"/>
<InlineTable>
<row>
<data:input>no</data:input>
<data:output>0</data:output>
</row>
<row>
<data:input>yes</data:input>
<data:output>1</data:output>
</row>
</InlineTable>
</MapValues>
</DerivedField>
<DerivedField name="encoder(recroom)" optype="categorical" dataType="integer">
<MapValues outputColumn="data:output">
<FieldColumnPair field="recroom" column="data:input"/>
<InlineTable>
<row>
<data:input>no</data:input>
<data:output>0</data:output>
</row>
<row>
<data:input>yes</data:input>
<data:output>1</data:output>
</row>
</InlineTable>
</MapValues>
</DerivedField>
<DerivedField name="encoder(fullbase)" optype="categorical" dataType="integer">
<MapValues outputColumn="data:output">
<FieldColumnPair field="fullbase" column="data:input"/>
<InlineTable>
<row>
<data:input>no</data:input>
<data:output>0</data:output>
</row>
<row>
<data:input>yes</data:input>
<data:output>1</data:output>
</row>
</InlineTable>
</MapValues>
</DerivedField>
<DerivedField name="encoder(gashw)" optype="categorical" dataType="integer">
<MapValues outputColumn="data:output">
<FieldColumnPair field="gashw" column="data:input"/>
<InlineTable>
<row>
<data:input>no</data:input>
<data:output>0</data:output>
</row>
<row>
<data:input>yes</data:input>
<data:output>1</data:output>
</row>
</InlineTable>
</MapValues>
</DerivedField>
<DerivedField name="encoder(airco)" optype="categorical" dataType="integer">
<MapValues outputColumn="data:output">
<FieldColumnPair field="airco" column="data:input"/>
<InlineTable>
<row>
<data:input>no</data:input>
<data:output>0</data:output>
</row>
<row>
<data:input>yes</data:input>
<data:output>1</data:output>
</row>
</InlineTable>
</MapValues>
</DerivedField>
<DerivedField name="encoder(prefarea)" optype="categorical" dataType="integer">
<MapValues outputColumn="data:output">
<FieldColumnPair field="prefarea" column="data:input"/>
<InlineTable>
<row>
<data:input>no</data:input>
<data:output>0</data:output>
</row>
<row>
<data:input>yes</data:input>
<data:output>1</data:output>
</row>
</InlineTable>
</MapValues>
</DerivedField>
<DerivedField name="continuous(encoder(driveway))" optype="continuous" dataType="integer">
<FieldRef field="encoder(driveway)"/>
</DerivedField>
<DerivedField name="continuous(encoder(recroom))" optype="continuous" dataType="integer">
<FieldRef field="encoder(recroom)"/>
</DerivedField>
<DerivedField name="continuous(encoder(fullbase))" optype="continuous" dataType="integer">
<FieldRef field="encoder(fullbase)"/>
</DerivedField>
<DerivedField name="continuous(encoder(gashw))" optype="continuous" dataType="integer">
<FieldRef field="encoder(gashw)"/>
</DerivedField>
<DerivedField name="continuous(encoder(airco))" optype="continuous" dataType="integer">
<FieldRef field="encoder(airco)"/>
</DerivedField>
<DerivedField name="continuous(encoder(prefarea))" optype="continuous" dataType="integer">
<FieldRef field="encoder(prefarea)"/>
</DerivedField>
</LocalTransformations>
<RegressionTable intercept="2.4434261415131577" targetCategory="eclectic">
<NumericPredictor name="standardScaler(price)" coefficient="5.665174186447512"/>
<NumericPredictor name="standardScaler(lotsize)" coefficient="0.03268595140978323"/>
<NumericPredictor name="standardScaler(bedrooms)" coefficient="-0.14034552993860838"/>
<NumericPredictor name="standardScaler(bathrms)" coefficient="0.4693817075890175"/>
<NumericPredictor name="standardScaler(stories)" coefficient="-0.0751176717450174"/>
<NumericPredictor name="standardScaler(garagepl)" coefficient="0.01347471331749156"/>
<NumericPredictor name="continuous(encoder(driveway))" coefficient="0.7588759495896155"/>
<NumericPredictor name="continuous(encoder(recroom))" coefficient="-0.0781787031939792"/>
<NumericPredictor name="continuous(encoder(fullbase))" coefficient="0.6193262751903754"/>
<NumericPredictor name="continuous(encoder(gashw))" coefficient="0.8551064661895064"/>
<NumericPredictor name="continuous(encoder(airco))" coefficient="0.37834066142531536"/>
<NumericPredictor name="continuous(encoder(prefarea))" coefficient="-0.046869891091957175"/>
</RegressionTable>
<RegressionTable intercept="0.0" targetCategory="classic"/>
</RegressionModel>
</PMML>
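To make the model's arithmetic concrete, here is a hand-written Java sketch of how this `RegressionModel` (with `normalizationMethod="logit"`) scores one row: standardize the six numeric fields with the constants from the `standardScaler(...)` derived fields, map each yes/no field to 1/0 as the `MapValues` tables do, add the `eclectic` table's intercept and coefficients, and apply the logistic function; the `classic` table (intercept 0, no predictors) receives the remaining probability. `HomestyleScorer` and `score` are hypothetical names, this is not PMML4S code, and it assumes every field is present with a valid value (no missing-value handling).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical hand-evaluation of the RegressionModel above (not PMML4S code). */
public class HomestyleScorer {

    // StandardScaler fields, in the order of the DerivedFields in the model.
    static final String[] NUMERIC = {"price", "lotsize", "bedrooms", "bathrms", "stories", "garagepl"};
    static final double[] MEAN = {61119.59862385321, 4926.034403669725, 2.903669724770642,
            1.2178899082568808, 1.6926605504587156, 0.5986238532110092};
    static final double[] STD = {17482.07526012927, 2066.0385888582164, 0.719891251408192,
            0.4397154488452761, 0.7614817065177371, 0.8277846724953176};
    static final double[] NUMERIC_COEF = {5.665174186447512, 0.03268595140978323, -0.14034552993860838,
            0.4693817075890175, -0.0751176717450174, 0.01347471331749156};

    // LabelEncoder fields: MapValues maps "no" to 0 and "yes" to 1.
    static final String[] BINARY = {"driveway", "recroom", "fullbase", "gashw", "airco", "prefarea"};
    static final double[] BINARY_COEF = {0.7588759495896155, -0.0781787031939792, 0.6193262751903754,
            0.8551064661895064, 0.37834066142531536, -0.046869891091957175};

    // Intercept of the RegressionTable with targetCategory="eclectic".
    static final double INTERCEPT = 2.4434261415131577;

    /** Returns {probability(classic), probability(eclectic)}; assumes no missing or invalid values. */
    public static double[] score(Map<String, Object> row) {
        double y = INTERCEPT;
        for (int i = 0; i < NUMERIC.length; i++) {
            double x = ((Number) row.get(NUMERIC[i])).doubleValue();
            double z = (x - MEAN[i]) / STD[i]; // standardScaler(field)
            y += NUMERIC_COEF[i] * z;
        }
        for (int i = 0; i < BINARY.length; i++) {
            Object v = row.get(BINARY[i]);
            if ("yes".equals(v)) {
                y += BINARY_COEF[i]; // encoded value 1
            } else if (!"no".equals(v)) { // "no" is encoded as 0 and adds nothing
                throw new IllegalArgumentException(BINARY[i] + " must be \"no\" or \"yes\", got " + v);
            }
        }
        // Logit normalization for the eclectic table; classic (intercept 0) gets the remainder.
        double eclectic = 1.0 / (1.0 + Math.exp(-y));
        return new double[] {1.0 - eclectic, eclectic};
    }

    public static void main(String[] args) {
        // Same input row as the multithreaded test in this thread.
        Map<String, Object> row = new LinkedHashMap<>();
        for (String f : NUMERIC) row.put(f, 1);
        for (String f : BINARY) row.put(f, "yes");
        double[] p = score(row);
        System.out.println("probability(classic) = " + p[0] + ", probability(eclectic) = " + p[1]);
    }
}
```

For the all-ones, all-"yes" row used in the test in this thread, the linear score is roughly -14.74, so the sketch should print probabilities close to the non-NaN values in the sample output. That gives a single-threaded reference to compare multithreaded results against.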
Yes, this is indeed the case. I just ran a simple test with the model above, and it sometimes returns NaNs in a multithreaded environment. Here is the source code for my test:
```java
import java.util.HashMap;
import java.util.Map;

import org.pmml4s.model.Model;

public class MultiThreadTest {
    public static void main(String[] args) {
        // A single Model instance shared by all threads
        Model m = Model.fromFile("model.pmml");

        // The same input row for every call; the map is not modified after the threads start
        Map<String, Object> features = new HashMap<String, Object>();
        features.put("price", 1);
        features.put("lotsize", 1);
        features.put("bedrooms", 1);
        features.put("bathrms", 1);
        features.put("stories", 1);
        features.put("garagepl", 1);
        features.put("driveway", "yes");
        features.put("recroom", "yes");
        features.put("fullbase", "yes");
        features.put("gashw", "yes");
        features.put("airco", "yes");
        features.put("prefarea", "yes");

        // Ten threads score the same row in an endless loop
        for (int i = 0; i < 10; i++) {
            final int threadNum = i;
            new Thread(new Runnable() {
                long l = 0; // per-thread row counter

                @Override
                public void run() {
                    while (true) {
                        Map<String, Object> map = m.predict(features);
                        System.out.println("Thread " + threadNum + "; row num: " + l++ + "; Score classic: "
                                + map.get("probability(classic)") + "; Score eclectic: "
                                + map.get("probability(eclectic)"));
                    }
                }
            }).start();
        }
    }
}
```

And here is the sample output:

```
Thread 6; row num: 138; Score classic: 0.9999996034045886; Score eclectic: 3.96595411369589E-7
Thread 5; row num: 113; Score classic: NaN; Score eclectic: NaN
Thread 6; row num: 139; Score classic: 0.9999996034045886; Score eclectic: 3.96595411369589E-7
```
And to avoid confusion: @soumyava, @wjtdc and I are all working on the same project.
Thanks for your model and example. We can reproduce this critical thread-safety issue, which is caused by the computation of derived fields. We will fix it as soon as possible.
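The thread does not describe the exact defect inside PMML4S, so the following is only a hypothetical sketch of the usual cause and cure: if an evaluator keeps intermediate `DerivedField` values in a scratch buffer shared by every caller, concurrent `predict` calls can read a slot that another thread has reset or not yet written, which can surface as NaN. Allocating the buffer per call removes the shared mutable state. `LocalScratchEvaluator` and `countBadResults` are illustrative names, not PMML4S API, and only two of the model's derived fields are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical two-feature evaluator; NOT the PMML4S implementation. */
public class LocalScratchEvaluator {

    /**
     * Thread-safe: the scratch array for derived fields is allocated per call.
     * An unsafe variant would make `derived` an instance field shared by every
     * caller, so one thread could read a slot another thread is still writing.
     */
    public double predict(double price, double lotsize) {
        double[] derived = new double[2];
        derived[0] = (price - 61119.59862385321) / 17482.07526012927;   // standardScaler(price)
        derived[1] = (lotsize - 4926.034403669725) / 2066.0385888582164; // standardScaler(lotsize)
        return 5.665174186447512 * derived[0] + 0.03268595140978323 * derived[1];
    }

    /** Calls predict(1, 1) `calls` times on each of `threads` threads; counts NaN or mismatching results. */
    public static long countBadResults(LocalScratchEvaluator evaluator, int threads, int calls) {
        final double expected = evaluator.predict(1, 1); // single-threaded reference value
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    long bad = 0;
                    for (int i = 0; i < calls; i++) {
                        double y = evaluator.predict(1, 1);
                        if (Double.isNaN(y) || y != expected) bad++;
                    }
                    return bad;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long bad = countBadResults(new LocalScratchEvaluator(), 10, 100_000);
        System.out.println("NaN or mismatching results: " + bad);
    }
}
```

`countBadResults` is a bounded version of the endless multithreaded test in this thread. With per-call scratch state it should always return 0, whereas a shared-field variant offers no such guarantee.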
Thank you, @scorebot
Thank you for looking into this issue.
Yes, @soumyava and I work on the same project.
We have fixed the thread-safety issue above. Please clone the latest code from the master branch, build PMML4S with the command `sbt package`, and then try the new jar. If the problem is gone, we will release the next version, 0.9.10, to Maven Central. Please let me know if you have any problems.
Hi @scorebot, I have been able to use the jar and no longer see the NaNs. Please go ahead and release the next version; I look forward to picking up 0.9.10 from Maven Central. Thank you for fixing the issue in such a short span of time. My colleagues and I really appreciate it!
Thanks, @scorebot, for the quick fix.
@soumyava The latest version, 0.9.10, has been pushed to Maven Central; please try it.
I am closing this issue now. If you have other problems, please feel free to open a new one.
The Teradata (https://www.teradata.com/) team is using the PMML4S library (Java) to develop a prediction function.
For probabilities:
`sn prediction json_report`