Closed: fnc11 closed this issue 2 years ago.
@fnc11 I can generate the model svm_SD.pmml, but I cannot reproduce the Java issue with the latest code. Here I used the Scala API, which should behave the same as the Java one:
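import org.pmml4s.model.Model
import scala.io.Source
import java.nio.file.{Files, Paths}
import java.nio.charset.StandardCharsets

val model = Model.fromFile("svm_SD.pmml")
val src = Source.fromFile("x_test.csv")
// Skip the header row, split each line into feature values, and close the file afterwards
val rows = try src.getLines().drop(1).map(_.split(",")).toList finally src.close()
// The first element of the prediction output is the predicted label
val result = rows.map(x => model.predict(x)(0))
Files.write(Paths.get("java_predicted_labels.csv"), ("prediction" :: result.map(_.toString)).mkString("\n").getBytes(StandardCharsets.UTF_8))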
Then I load the predictions from java_predicted_labels.csv in Python to compare:
import pandas as pd

java_predicted_labels = pd.read_csv('java_predicted_labels.csv')
java_predicted_labels = java_predicted_labels.iloc[:, 0].tolist()
# bf_predicted_labels: labels predicted earlier in the same Python session by the original model
conflict_ids = list()
for i, (bf_label, java_label) in enumerate(zip(bf_predicted_labels, java_predicted_labels)):
    if bf_label != java_label:
        conflict_ids.append(i)
print(len(conflict_ids))
print(conflict_ids)
0
[]
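You can also run the same check entirely on the Java side. Here is a minimal sketch. `PredictionDiff.conflictIds` is a hypothetical helper and is not part of PMML4S or opencsv. It assumes both label lists are aligned row by row with the test set.

Python's `zip` silently stops at the shorter list. This helper instead rejects lists of different lengths, because a one-row shift (for example, a header line read as data) would otherwise only show up as a flood of conflicts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class PredictionDiff {
    // Returns the indices at which two aligned label lists disagree.
    // Unlike Python's zip, a length mismatch is an error rather than silent truncation.
    static List<Integer> conflictIds(List<Integer> reference, List<Integer> candidate) {
        if (reference.size() != candidate.size()) {
            throw new IllegalArgumentException(
                    "size mismatch: " + reference.size() + " vs " + candidate.size());
        }
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < reference.size(); i++) {
            if (!Objects.equals(reference.get(i), candidate.get(i))) {
                ids.add(i);
            }
        }
        return ids;
    }
}
```

For example, `PredictionDiff.conflictIds(pythonLabels, javaLabels)` returns an empty list when both sides agree. Here `pythonLabels` and `javaLabels` are placeholders for your own lists.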
It could be caused by an old version of PMML4S. Sorry, I only just pushed the latest version, 0.9.13, to the Maven repository. Could you try it?
@fnc11 Does the latest version, 0.9.13, work for you?
Dear @scorebot,
I have updated the version number in the dependencies, but the model score value still didn't change, so I think it isn't fixed. I am attaching more code from my Java implementation; can you try with this?
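import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.CSVWriter;
import com.opencsv.enums.CSVReaderNullFieldIndicator;
import com.opencsv.exceptions.CsvException;
import org.pmml4s.model.Model;

// In main(), after loading the PMML file into the `model` field with Model.fromFile(...):
Double score = getModelScoreWithFeatures("src/main/resources/test_ft_seqs.csv");
System.out.println(score);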
Double getModelScoreWithFeatures(String fileName) {
    List<FeatureSequence> ftSeqs = readCSVSeqs(fileName);
    System.out.println(ftSeqs.get(0));
    List<Integer> groundTruthLabels = new ArrayList<>();
    List<Double[]> fts = new ArrayList<>();
    for (FeatureSequence featureSequence : ftSeqs) {
        fts.add(featureSequence.features);
        groundTruthLabels.add(featureSequence.label);
    }
    List<Integer> predictedLabels = getBatchPredictions(fts);
    savePredictions(predictedLabels, "src/main/resources/java_predicted_labels.csv");
    int correct = 0;
    int allSeqs = fts.size();
    for (int i = 0; i < allSeqs; i++) {
        if (Objects.equals(groundTruthLabels.get(i), predictedLabels.get(i))) {
            correct++;
        }
    }
    return 100 * ((double) correct / allSeqs);
}
List<FeatureSequence> readCSVSeqs(String fileName) {
    List<FeatureSequence> ftSeqs = new ArrayList<>();
    CSVParser parser = new CSVParserBuilder()
            .withSeparator(',')
            .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_QUOTES)
            .withIgnoreLeadingWhiteSpace(true)
            .build();
    // try-with-resources closes the reader (and the file) even if parsing fails
    try (CSVReader csvReader = new CSVReaderBuilder(new FileReader(fileName))
            .withSkipLines(1)  // skip the header row
            .withCSVParser(parser)
            .build()) {
        // read all records at once
        List<String[]> records = csvReader.readAll();
        for (String[] record : records) {
            if (record.length > 0) {
                // Column 0 holds the six features as a bracketed list, e.g. "[f1 f2 ... f6]":
                // replace the brackets with spaces and split on any run of whitespace
                String[] features = record[0].replace('[', ' ').replace(']', ' ').split("\\s+");
                List<String> validFeatures = new ArrayList<>();
                for (String feature : features) {
                    if (!feature.isEmpty()) {
                        validFeatures.add(feature);
                    }
                }
                Double[] dFeatures = new Double[6];
                for (int i = 0; i < 6; i++) {
                    dFeatures[i] = Double.parseDouble(validFeatures.get(i));
                }
                // Column 1 holds the integer class label
                FeatureSequence ftSeq = new FeatureSequence(dFeatures, Integer.parseInt(record[1]));
                ftSeqs.add(ftSeq);
            }
        }
    } catch (IOException | CsvException e) {
        e.printStackTrace();
    }
    return ftSeqs;
}
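List<Integer> getBatchPredictions(List<Double[]> ftSeqs) {
    List<Integer> predictedLabels = new ArrayList<>();
    for (Double[] ftSeq : ftSeqs) {
        // model.predict returns Object[]; the first element is the predicted label.
        // Casting to Number rather than Long also works if the label comes back as a Double.
        Object[] result = model.predict(ftSeq);
        int predictedLabel = ((Number) result[0]).intValue();
        predictedLabels.add(predictedLabel);
    }
    return predictedLabels;
}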
void savePredictions(List<Integer> predictedLabels, String fileName) {
    // No header row is written, so read this file in pandas with header=None
    // try-with-resources closes the writer even if writing fails
    try (CSVWriter writer = new CSVWriter(new FileWriter(fileName))) {
        for (String[] line : convertToStringArray(predictedLabels)) {
            writer.writeNext(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

// Wraps each label in a one-column CSV row
List<String[]> convertToStringArray(List<Integer> predictedLabels) {
    List<String[]> convertedLabels = new ArrayList<>();
    for (Integer label : predictedLabels) {
        convertedLabels.add(new String[] {label.toString()});
    }
    return convertedLabels;
}
public class FeatureSequence {
    Double[] features;
    Integer label;

    public FeatureSequence(Double[] features, int label) {
        this.features = features;
        this.label = label;
    }

    @Override
    public String toString() {
        return "FeatureSequence{" +
                "features=" + Arrays.toString(features) +
                ", label=" + label +
                '}';
    }
}
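In this listing, a silent mismatch is most likely at two conversions: turning the text cell into a feature vector, and turning `result[0]` into an `int` label. Below is a hedged sketch of stricter versions. `PredictionIo`, `parseVector` and `toLabel` are hypothetical names.

The sketch makes two assumptions:
- The feature cell is a NumPy-style printout such as `[0.1 -2.5e-03 ...]`.
- `model.predict` returns the label either as a boxed number or as a numeric string. Which one you get presumably depends on the data type declared for the target field in the PMML file.

```java
class PredictionIo {
    // Parses a bracketed, whitespace-separated vector such as "[0.1  -2.5e-03\n 3.]".
    // Assumption: the cell holds a NumPy array printed as text, so values may be separated
    // by several spaces or a newline; "\\s+" covers both. A wrong number of values is an
    // error instead of being silently truncated.
    static double[] parseVector(String cell, int expectedLength) {
        String inner = cell.trim();
        if (inner.startsWith("[")) {
            inner = inner.substring(1);
        }
        if (inner.endsWith("]")) {
            inner = inner.substring(0, inner.length() - 1);
        }
        String[] tokens = inner.trim().split("\\s+");
        if (tokens.length != expectedLength) {
            throw new IllegalArgumentException(
                    "expected " + expectedLength + " values but got " + tokens.length + ": " + cell);
        }
        double[] values = new double[expectedLength];
        for (int i = 0; i < expectedLength; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    // Converts a predicted value (e.g. result[0] from model.predict) to an int label
    // without assuming a particular boxed type.
    static int toLabel(Object predicted) {
        if (predicted instanceof String) {
            // Parse as a double first so that "1.0" is accepted as well as "1"
            return toLabel(Double.valueOf(((String) predicted).trim()));
        }
        if (predicted instanceof Number) {
            double d = ((Number) predicted).doubleValue();
            if (d != Math.rint(d)) {  // also rejects NaN
                throw new IllegalArgumentException("non-integer label: " + predicted);
            }
            return (int) d;
        }
        throw new IllegalArgumentException("unexpected prediction: " + predicted);
    }
}
```

For example, you could write `int predictedLabel = PredictionIo.toLabel(model.predict(ftSeq)[0]);`. Note that `parseVector` returns a primitive `double[]`, so the values need boxing if you keep the `Double[]` fields of `FeatureSequence`.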
Here are the dependencies:
<dependency>
    <groupId>org.pmml4s</groupId>
    <artifactId>pmml4s_2.13</artifactId>
    <version>0.9.13</version>
</dependency>
<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>5.5.2</version>
</dependency>
I created a new Java project with these Maven dependencies and ran the code above; the result is correct.
You need to clean and rebuild your project. By the way, which version of PMML4S did you use before 0.9.13?
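To see which PMML4S jar is actually loaded after a clean rebuild, you can print the location a class was loaded from. Here is a minimal sketch that uses only standard JDK calls (`Class.getProtectionDomain()` and `ProtectionDomain.getCodeSource()`). `ClassLocation` is a hypothetical helper name. When the dependency comes from the local Maven repository, the jar file name usually contains the artifact version, for example `pmml4s_2.13-0.9.13.jar`.

```java
import java.security.CodeSource;

class ClassLocation {
    // Returns where a class was loaded from (a jar file or a class directory).
    // Classes on the JDK bootstrap path have no CodeSource, so a placeholder is returned.
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }
}
```

To use it, call `System.out.println(ClassLocation.locationOf(org.pmml4s.model.Model.class));` in the failing project and check that the printed path points to the 0.9.13 jar.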
@fnc11 Please let me know if you still have a problem.
Sorry for the late reply, I got busy with some other work.
The issue is still there. I am attaching the whole project as a zip file; maybe you can spot the problem. I cleaned the project and ran it again, but the score value was still wrong. ActivityPredictionSVM.zip
Oh, it's caused by the exported PMML model: my model is different from yours. I attached mine so you can try it: svm_SD.pmml.txt
I use sklearn2pmml:
pip show sklearn2pmml
Name: sklearn2pmml
Version: 0.77.0
Summary: Python library for converting Scikit-Learn pipelines to PMML
Home-page: https://github.com/jpmml/sklearn2pmml
Author: Villu Ruusmann
Author-email: villu.ruusmann@gmail.com
License: GNU Affero General Public License (AGPL) version 3.0
Location: /Users/scorebot/anaconda3/lib/python3.7/site-packages
Requires: scikit-learn, sklearn-pandas, joblib
Required-by:
You are probably using an old version. Please update it and export the model again. My scikit-learn version is 0.23.2.
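The fix here was to re-export the model with a newer sklearn2pmml. Before debugging the scoring code, it can therefore help to check which converter produced a given PMML file. The PMML format records this in the `Application` child of the `Header` element, through its `name` and `version` attributes. The root `PMML` element carries the schema `version`.

Here is a sketch that uses the JDK's DOM parser. `PmmlInfo` is a hypothetical helper. The sketch does not assume the exact attribute values that sklearn2pmml writes.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PmmlInfo {
    // Parses a PMML file with namespace support enabled.
    private static Document parse(File pmmlFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(pmmlFile);
    }

    // PMML schema version from the root element, e.g. "4.4".
    static String pmmlVersion(File pmmlFile) throws Exception {
        return parse(pmmlFile).getDocumentElement().getAttribute("version");
    }

    // "name version" of the producing application from Header/Application, or null if absent.
    // "*" matches any namespace, because the namespace URI differs between PMML versions.
    static String producer(File pmmlFile) throws Exception {
        NodeList apps = parse(pmmlFile).getElementsByTagNameNS("*", "Application");
        if (apps.getLength() == 0) {
            return null;
        }
        Element app = (Element) apps.item(0);
        return (app.getAttribute("name") + " " + app.getAttribute("version")).trim();
    }
}
```

Running `PmmlInfo.producer(new File("svm_SD.pmml"))` on both the old and the re-exported file shows at a glance whether they came from different converter versions.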
@fnc11 Did the new PMML model resolve your issue?
Closing this. If you have other problems, please feel free to open a new issue.
I have trained an SVM model for an activity recognition task with two classes, [static, dynamic]. The original device data has the columns [acc_x, acc_y, acc_z, activity]. I took windows of 100 data points, i.e. 2 seconds of data (device frequency = 50 Hz), so each window has 100 acc_x, 100 acc_y, 100 acc_z and 100 activity values. From these sequences I extracted the mean and standard deviation. The feature list is therefore [mean_x, mean_y, mean_z, std_x, std_y, std_z], and the label is mode(100 activities).
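The feature extraction described above can be sketched in Java, for example if the features ever need to be computed on the device instead of read from CSV. This is an illustration, not the reporter's code, and `WindowFeatures` is a hypothetical name.

Two details are assumptions, and they must match the Python training code for the PMML model to see the same inputs:
- The standard deviation divides by n, which is NumPy's `np.std` default (`ddof=0`). Pandas' `Series.std()` divides by n - 1 instead.
- Ties in the mode go to the smallest label, as `scipy.stats.mode` does.

```java
import java.util.HashMap;
import java.util.Map;

class WindowFeatures {
    // Computes [mean_x, mean_y, mean_z, std_x, std_y, std_z] for one window.
    // Assumption: population standard deviation (divide by n, like np.std with ddof=0).
    static double[] extract(double[] x, double[] y, double[] z) {
        double[][] axes = {x, y, z};
        double[] features = new double[6];
        for (int a = 0; a < 3; a++) {
            double[] v = axes[a];
            double sum = 0;
            for (double d : v) {
                sum += d;
            }
            double mean = sum / v.length;
            double squares = 0;
            for (double d : v) {
                squares += (d - mean) * (d - mean);
            }
            features[a] = mean;                              // mean_x, mean_y, mean_z
            features[a + 3] = Math.sqrt(squares / v.length); // std_x, std_y, std_z
        }
        return features;
    }

    // Most frequent label in the window; ties go to the smallest label
    // (assumption, matching scipy.stats.mode).
    static int modeLabel(int[] labels) {
        if (labels.length == 0) {
            throw new IllegalArgumentException("empty window");
        }
        Map<Integer, Integer> counts = new HashMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        int best = Integer.MAX_VALUE;
        int bestCount = -1;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int label = e.getKey();
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && label < best)) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }
}
```

With the setup in this issue, each call would receive 100 samples per axis and 100 activity values.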
The issue is that when I use the PMML model in Java, it gives different predictions than the model that was saved.
Below is the whole procedure to reproduce the issue.
I am attaching train_seqs, train_labels, test_seqs and test_labels as CSV files: train_ft_seqs.csv test_ft_seqs.csv