feedzai / feedzai-openml-java

Implementations for Feedzai's OpenML APIs to allow for usage of machine learning models in the Java programming language.
https://www.feedzai.com
Apache License 2.0

H2O Training has a data-race on super-csv #3

Closed nmldiegues closed 5 years ago

nmldiegues commented 5 years ago

The following exception occurs intermittently when we train two or more H2O models concurrently in the same JVM:

java.lang.ArrayIndexOutOfBoundsException: -6
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:623)
    at java.lang.StringBuilder.append(StringBuilder.java:202)
    at org.supercsv.encoder.DefaultCsvEncoder.encode(DefaultCsvEncoder.java:81)
    at org.supercsv.io.AbstractCsvWriter.escapeString(AbstractCsvWriter.java:102)
    at org.supercsv.io.AbstractCsvWriter.writeRow(AbstractCsvWriter.java:196)
    at org.supercsv.io.AbstractCsvWriter.writeRow(AbstractCsvWriter.java:146)
    at org.supercsv.io.CsvListWriter.write(CsvListWriter.java:71)
    at com.feedzai.openml.h2o.H2OUtils.lambda$writeDatasetToDisk$2(H2OUtils.java:90)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at com.feedzai.openml.h2o.H2OUtils.writeDatasetToDisk(H2OUtils.java:88)
    at com.feedzai.openml.h2o.H2OModelCreator.fit(H2OModelCreator.java:174)
    at com.feedzai.openml.h2o.H2OModelCreator.fit(H2OModelCreator.java:64)

The problem was traced to the H2OUtils class, where we build the CSV preference with:

new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE).build()

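Although this looks like it copies the static STANDARD_PREFERENCE object, the new preference still shares an internal StringBuilder with it through one of its member objects (the stack trace points at DefaultCsvEncoder), and super-csv mutates that StringBuilder during CSV writes. When two trainings write their datasets at the same time, they append to the same buffer, which produces the exception above.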
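To illustrate the hazard without depending on super-csv, here is a minimal sketch using only the JDK. The classes ReusingEncoder and Preference, and the methods shallowCopy and copyWithFreshEncoder, are hypothetical stand-ins invented for this example; they are not super-csv classes and this is not necessarily the fix that was committed. The sketch shows that a "copy" which only copies references still shares the encoder (and its buffer), while giving each writer its own encoder removes the shared state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins for illustration only; these are NOT super-csv classes.
class SharedEncoderSketch {

    /** Encoder that reuses one StringBuilder across calls, so it is not thread-safe. */
    static final class ReusingEncoder {
        private final StringBuilder buffer = new StringBuilder();

        String encode(String column) {
            buffer.setLength(0); // reset the reused buffer before each column
            buffer.append('"');
            buffer.append(column.replace("\"", "\"\"")); // escape embedded quotes
            buffer.append('"');
            return buffer.toString();
        }
    }

    /** Preference object that carries an encoder, as a CSV preference might. */
    static final class Preference {
        static final Preference STANDARD = new Preference(new ReusingEncoder());

        final ReusingEncoder encoder;

        Preference(ReusingEncoder encoder) {
            this.encoder = encoder;
        }

        /** Looks like a copy, but the result still points at the source's encoder. */
        static Preference shallowCopy(Preference source) {
            return new Preference(source.encoder);
        }

        /** Safer variant: every call gets its own encoder and therefore its own buffer. */
        static Preference copyWithFreshEncoder(Preference source) {
            return new Preference(new ReusingEncoder());
        }
    }

    /** Encodes the same value on several threads, each using its own encoder. */
    static List<String> encodeConcurrently(int threads, String value) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                Preference own = Preference.copyWithFreshEncoder(Preference.STANDARD);
                Callable<String> task = () -> own.encoder.encode(value);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Preference a = Preference.shallowCopy(Preference.STANDARD);
        Preference b = Preference.shallowCopy(Preference.STANDARD);
        System.out.println("shallow copies share an encoder: " + (a.encoder == b.encoder));

        Preference c = Preference.copyWithFreshEncoder(Preference.STANDARD);
        Preference d = Preference.copyWithFreshEncoder(Preference.STANDARD);
        System.out.println("fresh copies share an encoder: " + (c.encoder == d.encoder));

        System.out.println(encodeConcurrently(4, "x"));
    }
}
```

In super-csv itself, the analogous step would presumably be to give each writer its own encoder, for example by building the preference with CsvPreference.Builder.useEncoder(new DefaultCsvEncoder()) instead of inheriting the encoder from STANDARD_PREFERENCE. That is an assumption based on the library's public API, and it should be checked against the version in use.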

nmldiegues commented 5 years ago

Committed above