Closed Jiehong closed 7 months ago
Thanks for creating the report. Could you provide the exact size of the .feature file in megabytes?
I had a quick look at the code, but I don't think there is much that can be done. The current implementation creates a String using a StringBuilder, which means that we have a few copies of the file in memory. This could be made more efficient by writing to an OutputStream instead. But then Spotless would have to turn that OutputStream into a String anyway to do its comparison.
If you do have a better idea to solve this, please feel free to make a suggestion.
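To make the trade-off above concrete, here is a minimal sketch. It is not the gherkin-utils or Spotless code; the class and method names are hypothetical. It shows that a streaming writer only avoids the extra copy if the caller does not need a String at the end.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the actual formatter implementation.
public class StreamVsBuilderSketch {

    // Builder variant: the builder's internal array and the final String
    // both hold the full output at the moment toString() returns.
    static String viaBuilder(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.append(line).append('\n');
        }
        return out.toString();
    }

    // Streaming variant: each line is handed to the stream immediately.
    static void writeTo(List<String> lines, OutputStream out) throws IOException {
        for (String line : lines) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
    }

    // A caller that needs a String still has to buffer the whole output,
    // so the memory saving of the streaming variant is lost again.
    static String viaStream(List<String> lines) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeTo(lines, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Feature: my feature", "Background:");
        System.out.println(viaBuilder(lines).equals(viaStream(lines)));
    }
}
```

Both variants produce the same text; the difference is only where the buffering happens.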
File size was 180 kB, so not even in the megabyte range.
Doesn't feel that big to me. In the end, we found a workaround by extracting the 170 kB JSON into its own JSON file and introducing it as a variable:
* def myData = read ("file:src/test/java/xxx/big_file.json")
This way gherkin does not need to try to format a "big" file.
Otherwise, I'm not quite sure how to better handle it.
(We tried passing -Xmx 2048m via the jvm.config options for Maven, but it didn't help for some reason.)
Ouch. That doesn't seem big indeed. At this point I'd attach the JVM console and have a look at where the memory goes.
Personally, at present, I don't have the time to dig deeper though. If you or someone else does have the time available it would be most welcome.
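Before attaching a profiler, it can help to confirm which heap limit the JVM actually sees. The following is a small sketch using only the standard Runtime API; the class name and the output format are made up for illustration.

```java
// Hypothetical helper; prints the heap limit and current usage of this JVM.
public class HeapReportSketch {

    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        long max = rt.maxMemory();                      // upper bound, reflects -Xmx if set
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently in use
        return "max=" + (max / mib) + "MiB used=" + (used / mib) + "MiB";
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

If the reported maximum does not match the configured value, the option most likely never reached the JVM that runs the formatter.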
Dumping on OOM leads to some information (-Xmx64M):
Thread 'mvn-builder-xxx' with ID = 29
java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
jdk.internal.misc.Unsafe.allocateUninitializedArray(Unsafe.java:1380)
java.lang.StringConcatHelper.newArray(StringConcatHelper.java:511)
java.lang.StringLatin1.replace(StringLatin1.java:362)
java.lang.String.replace(String.java:3100)
io.cucumber.gherkin.utils.pretty.PrettyHandlers.handleDocString(PrettyHandlers.java:71)
io.cucumber.gherkin.utils.pretty.PrettyHandlers.handleDocString(PrettyHandlers.java:30)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkStep(WalkGherkinDocument.java:105)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkSteps(WalkGherkinDocument.java:96)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkScenario(WalkGherkinDocument.java:134)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkFeature(WalkGherkinDocument.java:65)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkGherkinDocument(WalkGherkinDocument.java:40)
io.cucumber.gherkin.utils.pretty.Pretty.prettyPrint(Pretty.java:18)
com.diffplug.spotless.glue.gherkin.GherkinUtilsFormatterFunc.apply(GherkinUtilsFormatterFunc.java:58)
com.diffplug.spotless.FormatterFunc.apply(FormatterFunc.java:32)
com.diffplug.spotless.FormatterStepImpl$Standard.format(FormatterStepImpl.java:82)
com.diffplug.spotless.FormatterStep$Strict.format(FormatterStep.java:88)
com.diffplug.spotless.Formatter.compute(Formatter.java:246)
com.diffplug.spotless.PaddedCell.check(PaddedCell.java:126)
com.diffplug.spotless.PaddedCell.check(PaddedCell.java:98)
com.diffplug.spotless.PaddedCell.calculateDirtyState(PaddedCell.java:220)
com.diffplug.spotless.PaddedCell.calculateDirtyState(PaddedCell.java:190)
com.diffplug.spotless.maven.SpotlessCheckMojo.process(SpotlessCheckMojo.java:54)
com.diffplug.spotless.maven.AbstractSpotlessMojo.execute(AbstractSpotlessMojo.java:229)
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:126)
org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:328)
org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:316)
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:174)
org.apache.maven.lifecycle.internal.MojoExecutor.access$000(MojoExecutor.java:75)
org.apache.maven.lifecycle.internal.MojoExecutor$1.run(MojoExecutor.java:162)
org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute(DefaultMojosExecutionStrategy.java:39)
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:159)
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:105)
org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:193)
org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:180)
java.util.concurrent.FutureTask.run(FutureTask.java:317)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
java.util.concurrent.FutureTask.run(FutureTask.java:317)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
java.lang.Thread.runWith(Thread.java:1596)
java.lang.Thread.run(Thread.java:1583)
This seems to make sense, as the big json in the gherkin file is defined as:
Feature:
Background:
# not much here
Scenario:
* def myData =
"""
{
super long json here
over 9000 lines
}
"""
# rest of the scenario test afterwards, just for a few lines
With -Xmx128M, a different one occurs:
Thread 'mvn-builder-xxxx' with ID = 29
java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
java.util.Arrays.copyOf(Arrays.java:3541)
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:242)
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:587)
java.lang.StringBuilder.append(StringBuilder.java:179)
io.cucumber.gherkin.utils.pretty.Result.append(Result.java:10)
io.cucumber.gherkin.utils.pretty.PrettyHandlers.handleDocString(PrettyHandlers.java:82)
io.cucumber.gherkin.utils.pretty.PrettyHandlers.handleDocString(PrettyHandlers.java:30)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkStep(WalkGherkinDocument.java:105)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkSteps(WalkGherkinDocument.java:96)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkScenario(WalkGherkinDocument.java:134)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkFeature(WalkGherkinDocument.java:65)
io.cucumber.gherkin.utils.WalkGherkinDocument.walkGherkinDocument(WalkGherkinDocument.java:40)
io.cucumber.gherkin.utils.pretty.Pretty.prettyPrint(Pretty.java:18)
com.diffplug.spotless.glue.gherkin.GherkinUtilsFormatterFunc.apply(GherkinUtilsFormatterFunc.java:58)
com.diffplug.spotless.FormatterFunc.apply(FormatterFunc.java:32)
com.diffplug.spotless.FormatterStepImpl$Standard.format(FormatterStepImpl.java:82)
com.diffplug.spotless.FormatterStep$Strict.format(FormatterStep.java:88)
com.diffplug.spotless.Formatter.compute(Formatter.java:246)
com.diffplug.spotless.PaddedCell.check(PaddedCell.java:126)
com.diffplug.spotless.PaddedCell.check(PaddedCell.java:98)
com.diffplug.spotless.PaddedCell.calculateDirtyState(PaddedCell.java:220)
com.diffplug.spotless.PaddedCell.calculateDirtyState(PaddedCell.java:190)
com.diffplug.spotless.maven.SpotlessCheckMojo.process(SpotlessCheckMojo.java:54)
com.diffplug.spotless.maven.AbstractSpotlessMojo.execute(AbstractSpotlessMojo.java:229)
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:126)
org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:328)
org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:316)
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:174)
org.apache.maven.lifecycle.internal.MojoExecutor.access$000(MojoExecutor.java:75)
org.apache.maven.lifecycle.internal.MojoExecutor$1.run(MojoExecutor.java:162)
org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute(DefaultMojosExecutionStrategy.java:39)
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:159)
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:105)
org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:193)
org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:180)
java.util.concurrent.FutureTask.run(FutureTask.java:317)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
java.util.concurrent.FutureTask.run(FutureTask.java:317)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
java.lang.Thread.runWith(Thread.java:1596)
java.lang.Thread.run(Thread.java:1583)
Can't upload a screenshot, but the analysis of the biggest objects shows:
1st case: 18MB int[]
from StringLatin1:382 whose length is 4.5 million
2nd case: 27MB String
from PrettyHandlers.java:82, whose value is {\"\"\"\"\"\"\"\"... (tons and tons of escaped ")
Looks like there might be some duplicated escaped double quotes growing very big and creating huge objects in memory (String's immutability probably does not help in the first case).
Hoping this helps.
Interesting, it looks like this may not be correct:
I would expect the replaced string to be equal to the delimiter in both cases.
If you use a smaller json in the doc string, does it even format correctly?
That's an interesting question!
Just gave it a try, and got weird results:
This allowed me to create a very simple case where the content of the "docstring" fully disables formatting for that file (or crashes it if too big) when the docstring contains some JSON.
Here is a way to reproduce:
Feature: my feature
Background:
* url superUrl
# Testing a "docstring"
* configure thing =
"""
{
"key": "value"
}
"""
Expectations: file reformatted (some empty lines to be removed)
Reality: file considered already formatted.
If you try with this instead:
Feature: my feature
Background:
* url superUrl
# Testing a "docstring"
* configure thing =
"""
I'm something else
"""
Expectations and reality match: file gets reformatted.
Cheers. The formatting stuff is relatively easy to fix.
It may also fix the memory issue because the pretty formatter won't be replacing every " with \"\"\". Though that would increase the size by a factor of 6 at most. You could try building #58 from source and adding it as a <dependency> to the spotless plugin.
""" are also correctly formatted. Should be released soon. You'll have to ping Spotless for dependency updates.
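The "factor of 6" mentioned above follows from replacing a single " (1 character) with \"\"\" (6 characters). A minimal sketch of the two escaping strategies, assuming the behaviour described in this thread; the method names are hypothetical and this is not the actual PrettyHandlers code:

```java
// Hypothetical illustration of the escaping discussed in this thread.
public class DocStringEscapeSketch {

    // Escapes only a full delimiter sequence (three quotes in a row).
    static String escapeDelimiterOnly(String content) {
        return content.replace("\"\"\"", "\\\"\\\"\\\"");
    }

    // Escapes every single quote as if it were a full delimiter:
    // 1 character in, 6 characters out, for each quote in the content.
    static String escapeEveryQuote(String content) {
        return content.replace("\"", "\\\"\\\"\\\"");
    }

    public static void main(String[] args) {
        String json = "{\n  \"key\": \"value\"\n}";
        System.out.println("original:       " + json.length());
        System.out.println("delimiter only: " + escapeDelimiterOnly(json).length());
        System.out.println("every quote:    " + escapeEveryQuote(json).length());
    }
}
```

JSON is quote-heavy, so escaping every quote inflates the doc string considerably, whereas ordinary JSON never contains three quotes in a row and passes through the delimiter-only variant unchanged.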
👓 What did you see?
Using spotless:check, if a Gherkin file becomes too big, the formatter causes Maven's JVM to run out of heap space and crash with the following exception:
✅ What did you expect to see?
No crash.
📦 Which tool/library version are you using?
Gherkin 8.0.5, with spotless 2.37.0. Crash does not seem to depend on the spotless version.
🔬 How could we reproduce it?
Using the following maven plugin:
Define a big xxx.feature file in your src directory, and run mvn spotless:check.
The feature file should be big enough (like 10k lines). In our case, it's because a test case uses a big JSON input to test a service.