coursera / courier

Data interchange for the modern web + mobile stack.
http://coursera.github.io/courier/
Apache License 2.0
98 stars 24 forks source link

Fix enum deadlock #82

Closed jeremycarroll closed 4 years ago

jeremycarroll commented 4 years ago

Fixing a Deadlock and Documentation

I ended up finding it difficult to move from my initial fix, to a testable deployable change, that I had run with the unit tests in Coursera. Hence this PR now covers two very different issues:

Actual Intended Change

This PR addresses the deadlock seen in a Coursera internal diff. This deadlock appears to be an instance of the lazy value initialization order problem described in this SIP

There are two new pieces of code:

  1. The diff adds a test case exhibiting the deadlock

  2. The laziness of the lazy field properties of ScalaEnumTemplate is re-implemented explicitly with no locking, so that multiple threads can initialize the value concurrently, with the last thread winning. This is semantically safe since all threads compute the same value. It resolves the deadlock because one of the locks no longer exists.

Prior to the implementation change, the raceWithNameWithName test usually (but not always) fails with the two critical threads having the following state - showing the deadlock between a class initialization lock for a companion object and the lock for the lazy property.

"B@1806" prio=5 tid=0x18 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
     blocks A@1805
      at sun.misc.Unsafe.ensureClassInitialized(Unsafe.java:-1)
      at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43)
      at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:156)
      at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1088)
      at java.lang.reflect.Field.getFieldAccessor(Field.java:1069)
      at java.lang.reflect.Field.get(Field.java:393)
      at org.coursera.common.reflect.CompanionReflector$.findCompanionInstanceOfCompanionClass(CompanionReflector.scala:68)
      at org.coursera.common.collection.Enum$$anonfun$findSymbols$1.applyOrElse(Enum.scala:79)
      at org.coursera.common.collection.Enum$$anonfun$findSymbols$1.applyOrElse(Enum.scala:77)
      at scala.PartialFunction.$anonfun$runWith$1$adapted(PartialFunction.scala:145)
      at scala.PartialFunction$$Lambda$29.147022238.apply(Unknown Source:-1)
      at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
      at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
      at scala.collection.TraversableLike.collect(TraversableLike.scala:274)
      at scala.collection.TraversableLike.collect$(TraversableLike.scala:272)
      at scala.collection.mutable.ArrayOps$ofRef.collect(ArrayOps.scala:198)
      at org.coursera.common.collection.Enum.findSymbols(Enum.scala:77)
      at org.coursera.common.collection.Enum.findSymbols$(Enum.scala:63)
      at org.coursera.courier.templates.ScalaEnumTemplate.findSymbols(ScalaEnumTemplate.scala:28)
      at org.coursera.common.collection.Enum.symbols(Enum.scala:61)
      at org.coursera.common.collection.Enum.symbols$(Enum.scala:61)
      at org.coursera.courier.templates.ScalaEnumTemplate.symbols$lzycompute(ScalaEnumTemplate.scala:28)
      - locked <0x716> (a org.coursera.courier.templates.EnumTypeB$)
      at org.coursera.courier.templates.ScalaEnumTemplate.symbols(ScalaEnumTemplate.scala:28)
      at org.coursera.courier.templates.EnumTypeB$.withName(EnumTemplateRaceTest.scala:199)
      at org.coursera.courier.templates.TestClassB.withName(EnumTemplateRaceTest.scala:32)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.coursera.courier.templates.EnumTemplateRaceTest$TestSetup.withName(EnumTemplateRaceTest.scala:95)
      at org.coursera.courier.templates.EnumTemplateRaceTest.$anonfun$raceWithNameWithName$2(EnumTemplateRaceTest.scala:143)
      at org.coursera.courier.templates.EnumTemplateRaceTest$$Lambda$35.1859383896.apply$mcV$sp(Unknown Source:-1)
      at org.coursera.courier.templates.EnumTemplateRaceTest$TestSetup$$anon$3.run(EnumTemplateRaceTest.scala:108)

"A@1805" prio=5 tid=0x17 nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
     waiting for B@1806 to release lock on <0x716> (a org.coursera.courier.templates.EnumTypeB$)
      at org.coursera.courier.templates.ScalaEnumTemplate.properties$lzycompute(ScalaEnumTemplate.scala:44)
      at org.coursera.courier.templates.ScalaEnumTemplate.properties(ScalaEnumTemplate.scala:44)
      at org.coursera.courier.templates.ScalaEnumTemplate.properties(ScalaEnumTemplate.scala:51)
      at org.coursera.courier.templates.EnumTypeB$VALUE3$.<init>(EnumTemplateRaceTest.scala:188)
      at org.coursera.courier.templates.EnumTypeB$VALUE3$.<clinit>(EnumTemplateRaceTest.scala:-1)
      at org.coursera.courier.templates.TestClassB.value(EnumTemplateRaceTest.scala:31)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.coursera.courier.templates.EnumTemplateRaceTest$TestSetup.value(EnumTemplateRaceTest.scala:94)
      at org.coursera.courier.templates.EnumTemplateRaceTest.$anonfun$raceWithNameWithName$1(EnumTemplateRaceTest.scala:143)
      at org.coursera.courier.templates.EnumTemplateRaceTest$$Lambda$34.640808588.apply$mcV$sp(Unknown Source:-1)
      at org.coursera.courier.templates.EnumTemplateRaceTest$TestSetup$$anon$2.run(EnumTemplateRaceTest.scala:102)

The test case, CompanionRaceTest, is a version of this problem with no reference to any Coursera code. It uses two lazy fields and a companion class. If setting the WORKAROUND field to false, then the test fails, while it is true the test case uses the same workaround as is implemented in this diff. This is hence a variant of the Scala lazy value problem.

Documentation and Build Changes

As I approach my real task, it became clear that the documentation was incomplete and messy.

I have made the following high-level changes to address these problems:

  1. I consolidated both the Coursera internal document and the old CONTRIBUTING.md into a new set of developer documents in a docs folder in this PR
  2. I fixed the travis tests, which were broken
  3. I moved the cross version part of the travis tests from .travis.yml to the definition of fulltest, to enable the simple documentation of This project also uses the convention of prefixing commands with full as a custom cross-version command
  4. I ensured the test suite from the courier-runtime project is actually run.
  5. I improved the cross-build, so as to simplify the publishing steps
  6. I made it clear that courier-maven-plugin and courier-gradle-plugin are not supported, removing the gradle support from some of the scripts (but clarifying and improving the documentation in the gradle subproject itself)
  7. I ensured that all the commands suggested in the developer documentation actually work
  8. I updated both the sonatype and bintray plugins to the final version before they ceased supporting sbt 0.13
  9. I added @amory-coursera and @dguo-coursera to the list of developers since the currently published list has no active maintainers, which feels like an error.
  10. I ensured that we can build and publish SNAPSHOT builds for integration testing in Coursera (both the sbt-plugin and the other subprojects; and even the gradle plugin)

Testing

Testing Actual Change

The new unit test and running the tests on the companion PR over Coursera's services: many times, none of which deadlocked.

Testing Other Changes

  1. Travis is building correctly
  2. I read through the new documentation, and deleted my copy of the project and started from scratch (not the credentials part)
mkowaliszyn-coursera commented 4 years ago

For posterity, this affects program service quite frequently and probably every other service: https://docs.google.com/document/d/1L35IIDNzFUlMn8kjbU4o9-dBJGgjmFlH3NKMABvxXKo/edit#

jeremycarroll commented 4 years ago

image image image image image image image image

jeremycarroll commented 4 years ago

image

jeremycarroll commented 4 years ago

image image

jeremycarroll commented 4 years ago

Note my comments with images are a way to get the images into the documentation

jeremycarroll commented 4 years ago

@amory-coursera one final question is the version number. I started by viewing this as a bug fix, taking the version to 2.1.8; we could say that the developer documentation is a new feature, taking it to 2.2.0; or we could even say that the (belated) honesty about gradle and maven not being supported is not backwardly compatible taking it to 3.0.0.

jeremycarroll commented 4 years ago

As per slack conversation,

Ask someone with write access to:

jeremycarroll commented 4 years ago

image