Expose boxed inline value classes in JVM

Kotlin / KEEP

Kotlin Evolution and Enhancement Process

Apache License 2.0

3.41k stars 362 forks source link

Expose boxed inline value classes in JVM #394

Open serras opened 1 month ago

serras commented 1 month ago

This is an issue to discuss exposing boxed inline value classes in JVM. The current full text of the proposal can be found here.

We propose modifications to how value classes are exposed in JVM, with the goal of easier consumption from Java. This includes exposing the constructor publicly and giving the ability to expose boxed variants of operations.

daniel-rusu commented 1 month ago

Exposing Kotlin inline classes to Java prior to the release of Valhalla value classes is dangerous as it will expose a new category of defects regarding their identity. While Kotlin marks these usages as errors preventing compilation, the Java world only does the same for Valhalla value classes as they also don't have object identity.

If inline classes are exposed before Valhalla value classes then the following new categories of defects will be introduced (assume PositiveInt is a Kotlin inline class):

Equality Checks

// Kotlin
class Person(val name: String, val id: PositiveInt)

// Java
if (person1.getId() === person2.getId()) // fails when person1 === person2!

The above fails to compile in Kotlin but this would be allowed in Java (unless it's exposed as a Valhalla value class). Java would auto-box each access so that the reference equality fails even though they come from the same value. Even worse is that if the reference to a Kotlin inline class instance is passed around in the Java world then reference equality checks will pass but will start to fail when crossing the Kotlin-Java boundary such as when passing the reference to a Kotlin function that stores it in a local unboxed variable before passing it to some other Java code that auto-boxes it again. While reference equality checks aren't too common in Kotlin, they are quite common in Java code.

Identity hashCode

// Kotlin
class Person(val name: String, val id: PositiveInt)

// Java
val bob = Person("Bob", 123)

// Use the identity hashCode of an inline class reference for some operation
identityHashCode(bob.getId())

// Use the identity hashCode for some other operation
identityHashCode(bob.getId()) // different identity hashCode even though it's coming from the same `id` "instance".

Synchronization Similarly to identity hashCode, we can end up with multiple boxed instances in the Java world that all come from the same inline class value so we incorrectly think we're synchronizing on the same object.

// Kotlin
class Person(val name: String, val bankId: PositiveInt, val governmentId: PositiveInt)

// Java
val bob = Person("Bob", 123, 9999)

// synchronize on the bankId for some banking operation
synchronize(bob.getBankId()) { ... }

// perform some other banking operation
synchronize(bob.getBankId()) // different identity hashCode even though it's coming from the same `id` "instance".

For this hypothetical scenario, we want to allow concurrent access when performing unrelated operations so we only synchronize on the bankId when performing banking operations etc.

Conclusion There are additional categories of defects from other operations that rely on the object identity such as wait / notify, weak references, collections that rely on identity like IdentityHashMap, etc. Java programmers won't always be aware which classes are Kotlin Inline value classes. Even if they did, they likely won't be aware of these new gotchas. If we wait for Valhalla value classes to be released first, Kotlin inline classes could be exposed as value classes in Java preventing Java developers from attempting to use the object identity of these instances.

JakeWharton commented 1 month ago

I didn't have time to dig in deeply today, but I'll just echo that the timing of this in relation to the forthcoming changes around Valhalla as presented at JVMLS recently is risky. Surely little harm would come from waiting to see what they ship first?

serras commented 1 month ago

Thanks for the fast feedback! We're indeed monitoring the upcoming changes in Valhalla, but there are a few comments.

The goal of this KEEP is to expose a Kotlin API better to the JVM world, something in which we were lacking. Since right now Kotlin inlines the uses of value class, I think there is a need for this KEEP in ensuring that a "non-inlined" version is exposed.

Once value classes land (and the corresponding JVM version is somehow common, which could take years), this problem does not go away completely, if the inlining mechanism is still in place. One possible future in that case is:

value class without @JvmInline is compiled into a JVM value class;
value class with @JvmInline (which is mandatory now) are inlined, as currently.

The good news is that we can turn the "boxed variants" of the inlined classes also into a value class in a backward compatible way (or so they hinted in JVMLS). In that case would avoid the problems mentioned by @daniel-rusu, which are unfortunately out of reach with the current JVM. I think it's worth adding this note to the KEEP, in fact.

In summary: this KEEP moves the needle a bit into improved compatibility with Java. The best solution would indeed involve value classes, but there is a risk on waiting for that JVM to be common; so I think a good approximation is to expose the boxed variant with the same caveats that exist now, but mark then as value class for newer versions.

daniel-rusu commented 1 month ago

I think it's important to mention that the improved performance of inline classes is the only reason they are used since regular classes or data classes provide a better experience in pretty much every way when performance isn't the primary concern. As users, we need to be extra careful with inline classes to avoid auto-boxing otherwise the performance can be worse than regular classes negating the only purpose for their use. Inline classes provide fewer capabilities than regular classes, they add surprises such as JVM conflicts (eg. constructor that accepts inline class vs constructor that accepts underlying value etc.), and introduce restrictions such as avoiding operations that rely on their missing identity. So when inline classes are chosen, companies are choosing to pay the price of increased complexity with the only benefit of gaining performance.

The need to wait for Valhalla value classes to become common seems to be a misconception because that refers to a different target audience. The type of places that use bleeding-edge features like inline classes are the type of places that upgrade to the latest JVM release. After all, inline classes are strictly a performance choice, and upgrading the JVM improves performance aligning with the target audience of inline classes since performance is so important for this audience. The decision should be based on when Valhalla value classes are released on the JVM instead of when that version of the JVM becomes common.

From a design perspective, it doesn't make sense to choose inline classes if we want to expose these to the Java world without project Valhalla. That's because using these values from Java would auto-box them and negate any performance benefit of inline classes likely resulting in worse performance than regular classes (as that would have just passed a reference) and thus nullifying our reason for choosing inline classes in the first place. However, if these were exposed as Valhalla value classes, they could take advantage of Valhalla JVM optimizations so they should continue to have a performance benefit and reason to exist.

Regarding backward compatibility, unlike inline classes, it's my understanding that the Java team plans to enumerate the classes that are intended to be converted to Valhalla value classes (such as the primitive wrapper classes) and add warnings whenever they are used in a future incompatible way like performing operations that rely on their identity. The behavior of some of these operations will change. For example, == will switch from checking referential equality to checking value equality in Java when operating on Valhalla value classes, changing the behavior of existing Java code that performs this operation on these classes. Additionally, some operations will no longer compile when performed on Valhalla value classes. The warnings will steer developers away from performing these operations with value-class candidates. So the Valhalla changes don't appear to be backward compatible and Kotlin will be in a worse state. Luckily for Java, the value class candidates are rarely used in a future-unsafe way because it's rare to compare reference equality of primitive wrapper classes or synchronize on them etc.. A common use-case of inline classes is for storing validated content such as EmailAddress which wraps a String. Given existing patterns of how inline classes are used in Kotlin, using inline classes from Java in a future-unsafe manner will be much more likely as developers won't think of these as they do with primitive wrapper classes but instead treat them as normal classes and rely on their identity. Java developers don't have the concept of inline classes so they likely won't think about these usages in a special way resulting in a poor Kotlin-Java experience.

Regarding compatibility with Java, exposing inline classes to Java without Project Valhalla would provide a worse compatibility experience as it would introduce the large swath of defect categories that I talked about in my first comment. When we choose inline classes, we know that they will meet our performance reasons for choosing them and are fairly safe since the Kotlin compiler helps reduce dangerous usage patterns but these implicit requirements will no longer be valid if they are exposed in Java without the safety and performance of Project Valhalla.

daniel-rusu commented 1 month ago

Just a quick note that the fun PositiveInt.duplicate() example in the KEEP appears to fall for the identity traps outlined above. The only reason to duplicate an immutable object is if its identity matters. However, the identity of inline/value classes should never be used or relied on as that will introduce many defects.

On a similar note regarding the first 2 design goals:

The following are some scenarios that we would like to support:

Creating boxed values: for example, creating an actual instance of PositiveInt.

Calling operations that take or return value classes using their boxed representations.

Hopefully these new abilities won't be accessible from Kotlin code. While they are needed by Java to maintain its real type, any usages from Kotlin would only encourage dangerous identity-related defects and also introduce performance issues since the boxed variants will be automatically unboxed by the Kotlin compiler at the first opportunity such as when passing the boxed value to a function that operates on the inline class type etc.

serras commented 1 month ago

Let me reiterate why I think that, regardless of the upcoming value class support in the JVM, exposing a "non-inlined" version of Kotlin's @JvmInline value class is useful.

People want to have a way to use the best representation they can get in Kotlin, while keeping "good enough" compatibility with other languages in the JVM, especially Java. This can be seen by the many comments in the corresponding YouTrack ticket and the ones related to it.

Right now, if you want to expose your API with a wrapper type and it being accessible in Java, you can use a data class (this always brings risks with forwards compatibility, but in this case this is no worse than a value class).

data class StudentId(val id: Long)
class Student(val id: StudentId, val name: String, ...)

This KEEP amounts to proposing that if you decide to use a value class for StudentId, your API for Java is equivalent to the one above, whereas in Kotlin you get the benefits of inlining. For me that is a net gain.

As I mentioned above, it is possible for us to generate a "real" JVM value class, which shall keep the identity-based behavior on JVMs which do not support the feature. The design document specifies that value classes are distinguished by setting the ACC_VALUE flag, and the JVM spec states that:

_All bits of the accessflags item not assigned in Table 4.1 are reserved for future use. They should be set to zero in generated class files and should be ignored by Java Virtual Machine implementations.

serras commented 1 month ago

On this discussion, I would like to distinguish two different positions which I feel are getting mixed up:

Kotlin should stop doing inline as soon as possible, and switch to JVM value classes,
Kotlin should not have the ability to expose the boxed variant of an inline value class.

On that note, this KEEP gives you the option to expose boxed variants (well, in all truth, exposing the constructor is not optional in the current design, but any operations are). If your target with a library, or your application, uses only Kotlin, you may not want to do this. But again, there are library authors which want this boxed variant to the available, knowing the risks that it entails with respect to identity.

JakeWharton commented 1 month ago

I was more coming from the angle of that boxing of a value class is an already problematic concern in just Kotlin (as mentioned above), that this is only going to increase that problem since they're generally thought of as a performance optimization but can actually be the opposite.

But if we're just going to say that @JvmInline and its boxing is mostly a short-term hack that we don't care about that much, and at some point a non-@JvmInline value class will become a Valhalla value type available to both languages, then this KEEP doesn't really change anything.

serras commented 1 month ago

Just to clarify: this KEEP changes nothing about the view Kotlin has of an inline value class. All of the “boxed variants” of callables are only available to Java, the Kotlin compiler hides them in the same way that it hides the boxed class it currently creates.

Update: clarification added in the KEEP text.

qwwdfsad commented 4 days ago

My biggest design concern for the proposal is that it seems to be a wrong default behaviour -- in the ideal world, we would like to have value classes fully interoperable by default. The current behaviour is rather a legacy (and leaking implementation detail at this point) of the initial translation scheme and/or workaround for risks and assumptions we used to have.

We have a really nice indicator of that: after the release, we would immediately add this annotation to all the value classes in the standard library and kotlinx libraries. And probably we'll add it to every new value class to come.

My question is the following: why would an API author want to make its API inaccessible to Java? What would be the semantics of such an intention and what is the use case?

Apart from that, the current proposal gives a burden of knowledge for developers:

One has to know and remember to apply this annotation
They have to apply it not only to a type but to all the extensions as well
The cost of mistake here is very delayed and the feedback loop is not immediate
It has a non-trivial gotcha: if authors of, for example, Duration marked it and all its extensions with JvmExposedBoxed, everything works for you as a Java consumer. But if you decided to roll your own extension, it silently is non-interoperable by default, breaking the implicit expectation that it will be

serras commented 4 days ago

My question is the following: why would an API author want to make its API inaccessible to Java? What would be the semantics of such an intention and what is the use case?

Given some of the potential problems in this comment, the developer maybe doesn't want to risk their API being used incorrectly?

daniel-rusu commented 4 days ago

My question is the following: why would an API author want to make its API inaccessible to Java? What would be the semantics of such an intention and what is the use case?

Given some of the potential problems in this comment, the developer maybe doesn't want to risk their API being used incorrectly?

Exactly, I'm developing a library that would completely break if the inline classes were automatically exposed to Java without explicit opt-in. Keeping the existing only-available-in-Kotlin behavior is really important for library authors unless we explicitly enable it in the library otherwise inline classes would no longer be viable for many scenarios.

As a quick summary of my library in case it helps to get a better sense of my use-case, I'm creating Immutable Arrays which wraps regular arrays with inline classes that compile down to regular arrays in the bytecode but without exposing mutating capabilities and introducing hundreds of optimized specializations. The library targets memory-constrained devices or performance-oriented workloads providing a replacement for read-only lists reducing memory consumption by up to 32 times and providing significant performance improvements based on JMH benchmarks, making this a great use-case for inline classes.

Problems caused by auto-exposing inline classes without explicit opt-in:

My library would no-longer be a reasonable replacement for lists when used from Java. While Kotlin detects and prevents many dangerous scenarios described here, such as attempting to synchronize on the inline "instance", the Java compiler would allow those invalid actions so using my library from Java would introduce all sorts of defects described above.
I'm relying on inline-class name-mangling so that Java doesn't provide back-door access to the backing field and bypass immutability. Name mangling is important as the inline-class field needs to have internal visibility because I need to use the @PublishedApi annotation so I can reference the field from the many small inline functions that accept lambdas. While internal visibility is safe in Kotlin for my library since these inline classes are defined in their own module, automatically exposing these would allow Java code to access and mutate the backing array. This would completely break the core immutability guarantee of my library because the internal visibility modifier would compile to public in Java.
The inline methods that are defined in my inline classes would no longer meet the performance expectations that the library promises because they won't be inlined into the Java call-sites so all my JMH benchmark results would be misleading.

qwwdfsad commented 2 days ago

Thanks for the elaborative answer!

That's a nice example of the intent of actually hiding the API from Java. The general question, though, still stands -- is it the intent that we expect people to have all the time? If so, then the proposal is more or less aligned with how value classes are used. If not, then we effectively disincentivize the majority of use cases for the sake of the overall minor inconvenience for the ecosystem.

Keeping the existing only-available-in-Kotlin behaviour is really important for library authors

The majority of Kotlin's audience is not library authors, though. But it might be an idea worth exploring -- i.e. provide @JvmExposed(true|false) and force library authors to specify it if explicitApi mode is enabled. It will address the issue with user-defined extensions and/or application-level code while solving API authors problems. WDYT?

Given some of the potential problems https://github.com/Kotlin/KEEP/issues/394#issuecomment-2347319851, the developer maybe doesn't want to risk their API being used incorrectly?

I would consider these debatable in general, i.e., they are applicable to any @jdk.internal.ValueBased class. Using not-really-conventional identity-based operators on things that are values and stating that both in their documentation and declaration (i.e., with the declaration itself, value class) should not prevent the overall advancement of the ecosystem. Additionally, we can tool that with IDE/static analysis tools and provide as many warnings as we can to sweeten if that's a recurring issue for users

daniel-rusu commented 2 days ago

That's a nice example of the intent of actually hiding the API from Java. The general question, though, still stands -- is it the intent that we expect people to have all the time?

Using internal visibility is very common in libraries and auto-exposing these to Java will break encapsulation. Sometimes internal is chosen on purpose to allow other portions of the library to access internals that won't be exposed to the users of the library and many times internal is the only option if we want to access the field from inline functions (along with @PublishedApi).

Keeping the existing only-available-in-Kotlin behaviour is really important for library authors

The majority of Kotlin's audience is not library authors, though.

Yeah, most developers aren't creating libraries but the majority of Kotlin developers rely on external libraries and it's quite important for those to continue to work as intended.

But it might be an idea worth exploring -- i.e. provide @JvmExposed(true|false) and force library authors to specify it if explicitApi mode is enabled. It will address the issue with user-defined extensions and/or application-level code while solving API authors problems. WDYT?

I think that existing libraries should continue to compile if this feature is released so forcing it to be specified might not be an option based on existing Kotlin release practices.

Taking a step back, the vast majority of mixed Kotlin/Java codebases are transitioning to Kotlin rather than planning to permanently stay in a mixed codebase. Developers that add inline value classes are doing that only for performance reasons so they would lean towards converting the affected Java classes to Kotlin in order to benefit instead of going the other way to expose the inline classes to Java and make performance worse while also introducing the categories of defects described above.

The main value that exposing these to Java would add seems to be to make Java frameworks interop better with inline value classes instead of wanting to allow developers to manually use inline classes from Java. The other 2 design goals of exposing boxed variants seem to be more of an enabler for this main issue. Frameworks often rely on reflection and the KEEP mentions that there will continue to be many problems with reflection after implementing this proposal.

There were also library developers asking for the ability to expose inline classes to Java but that was in the early days when the implementation details and impacts weren't well understood so it's unclear whether most library developers would continue to want this given the consequences as choosing inline classes and exposing them to Java would be counter-productive if it makes performance worse than regular classes. Most of the mixed Kotlin/Java codebases from 5 years ago (when these discussions were taking place) are in a significantly better position with most or all of their Java classes already converted to Kotlin by now.

serras commented 1 day ago

Following a suggestion from @qwwdfsad, I've added a new section that mandates choosing whether you want to expose operations or not when using explicit API mode. That way library authors are forced to choose whether they want to expose operations or not. Let me know what you think.

daniel-rusu commented 1 day ago

@serras , that new section says that if we don't want to expose it in Java then we must add that annotation and also make the constructor private but I'm relying on internal constructor visibility. I think that libraries that use internal visibility for the field will typically also need that for the constructor when working with inline value classes. Can we make it so that just the exposedToJava=false flag is sufficient?

I'm surprised that mandating the annotation is a possibility because users that use source dependencies to these libraries (to compile them from source) would no longer be able to compile after upgrading their project to the new release so they would be forced to fix the libraries or wait for new versions before upgrading Kotlin. I'm not sure if a similar problem would also happen for KMP when dealing with targets that don't have pre-compiled binaries.

Edit: Libraries also typically try to target the oldest reasonable version of Kotlin in order for the libraries to be available to the largest audience. So libraries that target Kotlin 1.9.25 won't have access to this new annotation. If it's mandatory then we might need a different way to opt out such as some configuration string in the build script.

Overall it seems that explicit opt in and defaulting to current behavior without it is the safer approach instead of mandating the annotation.

serras commented 1 day ago

Re-reading this thread I think there was a bit of misunderstanding about how @JvmExposeBoxed is supposed to work. I've tried to clarify it in a new section, that I would recommend to read to interested parties.

In particular, I think it was not clear that whether you expose the constructor to Java or not is completely independent of whether you expose each of the operations. The reason for this is that exposing to Java uses box-impl and unbox-impl, not the actual constructors. This has the (maybe surprising) consequence that even if you don't write @JvmExposeBoxed in your class, people may still use @JvmExposeBoxed to expose functions using that value class.

One small point: @daniel-rusu, you were right, the KEEP was not correct with respect to constructors. I hope that the new version clarifies this point; you can choose whether to expose the constructor or not by annotating it.

serras commented 1 day ago

About the explicit API mode: this depends on the compiler version and may also depend on the language version. This annotation would only be mandated if you are compiling from the version we implement this on, and you could always set the compiler version to a smaller one.

Furthermore, whether this annotation is set or not should not affect users of that library at all, except for the source configuration you mention. However, I would argue this is a very small percentage (just a data point, but I didn't even know you could do that before reading your comment).