Closed lolgab closed 1 year ago
There might be something useful that can be ported over from https://github.com/rallyhealth/weePickle/pull/94.
import upickle.default.*
enum Color derives Writer:
case Red, Green, Blue
println(write(Color.Blue))
println(write(Color.Red))
println(write(Color.Green))
Prints {"$type":"Color.Red"}
three times.
The problem is most likely in the macros.summonList. It should be something like:
inline final def summonWriters[T <: Tuple]: List[Writer[_]] =
inline erasedValue[T] match
case _: EmptyTuple => Nil
case _: (t *: ts) => summonWriter[t] :: summonWriters[ts]
inline final def summonWriter[A]: Writer[A] =
summonFrom {
case w: Writer[A] => w
case _: Mirror.Of[A] => Writer.derived[A]
}
It's my first time in this code so apologies if I have it wrong, but it seems like the root of the problem is that enumeration values are anonymous types, and do not get their own type at runtime - hence, the ClassTag[T]
that gets passed through to CaseCaseWriter
is actually the class of the enumeration, not the individual case. This results in the annotate
call always resolving to the first writer it finds, as all of the specialized writers for the enumeration values are considered (correctly) subtypes of this class tag.
If it's fine to have the default behavior for enumerations be to read/write out the string value, I can take a shot at porting the work @htmldoug referenced above? Which is to generate alternate readers/writers for enumerations using the generated valueOf
method.
Although a ClassTag is required for the macroW
it is not used for the Mirror.SumOf case. So that is unlikely to be the problem.
I spent a number of hours debugging this yesterday, and am fairly certain that it is the issue. This is not a problem deriving writers (writers are derived correctly for each enumeration value), it is specifically a problem writing the annotation. The ClassTag is used specifically when making the call to annotate
to filter the set of available leaf writers
CLASS NAME: upickle.EnumTests.SimpleEnum.B Classtag: upickle.EnumTests$SimpleEnum
This is from a println inside the derived writer. As you mention, the Classtag is not used there, so it is not a problem there. However, the call to annotate
ultimately calls into TaggedWriter.Node
/TaggedWriter.Leaf
with that information
object TaggedWriter{
class Leaf[T](c: ClassTag[_], tag: String, r: CaseW[T]) extends TaggedWriter[T]{
def findWriter(v: Any) = {
if (c.runtimeClass.isInstance(v)) tag -> r
else null
}
}
class Node[T](rs: TaggedWriter[_ <: T]*) extends TaggedWriter[T]{
def findWriter(v: Any) = scanChildren(rs)(_.findWriter(v)).asInstanceOf[(String, CaseW[T])]
}
}
We enter via the findWriter
call in Node
. The children are a list of Leaf
for each children. The _.findWriter
call then returns with whatever the first value in the list is, because all of the enumeration values are subtypes of the incorrect classtag.
I am not sure exactly why this is the case, but as mentioned previously, it appears that enumeration values are not considered types and cannot receive a classtag proper - I can invoke implicitly[ClassTag[MyEnum]]
, but not implicitly[ClassTag[MyEnum.Value1]]
. Therefore this method of using subtype filtering based on the ClassTag stored in the derived Writer seems to not be possible for enumerations.
From my experience implementing this PR ClassTags aren't needed, however uPickle is using them for something, so I'm clearly missing something.
I would expect that for simple enums (where the cases are simple names) the Mirror.ProductOf would never be invoked, and annotate
is only called there. The Mirror.SumOf only calls the macros.summonList
and the Writer.merge
. The summonList is invoked with macros.summonList[Tuple.Map[m.MirroredElemTypes, Writer]]
which seems a little magical. The Writer.merge
invokes TaggedWriter.Node
which does not involve ClassTags, the findWriter
there is being invoked on the writers returned by summonList.
Yeah, I don't 100% understand the intent of the tagging code - I was not willing to refactor the way that the annotate
calls worked as my first attempt to change the code :sweat_smile: especially given that I haven't worked on crossbuild setups before and it's taxing; I'm finding I have to keep track of many more of the relations in my head rather than via Intellij's type inference. I also had a nagging sense that the ClassTag
shouldn't be required in an ideal solution, as the information required for the annotation is technically already present in the writer. Perhaps @lihaoyi could comment on the intent of the TaggedWriter setup?
Have you looked at the PR I posted? Handling enums as being identified by their string value is the way I've seen them used in all codebases I've worked with in practice, usually with custom serializers. Do you think that is a reasonable default?
From what I can tell, TaggedWriter
uses a class tag for each TaggedWriter.Leaf
node to determine whether it's the correct writer to handle the class or if someone else should do it. The annotate
calls construct the leafs, and the merge
calls construct the non-leaf nodes. When something comes in to be serialized, it basically walks the tree of TaggedWriter.
Node/
TaggedWriter.Leafinstances looking for a writer to handle a particular class, and uses that writer to write it.
TaggedReaderhas a similar setup, but uses a string
tag` comparison rather than using a class comparison, since that's all we have during deserialization.
The relevant code is here https://github.com/com-lihaoyi/upickle/blob/b8ab7ab21b76fb085497ab5b84f424647bf3e2dd/core/src/upickle/core/Types.scala#L336
If you can get away without using a ClassTag
, e.g. by making TaggedWriter
use a string tag the same way TaggedReader
does, then great.
I can't answer questions about the Scala 3 stuff, as I'm not really familiar with Scala 3 macros or language features. Maybe @anatoliykmetyuk or @jodersky would be able to help more. But in general, I trust our test suite, so if you can get your PR green with addition tests to cover the fixed logic, that's probably about all the assurance I need
Ok thanks @lihaoyi. In terms of general strategy for enumerations - would you prefer that they be serialized as the string representation by default when using macroRW
? Or serialized the way that any other case class would be.
To use the example above:
import upickle.default.*
enum Color derives Writer:
case Red, Green, Blue
println(write(Color.Blue))
println(write(Color.Red))
println(write(Color.Green))
Option 1
"Blue"
"Red"
"Green"
Option 2
{"$type":"Color.Blue"}
{"$type":"Color.Red"}
{"$type":"Color.Green"}
I can see an argument for either; option 1 is how I always see enumerations being dealt with in practice, whereas option 2 allows for extra properties to be included in the serialization by default, which could be handy, but feels less like "canonical" json in my opinion.
Also with option 2, if enumeration values change between versions of 2 pieces of software, I'm not sure how deserializing it should work - a situation like {"$type":"Color.Red", "ordinal": 0}
deserializing to an enumeration value that is Color.Red(ordinal = 1)
seems strange. But we wouldn't be able to set those values for enumeration instances, so you'd have to "lie" during deserialization I think and locate the correct instance using valueOf
anyways?
Separate question, are there contributor guidelines that cover how to add to the documentation for the site? The enumeration handling should be added as part of the PR
@BoopBoopBeepBoop For now let's go with Option 2.
If we go with Option 1 for enums, we should do the same for case object
s, but that would be a significant breaking change. That's not to say we shouldn't do it, but we shouldn't do it in this PR. Since Enums are basically case-class/case-object hierarchies, we should just follow suite for now.
Maybe we can add another flag or annotation to allow the "short" enum serialization format, but let's do that in a follow up if desired
Ah, I hadn't considered case objects. Or for something even worse, a sealed trait whose subtypes include enumerations mixed in. Ew. I'll see if I can find a nice way to distinguish between deriving the default ReadWriter and an enum-specific one.
I think case objects are broken for scala 3 right now as well, btw.
sealed trait Animal derives ReadWriter
case class Person(name: String, address: String, age: Int = 20)
extends Animal
derives ReadWriter
case class Cat(name: String, owner: Person)
extends Animal
derives ReadWriter
case class Dog(name: String, age: Int)
derives ReadWriter
case object Cthulu
extends Animal
derives ReadWriter
From the scala 3 tests - the addition of Cthulu
causes macro expansion failures. I'll put up a separate issue/PR for that.
Looks like ADTs encoded using enums also fail.
Using the Color
example in https://docs.scala-lang.org/scala3/reference/enums/adts.html
import upickle.default._
enum Color(val rgb: Int):
case Red extends Color(0xFF0000)
case Green extends Color(0x00FF00)
case Blue extends Color(0x0000FF)
case Mix(mix: Int) extends Color(mix)
object Color {
implicit val rw: ReadWriter[Color] = macroRW
}
Fails with
Exception occurred while executing macro expansion.
java.lang.Exception: Enumeration valueOf method not found
at upickle.implicits.macros.macros$package$.$anonfun$8(macros.scala:104)
at scala.Option.getOrElse(Option.scala:201)
at upickle.implicits.macros.macros$package$.enumValueOfImpl(macros.scala:105)
Trying to write Scala 3
enum
values actually writes always the first one.Test reproducing the problem
Fails with: