coursera / courier

Data interchange for the modern web + mobile stack.
http://coursera.github.io/courier/
Apache License 2.0
98 stars 24 forks source link

Importing a record without a namespace produces scala compile errors #21

Closed eboto closed 8 years ago

eboto commented 8 years ago

WORK IN PROGRESS DO NOT MERGE

This is an initial commit that demonstrates a bug. I'll push the fix shortly.

If any record imports a non-namespaced record, the generated scala is broken. It probably requires a _root_. import.

I came across this error while trying to make a courier format for storage in ElasticSearch. ElasticSearch as of v2.0 does not support dot-notation in field names, which makes it impossible to store namespaced unions unless they are flatTyped, which isn't supported across all generators (most notably typescript-lite).

My fairly gross workaround was to put the unioned records in the global namespace (which will stop them from having dots), but then I came across this issue!

An example of the output:

[error] /Users/eboto/code/courier/scala/generator-test/target/scala-2.10/courier/org/coursera/records/WithNoNamespaceField.scala:54: not found: type WithoutNamespace
[error]   lazy val blah: WithoutNamespace = obtainWrapped(WithNoNamespaceField.Fields.blah, classOf[WithoutNamespace])
eboto commented 8 years ago

Dug into the behavior of root-level types a bit. I had forgotten that neither scala nor java can import types defined at the root level (outside of a package) from the context of other packages.

So the only solution is to generate these no-namespace types in some default package, rather than in the root namespace. Like: org.coursera.courier.scalagen.rootnamespace.

AFAIK this will only apply to java or scala.

@saeta do you have any opinions about this? Seems like this could bite anyone trying to model some existing API as a courier data schema.

Alternatively is there some why to alter a union lookup key using annotations or some such?

saeta commented 8 years ago

@eboto excellent sleuthing. We've had some discussion internally previously about whether to serialize the tagged unions with the fully-qualified name or not. We went with requiring fully-qualified names to avoid namespace collisions, assuming that every JSON system would support dots within field names. :-(

There are a number of ways we can move forward. One alternative is to specify a different pegasus serializer for when you store data in ElasticSearch, which could drop the dots. Another alternative might be to specify an annotation on the tagged union to indicate that it should be serialized without the dots, and then build that into the generator in some fashion. Another alternative might be to somehow extend the grammar to support naming or providing an alias for the types in the tagged unions. A final alternative might be to support figure out some solution for flat typed within the typescript-lite generator.

I'd currently lean to leveraging flat typed, and work on figuring out a way to support them efficiently within the typescript-lite environment. How does that sound to you?

eboto commented 8 years ago

Thanks for the recommendations, @saeta! Yeah I thought it was such an odd requirement of Elastic's to have no dots in the name. Of the options you listed, I think that supporting flat typed unions in ts-lite is probably the best way to go in this search index case. Though I have already taken the nuclear option for our use-case and "modeled" the field (in our case body) as multiple different fields: bodyAsX, bodyAsY, bodyAsZ.

Gross, but it'll help us get off the ground. I might circle back to patching ts-lite in a future iteration.

Anyways, back to the root reported bug...it certainly seems like a bug that we would generate java and scala classes that can't be used outside of the root package. But I don't think it's a huge problem because pretty much everyone should be using namespaces.

So I'm going to quietly close this PR... =)