Open piccolbo opened 9 years ago
My two thoughts:
1) Does Avro allow an enum with only one level?
2) If an enum is allowed to have a single level, we might need to change
the enum levels from a character vector to a list, so that toJSON will
produce ["d"] instead of "d".
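To make the second point concrete, here is a minimal sketch of the difference between a length-one character vector and a length-one list. It assumes the toJSON in question is rjson::toJSON (an assumption; the thread does not say which JSON package is used), and the variable names symbols_chr and symbols_lst are only for illustration. The JSON part is skipped if rjson is not installed.

```r
# A one-row data frame whose factor column has a single level, as in the report.
df <- data.frame(col_2 = factor("d"))

symbols_chr <- levels(df$col_2)           # character vector of length 1
symbols_lst <- as.list(levels(df$col_2))  # list of length 1

# Assumption: the serializer is rjson::toJSON. Print both schema fragments
# side by side so the "symbols" field can be compared.
if (requireNamespace("rjson", quietly = TRUE)) {
  cat(rjson::toJSON(list(type = "enum", symbols = symbols_chr)), "\n")
  cat(rjson::toJSON(list(type = "enum", symbols = symbols_lst)), "\n")
}
```

If the serializer behaves as described above, only the list form should give "symbols" as a JSON array, which is what the Avro parser in the stack trace appears to expect.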
Jamie Olson
On Tue, Mar 3, 2015 at 3:59 PM, Antonio Piccolboni <notifications@github.com> wrote:
Error is
ravro:::write.avro(df, tf1)
Exception in thread "main" org.apache.avro.SchemaParseException: Enum has no symbols: {"name":"col_2","type":"enum","symbols":"d"}
    at org.apache.avro.Schema.parse(Schema.java:1121)
    at org.apache.avro.Schema.parse(Schema.java:1094)
    at org.apache.avro.Schema$Parser.parse(Schema.java:927)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:966)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:91)
    at org.apache.avro.tool.Main.run(Main.java:80)
    at org.apache.avro.tool.Main.main(Main.java:69)
dump of df
df <- structure(list(col_1 = 139.084976531123, col_2 = structure(1L, .Label = "d", class = "factor"), col_3 = TRUE, col_4 = FALSE, col_5 = -11.3948273417181, col_6 = 90.2836501356233, col_7 = structure(1L, .Label = "", class = "factor"), col_8 = structure(1L, .Label = "57be", class = "factor")), .Names = c("col_1", "col_2", "col_3", "col_4", "col_5", "col_6", "col_7", "col_8" ), row.names = c(NA, -1L), class = "data.frame")
Another instance
Exception in thread "main" org.apache.avro.SchemaParseException: Enum has no symbols: {"name":"col_1","type":"enum","symbols":"_6f7a4bc347_ravro"}
    at org.apache.avro.Schema.parse(Schema.java:1121)
    at org.apache.avro.Schema.parse(Schema.java:1094)
    at org.apache.avro.Schema$Parser.parse(Schema.java:927)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:966)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:91)
    at org.apache.avro.tool.Main.run(Main.java:80)
    at org.apache.avro.tool.Main.main(Main.java:69)
Dump
df <- structure(list(col_1 = structure(1L, .Label = "6f7a4bc347", class = "factor"), col_2 = structure(1L, .Label = "46f315f9", class = "factor"), col_3 = -158.916518470489, col_4 = -72.4716823839384, col_5 = 34L, col_6 = structure(1L, .Label = "6f7a", class = "factor"), col_7 = -10L, col_8 = 10L), .Names = c("col_1", "col_2", "col_3", "col_4", "col_5", "col_6", "col_7", "col_8"), row.names = c(NA, -1L), class = "data.frame")
My theory, from several examples, is that the failure occurs iff the input is a data frame with a single row and at least one factor column.
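One way to probe that theory is to look for factor columns with exactly one level, since both failing schemas show a single symbol. The sketch below uses only base R; single_level_factors is a hypothetical helper written for illustration, not part of ravro.

```r
# Hypothetical helper: names of the factor columns that have exactly one level.
single_level_factors <- function(df) {
  is_single <- vapply(
    df,
    function(col) is.factor(col) && nlevels(col) == 1L,
    logical(1)
  )
  names(df)[is_single]
}

# A one-row data frame shaped like the first dump (values shortened).
df <- data.frame(col_1 = 139.08, col_2 = factor("d"), col_3 = TRUE)
single_level_factors(df)

# A two-row data frame can also carry a single-level factor.
df2 <- data.frame(col_1 = c(1, 2), col_2 = factor(c("d", "d")))
single_level_factors(df2)
```

The second call is there because a multi-row column with one distinct value also has a single level, so it may be worth checking whether such an input fails in the same way.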
— Reply to this email directly or view it on GitHub https://github.com/RevolutionAnalytics/ravro/issues/3.
From reading the spec, I think a single-level enum is admissible, but I am not sure this should be very high on our priority list. How useful are single-level enums in real life? I modified my tests to generate at least two levels. I think we can reasonably delay this until there is a second request.
I mean, as far as I know, you can close this as won't fix.