Closed comcmipi closed 9 years ago
Hi @comcmipi - the sample Thrift schema provided in this repo doesn't use
a union definition. Are you trying to get this to work with a custom Thrift definition?
You can debug your Thrift schema by generating Java output with thrift --gen java <your schema>.thrift.
Regards, Brandon.
Hi Brandon,
thank you for your reply.
Yes, I'm using a custom Thrift definition, something that can be simplified to:
union CoilID { 1: i64 register_id; }
I was wrong to say that the Thrift schema does not compile; it does. The problem arises when I try to compile a Spark Scala program with a simple assignment, e.g.
val sampleCoilID = new CoilID(123123123L)
It returns error:
[error] found : Long(123123123L)
[error] required: com.adobe.spark_parquet_thrift.CoilID
[error] val sampleCoilID = new CoilID(123123123L)
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
When I change "union" to "struct", i.e.:
struct CoilID { 1: required i64 register_id; }
everything works fine.
Thank you for your help, Best, Michal
Hi Michal,
I think your issue comes from Thrift generating different Java constructors for
unions than for structs.
I changed SampleThriftObject.thrift in this repo to:
namespace java com.adobe.spark_parquet_thrift
union SampleThriftObject {
10: string col_a;
20: string col_b;
30: string col_c;
}
With a struct, these objects are initialized with
val sampleData = Range(1,10).toSeq.map{ v: Int =>
  new SampleThriftObject("a"+v)
}
However, this now causes the same error you're seeing:
┌[bamos☮derecho]-(~/repos/spark-parquet-thrift-example)-[git://master ✗]-
└> sbt assembly
[info] Loading project definition from /home/bamos/repos/spark-parquet-thrift-example/project
[info] Set current project to SparkParquetThrift (in build file:/home/bamos/repos/spark-parquet-thrift-example/)
[warn] There may be incompatibilities among your library dependencies.
[warn] Here are some of the libraries that were evicted:
[warn] * org.apache.thrift:libthrift:0.7.0 -> 0.9.1
[warn] Run 'evicted' to see detailed eviction warnings
[info] Compiling 1 Scala source to /home/bamos/repos/spark-parquet-thrift-example/target/scala-2.10/classes...
[error] /home/bamos/repos/spark-parquet-thrift-example/src/main/scala/SparkParquetThriftApp.scala:62: type mismatch;
[error] found : String
[error] required: com.adobe.spark_parquet_thrift.SampleThriftObject
[error] new SampleThriftObject("a"+v)
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 4 s, completed Jan 21, 2015 7:12:47 PM
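The cause can be seen in the Java class that sbt-thrift generates, located at
target/scala-2.10/src_managed/main/gen-java/com/adobe/spark_parquet_thrift/SampleThriftObject.java.
For a union, the only constructors available are a no-argument constructor, one that takes
a _Fields value plus an Object, and a copy constructor. There is no constructor that takes
a string, as the struct version had. Because the copy constructor is the only one with a single
parameter, the compiler checks the String against it, which explains the
"required: com.adobe.spark_parquet_thrift.SampleThriftObject" line in the error.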
public SampleThriftObject() {
  super();
}
public SampleThriftObject(_Fields setField, Object value) {
  super(setField, value);
}
public SampleThriftObject(SampleThriftObject other) {
  super(other);
}
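To make these constructor shapes concrete, here is a minimal sketch in plain Java that imitates them without depending on libthrift. The class `UnionSketch` and its `Field` enum are hypothetical stand-ins for the generated `SampleThriftObject` and `_Fields`; the real generated class delegates to its Thrift superclass (the `super(...)` calls above) and does more than this.

```java
// Hypothetical stand-in for a Thrift-generated union; NOT real generated code.
// It only mirrors the three constructor shapes quoted above.
class UnionSketch {
    // Plays the role of the generated _Fields enum.
    enum Field { COL_A, COL_B, COL_C }

    private Field setField;  // which member is currently set (null = none)
    private Object value;    // the value of that member

    // 1. No-argument constructor: nothing is set yet.
    UnionSketch() {}

    // 2. (field, value) constructor: sets exactly one member.
    UnionSketch(Field setField, Object value) {
        if (setField == null || value == null) throw new NullPointerException();
        this.setField = setField;
        this.value = value;
    }

    // 3. Copy constructor: the only one-argument constructor, so a lone
    //    String argument can only be matched (and rejected) against it.
    UnionSketch(UnionSketch other) {
        this.setField = other.setField;
        this.value = other.value;
    }

    Field getSetField() { return setField; }
    Object getFieldValue() { return value; }
}

class UnionSketchDemo {
    public static void main(String[] args) {
        // Construct through the (field, value) shape, then copy it.
        UnionSketch a = new UnionSketch(UnionSketch.Field.COL_A, "a1");
        UnionSketch copy = new UnionSketch(a);
        System.out.println(copy.getSetField() + " = " + copy.getFieldValue());
    }
}
```

With the real generated class, the equivalent of the second shape would presumably be `new SampleThriftObject(SampleThriftObject._Fields.COL_A, "a" + v)`, based on the constructor signature quoted above.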
Further down the definition, there are functions for setting the values within the union:
public void setCol_a(String value) {
  if (value == null) throw new NullPointerException();
  setField_ = _Fields.COL_A;
  value_ = value;
}
So, using this information, I'm able to successfully compile:
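val sampleData = Range(1,10).toSeq.map{ v: Int =>
  // setCol_a returns void, so return the object itself rather than the setter call
  val obj = new SampleThriftObject()
  obj.setCol_a("a" + v)
  obj
}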
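One detail to watch when building a collection this way: the generated setter shown earlier is declared `void`, so each element has to be the object itself, not the result of the setter call. Below is a minimal Java sketch of that pattern. `ColUnion` is a hypothetical stand-in for the generated union, and the `withColA` helper is an assumption for illustration, not part of Thrift.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated union with a void setter.
class ColUnion {
    enum Field { COL_A }

    private Field setField;
    private Object value;

    // Mirrors the shape of the generated setter: it returns nothing.
    void setColA(String v) {
        if (v == null) throw new NullPointerException();
        setField = Field.COL_A;
        value = v;
    }

    Field getSetField() { return setField; }
    Object getFieldValue() { return value; }
}

class ColUnionBuilder {
    // Illustrative helper: set the member, then hand back the object.
    static ColUnion withColA(String v) {
        ColUnion u = new ColUnion();
        u.setColA(v);
        return u;
    }

    // Builds "a1" through "a9", matching Range(1,10) in the Scala snippet
    // (the upper bound is exclusive).
    static List<ColUnion> sampleData() {
        List<ColUnion> out = new ArrayList<>();
        for (int v = 1; v < 10; v++) {
            out.add(withColA("a" + v));
        }
        return out;
    }

    public static void main(String[] args) {
        List<ColUnion> data = sampleData();
        System.out.println(data.size() + " objects, first = " + data.get(0).getFieldValue());
    }
}
```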
┌[bamos☮derecho]-(~/repos/spark-parquet-thrift-example)-[git://master ✗]-
└> sbt assembly
[info] Loading project definition from /home/bamos/repos/spark-parquet-thrift-example/project
[info] Set current project to SparkParquetThrift (in build file:/home/bamos/repos/spark-parquet-thrift-example/)
[warn] There may be incompatibilities among your library dependencies.
[warn] Here are some of the libraries that were evicted:
[warn] * org.apache.thrift:libthrift:0.7.0 -> 0.9.1
[warn] Run 'evicted' to see detailed eviction warnings
[info] Including from cache: slf4j-api-1.7.2.jar
[info] Including from cache: parquet-hadoop-1.5.0.jar
[info] Including from cache: commons-lang-2.4.jar
[info] Including from cache: parquet-column-1.5.0.jar
[info] Including from cache: parquet-format-2.1.0.jar
[info] Including from cache: parquet-common-1.5.0.jar
[info] Including from cache: httpcore-4.2.4.jar
[info] Including from cache: guava-11.0.1.jar
[info] Including from cache: parquet-encoding-1.5.0.jar
[info] Including from cache: akka-slf4j_2.10-2.2.3.jar
[info] Including from cache: libthrift-0.9.1.jar
[info] Including from cache: commons-logging-1.1.1.jar
[info] Including from cache: parquet-generator-1.5.0.jar
[info] Including from cache: jackson-core-asl-1.9.11.jar
[info] Including from cache: commons-codec-1.6.jar
[info] Including from cache: commons-lang3-3.1.jar
[info] Including from cache: json-simple-1.1.jar
[info] Including from cache: hadoop-lzo-0.4.16.jar
[info] Including from cache: jsr305-1.3.9.jar
[info] Including from cache: httpclient-4.2.5.jar
[info] Including from cache: parquet-jackson-1.5.0.jar
[info] Including from cache: protobuf-java-2.4.1.jar
[info] Including from cache: parquet-thrift-1.5.0.jar
[info] Including from cache: elephant-bird-pig-4.4.jar
[info] Including from cache: config-1.0.2.jar
[info] Including from cache: elephant-bird-core-4.4.jar
[info] Including from cache: parquet-pig-1.5.0.jar
[info] Including from cache: elephant-bird-hadoop-compat-4.4.jar
[info] Including from cache: jackson-mapper-asl-1.9.11.jar
[info] Including from cache: akka-actor_2.10-2.2.3.jar
[info] Including from cache: snappy-java-1.0.5.jar
[info] Including from cache: scala-library-2.10.3.jar
[info] Checking every *.class/*.jar file's SHA-1.
[info] Merging files...
[warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[info] Strategy 'deduplicate' was applied to 3 files (Run the task at debug level to see details)
[warn] Strategy 'discard' was applied to a file
[info] Assembly up to date: /home/bamos/repos/spark-parquet-thrift-example/target/scala-2.10/SparkParquetThrift.jar
[success] Total time: 2 s, completed Jan 21, 2015 7:17:01 PM
Hope this helps!
Regards, Brandon.
Thank you very much Brandon,
worked perfectly!
All the best, Michal
Hi,
I was not able to compile a Thrift schema containing a "union" definition. Are unions supported in the current version?
Thank you very much, Best, Michal