cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.
Apache License 2.0
170 stars, 75 forks

Upgrade spark version from 1.3.1 -> 1.5.1 #42

Closed. witgo closed this 8 years ago

bhoppi commented 8 years ago

After this change the compiled jar grows by 4 MB. The extra files come from the following artifacts:

com.sun.xml.bind:jaxb-core:2.2.7
com.sun.xml.bind:jaxb-impl:2.2.7
org.apache.parquet:parquet-column:1.7.0
org.apache.parquet:parquet-common:1.7.0
org.apache.parquet:parquet-encoding:1.7.0
org.apache.parquet:parquet-format:1.7.0
org.apache.parquet:parquet-generator:1.7.0
org.apache.parquet:parquet-hadoop:1.7.0
org.apache.parquet:parquet-jackson:1.7.0
org.codehaus.janino:commons-compiler:2.7.8
org.codehaus.janino:janino:2.7.8

Should we exclude them?

bhoppi commented 8 years ago

In addition, the build now emits the following 6 new warnings:

[WARNING] D:\collection\document\Intellij\zen\ml\src\main\scala\com\github\cloudml\zen\ml\recommendation\BSFMModel.scala:92: method parquetFile in class SQLContext is deprecated: Use read.parquet()
[WARNING] val dataRDD = sqlContext.parquetFile(dataPath)
[WARNING] ^
[WARNING] D:\collection\document\Intellij\zen\ml\src\main\scala\com\github\cloudml\zen\ml\recommendation\BSFMModel.scala:131: method saveAsParquetFile in class DataFrame is deprecated: Use write.parquet(path)
[WARNING] factors.toDF("featureId", "factors").saveAsParquetFile(LoaderUtils.dataPath(path))
[WARNING] ^
[WARNING] D:\collection\document\Intellij\zen\ml\src\main\scala\com\github\cloudml\zen\ml\recommendation\FMModel.scala:106: method parquetFile in class SQLContext is deprecated: Use read.parquet()
[WARNING] val dataRDD = sqlContext.parquetFile(dataPath)
[WARNING] ^
[WARNING] D:\collection\document\Intellij\zen\ml\src\main\scala\com\github\cloudml\zen\ml\recommendation\FMModel.scala:144: method saveAsParquetFile in class DataFrame is deprecated: Use write.parquet(path)
[WARNING] factors.toDF("featureId", "factors").saveAsParquetFile(LoaderUtils.dataPath(path))
[WARNING] ^
[WARNING] D:\collection\document\Intellij\zen\ml\src\main\scala\com\github\cloudml\zen\ml\recommendation\MVMModel.scala:109: method parquetFile in class SQLContext is deprecated: Use read.parquet()
[WARNING] val dataRDD = sqlContext.parquetFile(dataPath)
[WARNING] ^
[WARNING] D:\collection\document\Intellij\zen\ml\src\main\scala\com\github\cloudml\zen\ml\recommendation\MVMModel.scala:147: method saveAsParquetFile in class DataFrame is deprecated: Use write.parquet(path)
[WARNING] factors.toDF("featureId", "factors").saveAsParquetFile(LoaderUtils.dataPath(path))
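The compiler messages name the replacement calls themselves (`read.parquet()` and `write.parquet(path)`), so the migration is mostly mechanical. Below is a minimal sketch of what it might look like, not the change actually made in this PR. The object name `ParquetMigrationSketch`, the helpers `loadFactors` and `saveFactors`, the sample data, and the temporary path are all hypothetical and only serve to make the snippet runnable; it assumes a Spark version that provides `SQLContext.read` and `DataFrame.write`, such as the 1.5.x targeted here.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper object; the names are illustrative and not taken from zen.
object ParquetMigrationSketch {

  // Replaces the deprecated sqlContext.parquetFile(path).
  def loadFactors(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read.parquet(path)

  // Replaces the deprecated df.saveAsParquetFile(path).
  def saveFactors(df: DataFrame, path: String): Unit =
    df.write.parquet(path)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("parquet-migration-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // The output directory must not exist yet, so write into a fresh sub-path.
    val path = Files.createTempDirectory("zen-sketch").resolve("factors").toString

    // Made-up sample rows using the same column names as the warnings above.
    val factors = sc.parallelize(Seq((1L, Array(0.1, 0.2)), (2L, Array(0.3, 0.4))))
    saveFactors(factors.toDF("featureId", "factors"), path)

    val restored = loadFactors(sqlContext, path)
    println(restored.count())

    sc.stop()
  }
}
```

In the model files listed in the warnings, the equivalent edit would be to swap the two deprecated calls in place and leave the surrounding code, including `LoaderUtils.dataPath(path)`, unchanged.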

witgo commented 8 years ago

The parquet-related artifacts can be excluded. I'll fix the warnings.

witgo commented 8 years ago

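@bhoppi The requested changes have been committed. We should run the built file assembly/target/scala-2.10/zen-assembly-0.2-SNAPSHOT-spark1.5.0.jar on a cluster to make sure everything works. If that goes well, this can be merged into master.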

witgo commented 8 years ago

FM looks fine. Let's merge this into master once Spark is upgraded to 1.5.1 (within the next week or two).

MetaFlowRepo commented 8 years ago

Is there any performance improvement? Dr. Zhao has some LDA-related GraphX optimizations that could be ported to FM.


From: Guoqiang Li (notifications@github.com)
Sent: 2015/9/18 23:43
To: cloudml/zen (zen@noreply.github.com)
Subject: Re: [zen] Upgrade spark version from 1.3.1 -> 1.5.0. (#42)


Reply to this email directly or view it on GitHub: https://github.com/cloudml/zen/pull/42#issuecomment-141486420

witgo commented 8 years ago

I haven't tested performance. I haven't had access to a suitable cluster recently.

MetaFlowRepo commented 8 years ago

I'll test it when I have time.


From: Guoqiang Li (notifications@github.com)
Sent: 2015/9/18 23:52
To: cloudml/zen (zen@noreply.github.com)
Cc: sparkml (hucheng@outlook.com)
Subject: Re: [zen] Upgrade spark version from 1.3.1 -> 1.5.0. (#42)
